hdfs-site.xml
1. The amount of past edit-log history to retain (counted in transactions)
is controlled by the dfs.namenode.num.extra.edits.retained property.
2. The number of fsimage checkpoint files to retain is controlled by the
dfs.namenode.num.checkpoints.retained property.
3. NameNodes persist HDFS storage state information to disk. The
locations are recorded in the dfs.namenode.name.dir (fsimage) and
dfs.namenode.edits.dir (edit logs) properties.
4. dfs.namenode.safemode.threshold-pct determines the percentage of
blocks that must be minimally replicated before the NameNode leaves safe
mode. Minimally replicated means at least one replica is available.
5. The number of failed disks tolerated by an HDFS DataNode is set by
dfs.datanode.failed.volumes.tolerated.
6. The HDFS superuser account is determined by the
dfs.cluster.administrators property.
7. dfs.webhdfs.enabled="true". To verify that WebHDFS is enabled, check
this property.
8. If Kerberos is enabled, WebHDFS requires two additional hdfs-site.xml
properties:
dfs.web.authentication.kerberos.principal="HTTP/$<FQDN>@$<REALM_NAME>.com"
and
dfs.web.authentication.kerberos.keytab="/etc/security/spnego.service.keytab"
9. Only the dfs.namenode.acls.enabled property needs to be set to true
before an ACL can be set. The NameNode rejects all attempts to set an ACL
if this property is not enabled.
10. The mode parameter is calculated using the value of the
fs.permissions.umask-mode property. The default value is 022. The umask
is removed from a base mode to produce the mode parameter: for
directories 777 - 022 = 755, and for files 666 - 022 = 644.
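The umask arithmetic above can be sketched in a few lines. This is only an illustration, not HDFS code: the function name apply_umask is hypothetical, and it treats the "subtraction" as what a umask really does, which is clear the masked bits from the base mode.

```python
def apply_umask(base_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return base_mode & ~umask


# Default fs.permissions.umask-mode from the notes above.
umask = 0o022

dir_mode = apply_umask(0o777, umask)   # directories start from 777
file_mode = apply_umask(0o666, umask)  # files start from 666

print(f"directories: {dir_mode:o}, files: {file_mode:o}")
```

For the default umask the bitwise form and plain subtraction agree; they differ for umasks whose bits are not all present in the base mode (for example, a file base of 666 with umask 077 gives 600).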
11. The default data block size of 128 megabytes is determined by the
dfs.blocksize property.
12. The default replication factor is set by the dfs.replication property
in hdfs-site.xml.
13. dfs.datanode.data.dir determines the parent directories used to store
HDFS file data blocks. It can list /hadoop/hdfs/data1, /hadoop/hdfs/data2,
and so on, which map to multiple disks.
14. dfs.bytes-per-checksum: a checksum is calculated and stored on disk
for each 512-byte chunk in a data block.
15. dfs.namenode.checkpoint.period: checkpoints occur every hour (3600
seconds) by default, based on this value.
16. If the number of transactions reaches the value in
dfs.namenode.checkpoint.txns, then a checkpoint occurs immediately. The
default is 1,000,000 transactions.
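The two checkpoint triggers (elapsed time and transaction count) can be combined into a single check. The sketch below is a simplified model, not the NameNode's implementation; the function and parameter names are hypothetical, and the defaults are the one hour and 1,000,000 transactions mentioned above.

```python
def should_checkpoint(seconds_since_last: int,
                      txns_since_last: int,
                      period_seconds: int = 3600,       # dfs.namenode.checkpoint.period
                      txn_limit: int = 1_000_000) -> bool:  # dfs.namenode.checkpoint.txns
    """Return True when either trigger is reached, whichever comes first."""
    return seconds_since_last >= period_seconds or txns_since_last >= txn_limit


# Ten minutes in, but a burst of edits already hit the transaction limit.
print(should_checkpoint(seconds_since_last=600, txns_since_last=1_000_000))
# A quiet hour: few transactions, but the period has elapsed.
print(should_checkpoint(seconds_since_last=3600, txns_since_last=1_200))
```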
17. The heartbeat interval is 3 seconds by default, set by
dfs.heartbeat.interval (a DataNode sends heartbeats to the NameNode to
report its availability).
18. A DataNode is marked as stale if no heartbeat is received for longer
than the value of dfs.namenode.stale.datanode.interval (30-second
threshold). The minimum possible value is three times the heartbeat
interval.
19. dfs.namenode.avoid.read.stale.datanode is set to true in HDP by
default. A stale DataNode is returned at the end of the list of DataNodes
when the NameNode is trying to satisfy client read requests.
20. dfs.namenode.avoid.write.stale.datanode is set to true in HDP by
default. It avoids writing to stale DataNodes.
21. Stale DataNodes are written to only if the number of stale DataNodes
exceeds the ratio determined by dfs.namenode.write.stale.datanode.ratio.
In HDP it is set to 1, which means that HDFS may write to a stale
DataNode.
22.
A NameNode declares a DataNode dead when value exceeded 10 minutes 30 seconds
2 x dfs.namenode.heartbeat.recheckinterval)+
(10 x dfs.heartbeat.interval).
dfs.namenode.heartbeat.recheckinterval
default 10 minutes
dfs.heartbeat.interval
= 3 seconds default value
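The dead-node formula can be checked with a short calculation. This is a sketch with a hypothetical function name; it assumes a recheck interval of 5 minutes (300 seconds) and a heartbeat interval of 3 seconds, which are the values that produce the 10 minutes 30 seconds quoted above.

```python
def dead_node_timeout_seconds(recheck_interval_s: int = 300,
                              heartbeat_interval_s: int = 3) -> int:
    """(2 x recheck interval) + (10 x heartbeat interval), in seconds."""
    return 2 * recheck_interval_s + 10 * heartbeat_interval_s


timeout = dead_node_timeout_seconds()
minutes, seconds = divmod(timeout, 60)
print(f"{timeout} s = {minutes} min {seconds} s")  # 630 s = 10 min 30 s
```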
23. Each block that has gone unread for a long period has its checksum
verified at least every two weeks. This is controlled by
dfs.datanode.scan.period.hours: a value of 0 disables the scan, and two
weeks corresponds to a value of 336 (14 x 24 hours).
24. The address and port number of the NameNode UI is determined by
dfs.namenode.http-address or dfs.namenode.https-address. The default
port number is 50070.
25. The most commonly edited file is hdfs-site.xml. Others include
core-site.xml, hadoop-policy.xml, hdfs-log4j, ssl-client.xml, and
ssl-server.xml.
26. dfs.datanode.balance.bandwidthPerSec limits the bandwidth used for
rebalancing. The default is 6,250,000 bytes per second. When many nodes
need rebalancing (for example, 20), consider rebalancing 4-5 nodes at a
time rather than all 20 at once, to preserve network bandwidth for
running processes other than rebalancing.
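To get a feel for what the default limit means, the rough estimate below converts it to megabits per second and computes how long one DataNode would need to move a given amount of data. It is a back-of-the-envelope sketch with a hypothetical function name; the 1 TB figure is an illustrative assumption, and real balancer runs depend on many other factors.

```python
DEFAULT_BANDWIDTH = 6_250_000  # bytes per second, from the notes above


def rebalance_seconds(bytes_to_move: int,
                      bandwidth_bytes_per_sec: int = DEFAULT_BANDWIDTH) -> float:
    """Lower bound on transfer time for one DataNode at the configured limit."""
    return bytes_to_move / bandwidth_bytes_per_sec


print(f"limit = {DEFAULT_BANDWIDTH * 8 / 1_000_000:.0f} Mbit/s")

one_terabyte = 10**12  # illustrative amount of data to move
hours = rebalance_seconds(one_terabyte) / 3600
print(f"1 TB at the default limit takes at least {hours:.1f} hours")
```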
27. When an administrator decommissions a DataNode, the hostname of the
DataNode is added to the file named by
dfs.hosts.exclude="/etc/hadoop/conf/dfs.exclude".
28. When an administrator decommissions a NodeManager, its hostname is
added to the file defined by the
yarn.resourcemanager.nodes.exclude-path="/etc/hadoop/conf/yarn.exclude"
property in the yarn-site.xml file.
29. For NameNode HA settings:
-- On the current JournalNodes, in hdfs-site.xml under their installation
paths, set the property
dfs.journalnode.edits.dir="/path/to/edits/info/data"
This is the directory path where the edit logs are stored.
-- The location of the JournalNodes is set in hdfs-site.xml by the
property
dfs.namenode.shared.edits.dir="qjournal://jn1:8485;jn2:8485;jn3:8485/<journalId>"
-- dfs.nameservices="haclustersetup" (the logical HDFS cluster name,
which points to the two NameNodes)
-- dfs.ha.namenodes.haclustersetup="nn1,nn2" (names of the NameNodes)
-- dfs.namenode.http-address.<logical clustername>.<name of node>
Ex: dfs.namenode.http-address.haclustersetup.nn1="node1:50070"
dfs.namenode.http-address.haclustersetup.nn2="node2:50070"
-- dfs.namenode.rpc-address.<logical clustername>.<name of node>
Ex: dfs.namenode.rpc-address.haclustersetup.nn1="node1:8020"
dfs.namenode.rpc-address.haclustersetup.nn2="node2:8020"
-- dfs.ha.fencing.methods (values: shell or sshfence)
-- dfs.client.failover.proxy.provider.<logical clustername> determines
the Java class used by the client to determine which NameNode is
currently the Active NameNode:
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
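As a way to see how the HA property names fit together, the sketch below builds them for one nameservice and renders them in the property/name/value layout used by hdfs-site.xml. The helper names are hypothetical, the host names and ports are the example values from the notes, and the journal ID at the end of the qjournal URI (here the nameservice name) and the choice of sshfence are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PROXY_PROVIDER = ("org.apache.hadoop.hdfs.server.namenode.ha."
                  "ConfiguredFailoverProxyProvider")


def ha_properties(nameservice, namenode_hosts, shared_edits_uri):
    """Build the HA-related property names and values for one nameservice."""
    props = {
        "dfs.nameservices": nameservice,
        # Joining a dict joins its keys, i.e. the NameNode IDs.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenode_hosts),
        "dfs.namenode.shared.edits.dir": shared_edits_uri,
        "dfs.ha.fencing.methods": "sshfence",  # assumed choice; shell is the other option
        f"dfs.client.failover.proxy.provider.{nameservice}": PROXY_PROVIDER,
    }
    for nn_id, host in namenode_hosts.items():
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:50070"
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


def to_configuration_xml(props):
    """Render properties as <configuration><property>...</property></configuration>."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = ha_properties(
    "haclustersetup",
    {"nn1": "node1", "nn2": "node2"},
    # The trailing journal ID is an assumption, not taken from the notes.
    "qjournal://jn1:8485;jn2:8485;jn3:8485/haclustersetup",
)
print(to_configuration_xml(props))
```

Generating the names from one nameservice string keeps the per-NameNode keys consistent with dfs.nameservices and dfs.ha.namenodes, which is where hand-typed configurations tend to drift.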
30. dfs.blockreport.initialDelay="120 seconds". At DataNode startup, a
block report is sent to the NameNode after this configurable delay.
31. dfs.blockreport.intervalMsec="21600000 milliseconds, or 6 hours".
After initial startup, each DataNode periodically sends an updated block
report to the NameNode.
32. dfs.blockreport.split.threshold="1,000,000 blocks". Below the
threshold, a single block report that includes every HDFS storage
directory is sent to the NameNode. If the threshold is exceeded, the
block report spans multiple heartbeats.
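The split-threshold rule can be modelled as a small planning function. This is a simplified sketch, not DataNode code: the function name is hypothetical, and it assumes that above the threshold one report is sent per storage directory, which is one way a report can end up spread over several messages.

```python
def plan_block_reports(blocks_per_storage, split_threshold=1_000_000):
    """Group storage directories into block-report messages.

    Below the threshold: one message covering every storage directory.
    Otherwise: one message per storage directory (assumed behaviour).
    """
    total_blocks = sum(blocks_per_storage.values())
    if total_blocks < split_threshold:
        return [list(blocks_per_storage)]
    return [[storage] for storage in blocks_per_storage]


small = {"/hadoop/hdfs/data1": 400_000, "/hadoop/hdfs/data2": 300_000}
large = {"/hadoop/hdfs/data1": 700_000, "/hadoop/hdfs/data2": 600_000}

print(plan_block_reports(small))  # one combined report
print(plan_block_reports(large))  # one report per storage directory
```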