Sunday, May 22, 2016

Hdfs-site.xml configurations






1. The number of past edits files to retain is controlled by the

dfs.namenode.num.extra.edits.retained property.



2. The number of fsimage checkpoint files to retain is controlled by the

dfs.namenode.num.checkpoints.retained property.
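
As a sketch, the two retention settings above could appear in hdfs-site.xml as follows (the values shown are the commonly documented defaults; verify them against your Hadoop release):

```xml
<!-- Number of fsimage checkpoints to keep -->
<property>
  <name>dfs.namenode.num.checkpoints.retained</name>
  <value>2</value>
</property>
<!-- Extra edit-log transactions to keep beyond what is needed for the retained checkpoints -->
<property>
  <name>dfs.namenode.num.extra.edits.retained</name>
  <value>1000000</value>
</property>
```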

3. NameNodes persist HDFS storage state information to disk. The storage locations are recorded in the

dfs.namenode.name.dir and dfs.namenode.edits.dir properties.
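
A minimal sketch of these storage-location properties (the directory paths are hypothetical examples, not defaults):

```xml
<!-- Where the NameNode stores the fsimage -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/hdfs/namenode</value>
</property>
<!-- Where the NameNode stores the edit logs -->
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>/hadoop/hdfs/namenode</value>
</property>
```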

4. dfs.namenode.safemode.threshold-pct

This percentage determines how many blocks must be minimally replicated before the NameNode leaves safe mode. Minimally replicated means at least one replica is available.
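
A sketch of the safe mode threshold in hdfs-site.xml (0.999, i.e. 99.9%, is the commonly documented default):

```xml
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <!-- Fraction of blocks that must be minimally replicated before leaving safe mode -->
  <value>0.999</value>
</property>
```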



5. The number of failed disks tolerated by HDFS DataNodes is determined by

dfs.datanode.failed.volumes.tolerated.

6. The HDFS superuser account is determined by the dfs.cluster.administrators property.

7. dfs.webhdfs.enabled="true" enables WebHDFS.

8. If Kerberos is enabled, WebHDFS requires the configuration of two additional hdfs-site.xml

properties. The property names are

dfs.web.authentication.kerberos.principal = "HTTP/$<FQDN>@$<REALM_NAME>" and

dfs.web.authentication.kerberos.keytab = "/etc/security/spnego.service.keytab".
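
Putting the WebHDFS and Kerberos settings together, a sketch might look like the following (the realm EXAMPLE.COM and the _HOST substitution pattern are illustrative placeholders, not values from this cluster):

```xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <!-- _HOST is replaced at runtime with the server's FQDN; EXAMPLE.COM is a placeholder realm -->
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/spnego.service.keytab</value>
</property>
```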

9. Only the dfs.namenode.acls.enabled property needs to be set to true to allow ACLs. The NameNode rejects all ACL operations if this property is not enabled.

10. The mode parameter is calculated using the value of the fs.permissions.umask-mode property. The default value is 022. For directories, 777 - 022 = 755; for files, 666 - 022 = 644 produces the mode parameter.
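
The umask setting itself usually lives in core-site.xml rather than hdfs-site.xml; a sketch with the default value:

```xml
<property>
  <name>fs.permissions.umask-mode</name>
  <!-- directories: 777 - 022 = 755; files: 666 - 022 = 644 -->
  <value>022</value>
</property>
```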



11. The default data block size of 128 megabytes is determined by the dfs.blocksize property



12. The default block replication factor (3) is determined by the dfs.replication property in hdfs-site.xml.



13. dfs.datanode.data.dir determines the parent directories used to store HDFS file data blocks. It can list multiple paths, such as /hadoop/hdfs/data1, /hadoop/hdfs/data2, and so on, which typically map to separate disks.
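
A sketch of a multi-disk DataNode layout using the example paths above (the paths themselves are illustrative):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Comma-separated list; each entry typically maps to a separate physical disk -->
  <value>/hadoop/hdfs/data1,/hadoop/hdfs/data2,/hadoop/hdfs/data3</value>
</property>
```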



14. dfs.bytes-per-checksum: a checksum is calculated and stored on disk for each 512-byte chunk in a data block.



15. dfs.namenode.checkpoint.period: checkpoints occur every hour by default, based on this value (3600 seconds).



16. If the number of transactions reaches the value in

dfs.namenode.checkpoint.txns, a checkpoint occurs immediately. The default is 1,000,000 transactions.
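
The two checkpoint triggers can be sketched together; whichever threshold is reached first causes a checkpoint (the values shown are the defaults):

```xml
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <!-- Seconds between checkpoints: 3600 = one hour -->
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <!-- Or checkpoint immediately after this many uncheckpointed transactions -->
  <value>1000000</value>
</property>
```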



17. The heartbeat interval is 3 seconds by default, set by

dfs.heartbeat.interval (each DataNode sends heartbeats to the NameNode to signal its availability).



18. A DataNode is marked as stale if the interval in dfs.namenode.stale.datanode.interval is exceeded (a 30-second threshold by default). The minimum possible value is three times the heartbeat interval.



19. dfs.namenode.avoid.read.stale.datanode is set to true in HDP by default. A stale DataNode is returned at the end of the list of DataNodes when the NameNode is trying to

satisfy client read requests.



20. dfs.namenode.avoid.write.stale.datanode is set to true in HDP by default. This avoids writing to stale DataNodes.



21. Stale DataNodes are written to only if the fraction of stale DataNodes exceeds the ratio determined by dfs.namenode.write.stale.datanode.ratio. In HDP it is set to 1, which means that HDFS may write to a stale DataNode.
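
The stale-DataNode settings from items 18-21 can be sketched together (the interval is in milliseconds; the avoid/ratio values shown follow the HDP settings described in these notes):

```xml
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <!-- 30000 ms = 30 seconds -->
  <value>30000</value>
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.write.stale.datanode.ratio</name>
  <value>1.0</value>
</property>
```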



22. A NameNode declares a DataNode dead when the following timeout is exceeded, 10 minutes 30 seconds by default:

(2 x dfs.namenode.heartbeat.recheck-interval) + (10 x dfs.heartbeat.interval)



dfs.namenode.heartbeat.recheck-interval: default 300000 milliseconds (5 minutes)



dfs.heartbeat.interval: default 3 seconds

With the defaults: (2 x 5 minutes) + (10 x 3 seconds) = 10 minutes 30 seconds.
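
A sketch of the two properties involved, with the stock hdfs-default.xml values and the dead-node arithmetic as a comment:

```xml
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <!-- Milliseconds: 300000 ms = 5 minutes -->
  <value>300000</value>
</property>
<property>
  <name>dfs.heartbeat.interval</name>
  <!-- Seconds -->
  <value>3</value>
</property>
<!-- Dead timeout = (2 x 300000 ms) + (10 x 3 s) = 10 min + 30 s = 10 min 30 s -->
```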



23. Each block that goes unread for long periods has its checksum verified at least every two weeks by the DataNode block scanner, controlled by dfs.datanode.scan.period.hours. A value of 0 disables scanning; a value of 336 hours corresponds to every two weeks.



24. The address and port number of the NameNode UI is determined by the dfs.namenode.http-address or dfs.namenode.https-address property.

The default port number is 50070.



25. The most commonly edited file is hdfs-site.xml.

Others include core-site.xml, hadoop-policy.xml, hdfs-log4j, ssl-client.xml, and ssl-server.xml.



26. dfs.datanode.balance.bandwidthPerSec: the default is 6,250,000 bytes per second. Consider rebalancing 4-5 nodes at a time rather than balancing all 20 at once, both to finish sooner per node and to preserve network bandwidth for running processes other than rebalancing.



27. dfs.hosts.exclude = "/etc/hadoop/conf/dfs.exclude" names the file that lists the hostnames of DataNodes an administrator is decommissioning.



28. A similar exclude file is used when an administrator decommissions a NodeManager. This file is defined by the

yarn.resourcemanager.nodes.exclude-path = "/etc/hadoop/conf/yarn.exclude" property in the yarn-site.xml file.
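
A sketch of both decommissioning exclude-file settings, using the example paths above:

```xml
<!-- hdfs-site.xml: DataNode decommissioning -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

```xml
<!-- yarn-site.xml: NodeManager decommissioning -->
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
```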



29. NameNode HA settings



--On the JournalNodes, set the property

dfs.journalnode.edits.dir = "/path/to/edits/info/data" in hdfs-site.xml; edit logs are stored under this directory path.



--The JournalNodes are located by a property in hdfs-site.xml:

dfs.namenode.shared.edits.dir = "qjournal://jn1:8485;jn2:8485;jn3:8485"



--dfs.nameservices = "haclustersetup" (the logical HDFS cluster name that points to the two NameNodes)



-- dfs.ha.namenodes.haclustersetup = "nn1,nn2" (the names of the NameNodes)



--dfs.namenode.http-address.<logical clustername>.<name of node>



Ex: dfs.namenode.http-address.haclustersetup.nn1 = "node1:50070"

dfs.namenode.http-address.haclustersetup.nn2 = "node2:50070"





--dfs.namenode.rpc-address.<logical clustername>.<name of node>

Ex: dfs.namenode.rpc-address.haclustersetup.nn1 = "node1:8020"

dfs.namenode.rpc-address.haclustersetup.nn2 = "node2:8020"





-- dfs.ha.fencing.methods (values: shell or sshfence)



-- The dfs.client.failover.proxy.provider.haclustersetup property determines the Java class used by the client to determine which NameNode is currently the Active NameNode: "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
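
Putting the HA properties together, a minimal hdfs-site.xml sketch for the haclustersetup example might look like this (host names, ports, and paths follow the examples above and are illustrative, not defaults):

```xml
<property>
  <name>dfs.nameservices</name>
  <value>haclustersetup</value>
</property>
<property>
  <name>dfs.ha.namenodes.haclustersetup</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.haclustersetup.nn1</name>
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.haclustersetup.nn2</name>
  <value>node2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.haclustersetup.nn1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.haclustersetup.nn2</name>
  <value>node2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <!-- The nameservice name is usually appended to the qjournal URI -->
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/haclustersetup</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/path/to/edits/info/data</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.haclustersetup</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```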





30. dfs.blockreport.initialDelay = 120 seconds. At DataNode startup, a block report is sent to the NameNode after this configurable delay.



31. dfs.blockreport.intervalMsec = 21600000 milliseconds (6 hours). After initial startup, each DataNode periodically sends an updated block report to the NameNode.



32. dfs.blockreport.split.threshold = 1,000,000 blocks. Below the threshold, a single block report that includes every HDFS storage directory is sent to the NameNode; if the threshold is exceeded, the block report is split across multiple heartbeats.
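
The three block report settings can be sketched together, using the values cited in these notes (they may differ from the stock defaults in some releases):

```xml
<property>
  <name>dfs.blockreport.initialDelay</name>
  <!-- Seconds to wait after DataNode startup before the first block report -->
  <value>120</value>
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <!-- 21600000 ms = 6 hours between periodic block reports -->
  <value>21600000</value>
</property>
<property>
  <name>dfs.blockreport.split.threshold</name>
  <!-- Above this many blocks, reports are split across multiple heartbeats -->
  <value>1000000</value>
</property>
```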
