yarn-site.xml: overrides the default parameters contained in yarn-default.xml, which is embedded in the hadoop-yarn-common-<version number>.jar file. It contains most of the YARN-specific parameters for the ResourceManager, NodeManager, and Timeline Server components. Settings not listed here will default to their yarn-default.xml values.
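An override in yarn-site.xml uses the standard Hadoop property format. A minimal sketch (the hostname value is a placeholder):

  <configuration>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>rm-host.example.com</value> <!-- placeholder hostname -->
    </property>
  </configuration>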
yarn-env.(sh/cmd) - the .sh file is used on Linux installations, the .cmd file on Windows installations. Sets YARN environment variables and Java heap size configuration settings.
capacity-scheduler.xml – sets Capacity Scheduler parameters.
1) yarn.scheduler.minimum-allocation-mb: The minimum allocation for every container request at the RM, in MB. Memory requests lower than this won't take effect, and the specified value will be allocated at minimum.
2) yarn.scheduler.maximum-allocation-mb: The maximum allocation for every container request at the RM, in MB. Memory requests higher than this won't take effect, and will be capped to this value.
3) yarn.nodemanager.resource.memory-mb: Amount of physical memory, in MB, that can be allocated for containers.
4) yarn.app.mapreduce.am.resource.mb: The amount of memory the MR AppMaster needs. Set in mapred-site.xml.
5) yarn.app.mapreduce.am.command-opts: Java opts for the MR App Master processes. Set in mapred-site.xml. (A combined sketch of items 1-5 follows item 8 below.)
5b) YARN_RESOURCEMANAGER_HEAPSIZE=4096, set in yarn-env.sh (ResourceManager daemon heap size, in MB).
5c) YARN_HEAPSIZE, also set in yarn-env.sh (global Java heap size for the YARN daemons).
6) yarn.acl.enable: Whether ACLs are enabled.
7) yarn.admin.acl: ACL specifying who can act as a YARN administrator.
8) yarn.log-aggregation-enable: Whether to enable log aggregation.
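As a sketch, the memory-related settings in items 1-5 might be set as follows; the values are illustrative examples, not recommendations:

In yarn-site.xml:
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value> <!-- example: 1 GB minimum container -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value> <!-- example: 8 GB maximum container -->
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value> <!-- example: memory on this node usable by containers -->
  </property>

In mapred-site.xml:
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value> <!-- example AM container size -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1228m</value> <!-- AM heap is typically set below am.resource.mb -->
  </property>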
NodeManager
yarn.nodemanager.resource.cpu-vcores – Number of virtual CPU cores that can be allocated for containers. Default: 8
yarn.nodemanager.resource.memory-mb – Amount of physical memory, in MB, that can be allocated for containers. Default: 8192 MB (8 GB)
9) yarn.nodemanager.vmem-pmem-ratio: Ratio of virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.
10) yarn.nodemanager.log-dirs: Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_${contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
11) yarn.nodemanager.remote-app-log-dir: Where to aggregate logs to (used when log aggregation is enabled).
12) yarn.nodemanager.remote-app-log-dir-suffix: The remote log directory is created at ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}.
13) yarn.nodemanager.aux-services: A valid service name should only contain a-zA-Z0-9_ and cannot start with a number (see the shuffle example below).
14) yarn.nodemanager.log.retain-seconds: Time in seconds to retain user logs. Only applicable if log aggregation is disabled.
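For example, the MapReduce shuffle is registered as a NodeManager auxiliary service; a standard sketch in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value> <!-- service name: letters, digits, underscore -->
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>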
Application Timeline Server settings cover: heap size, whether the timeline service is enabled, heartbeat intervals, timeline service webapp settings, and the state directory.
yarn.timeline-service.address – ${yarn.timeline-service.hostname}:10200 – Default address for the timeline server to start the RPC server.
yarn.timeline-service.webapp.address – ${yarn.timeline-service.hostname}:8188 – The http address of the timeline service web application.
yarn.timeline-service.webapp.https.address – ${yarn.timeline-service.hostname}:8190 – The https address of the timeline service web application.
15) AppTimelineServer Java heap size (set through yarn-env).
16) yarn.timeline-service.enabled: Indicates to clients whether the timeline service is enabled or not. If enabled, clients will put entities and events to the timeline server.
17) yarn.timeline-service.leveldb-timeline-store.path: Store file name for the leveldb timeline store.
18) yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms: Length of time to wait between deletion cycles of the leveldb timeline store, in milliseconds.
19) yarn.timeline-service.store-class: Store class name for the timeline store, e.g. org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore.
20) yarn.timeline-service.ttl-enable: Enable age-off of timeline store data.
21) yarn.timeline-service.ttl-ms: Time to live for timeline store data, in milliseconds.
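A sketch of the timeline service settings above in yarn-site.xml; the store path and TTL shown are illustrative values, not defaults:

  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.leveldb-timeline-store.path</name>
    <value>/var/log/hadoop-yarn/timeline</value> <!-- illustrative path -->
  </property>
  <property>
    <name>yarn.timeline-service.ttl-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.ttl-ms</name>
    <value>604800000</value> <!-- illustrative: 7 days -->
  </property>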
Fault-Tolerance
22) yarn.nodemanager.recovery.enabled
23) yarn.resourcemanager.recovery.enabled
24) yarn.resourcemanager.work-preserving-recovery.enabled
25) yarn.resourcemanager.zk-address
26) yarn.resourcemanager.connect.retry-interval.ms = 30000: How often to try connecting to the ResourceManager.
27) yarn.resourcemanager.connect.max-wait.ms = 900000: Maximum time to wait to establish a connection to the ResourceManager.
28) yarn.resourcemanager.ha.enabled
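In yarn-site.xml form, the client retry values from items 26-27 look like this (a sketch using the values quoted above):

  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>30000</value> <!-- retry every 30 seconds -->
  </property>
  <property>
    <name>yarn.resourcemanager.connect.max-wait.ms</name>
    <value>900000</value> <!-- give up after 15 minutes -->
  </property>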
ResourceManager Check Properties
1000 (milliseconds) – How often the RM should check that the AM is still alive.
NodeManagers send a heartbeat to the ResourceManager with the following property:
yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000 (1,000 milliseconds = 1 second)
This can be edited under Services > YARN > Configs > Advanced > Custom yarn-site (Add Property ...).
29) yarn.am.liveness-monitor.expiry-interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from an ApplicationMaster before it is considered dead.
yarn.resourcemanager.container.liveness-monitor.interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how often the ResourceManager checks that containers are still alive.
yarn.nm.liveness-monitor.expiry-interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from a NodeManager before it is considered dead.
NodeManager Check Properties
yarn.nodemanager.container-monitor.interval-ms = 3000 (3,000 milliseconds = 3 seconds)
How often the NodeManager checks on its containers.
yarn.nodemanager.health-checker.interval-ms = 600000 (600,000 milliseconds = 10 minutes)
How often the NodeManager runs its health script.
yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
Sets the minimum fraction of healthy disks threshold to 25%.
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
Sets the maximum disk utilization per available disk threshold to 90%.
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
Sets the minimum amount of free disk space per disk threshold to 1,000 MB.
All of the examples above can be edited under Services > YARN > Configs > Advanced > Advanced yarn-site, as in the sketch below.
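The disk-health thresholds above, rendered as yarn-site.xml entries (same values as listed):

  <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.25</value> <!-- at least 25% of log/local dirs must be healthy -->
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>90</value> <!-- mark a disk bad above 90% utilization -->
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
    <value>1000</value> <!-- mark a disk bad below 1,000 MB free -->
  </property>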
Work-preserving restarts
yarn.resourcemanager.recovery.enabled = true
yarn.resourcemanager.work-preserving-recovery.enabled = true
yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
yarn.resourcemanager.am.max-attempts = 2
yarn.resourcemanager.zk-address = <host>:2181
yarn.resourcemanager.zk-state-store.parent-path = /rmstore
yarn.resourcemanager.zk-num-retries = 1000
yarn.resourcemanager.zk-retry-interval-ms = 1000
yarn.resourcemanager.zk-timeout-ms = 10000
yarn.resourcemanager.zk-acl = world:anyone:rwcda
yarn.nodemanager.recovery.enabled = true
yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state
yarn.nodemanager.address = 0.0.0.0:45454
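The core recovery settings from the list above as yarn-site.xml entries; the ZooKeeper quorum is a placeholder:

  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value> <!-- placeholder quorum -->
  </property>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
  </property>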
Log aggregation:
Configured under Services > YARN > Configs > Advanced in the Ambari Web UI. All other settings are configurable under the Advanced yarn-site section.
NOTE: a value of -1 means never.
yarn.log-aggregation-enable = true
yarn.log-aggregation.retain-seconds = 2592000 (30 days)
yarn.log.server.url = http://<HistoryServer>:19888/jobhistory/logs
yarn.nodemanager.log-aggregation.compression-type = gz
yarn.nodemanager.log-aggregation.debug-enabled = false
yarn.nodemanager.log-aggregation.num-log-files-per-app = 30
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = -1
yarn.nodemanager.remote-app-log-dir = /app-logs
yarn.nodemanager.remote-app-log-dir-suffix = logs
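The key log-aggregation properties above as yarn-site.xml entries (values as listed):

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value> <!-- 30 days -->
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value> <!-- logs land in /app-logs/${user}/logs -->
  </property>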
Decommission:
Ambari updates the /etc/hadoop/conf/yarn.exclude file with the hostname of the NodeManager when an administrator decommissions a NodeManager. This file is defined by the yarn.resourcemanager.nodes.exclude-path property in the yarn-site.xml file.
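A sketch of the exclude-path property; the file itself is plain text with one NodeManager hostname per line (e.g. nm-host03.example.com, a placeholder name):

  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/etc/hadoop/conf/yarn.exclude</value>
  </property>

Outside of Ambari, the ResourceManager can be told to reread the file with yarn rmadmin -refreshNodes.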
User mapping and group mapping in YARN Queue Manager
u:support01:Support,u:support02:Support,u:support13:Support,g:promo:Marketing,g:sales:Marketing,g:dev:Dev,g:qa:QA
u:%user:%user – assigns each user to a queue that matches their user name
u:%user:%primary_group – assigns each user to a queue that matches their primary group name
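In capacity-scheduler.xml these mappings go into a single comma-separated property; a sketch using part of the example list above (mappings are evaluated left to right, and the first match wins):

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:support01:Support,u:support02:Support,g:promo:Marketing,u:%user:%primary_group</value>
  </property>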
Resource Manager HA
The yarn.resourcemanager.cluster-id property names the cluster, e.g. yarn-cluster.
The yarn.resourcemanager.ha.rm-ids property contains a comma-separated list of logical ResourceManager names, e.g. the strings rm1 and rm2:
yarn.resourcemanager.ha.rm-ids = rm1,rm2 (logical names used to reach each ResourceManager)
yarn.resourcemanager.hostname.rm1 = node1
yarn.resourcemanager.webapp.address.rm1 = node1:8088
yarn.resourcemanager.hostname.rm2 = node2
yarn.resourcemanager.webapp.address.rm2 = node2:8088
yarn.resourcemanager.zk-address = zk1:2181,zk2:2181,zk3:2181
yarn.client.failover-proxy-provider = org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider (the Java class used by clients to determine which ResourceManager is currently the Active ResourceManager)
yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore (the Java class used by the ResourceManagers to work with the ZooKeeper-based state store)
yarn.resourcemanager.ha.automatic-failover.zk-base-path = /yarn-leader-election
The yarn.resourcemanager.zk-address property contains a comma-separated list of ZooKeeper hostnames and port numbers. It is used by the ResourceManagers to connect to the ZooKeeper-based state store.
The yarn.resourcemanager.store.class property determines the Java class that is used by the ResourceManagers to connect with and use the ZooKeeper-based ZKRMStateStore.
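Putting the HA properties above together, a yarn-site.xml sketch (node1, node2, and the ZooKeeper quorum are the example hostnames from above):

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>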
YARN Work-Preserving Restarts
There are two types of work-preserving restarts: ResourceManager and NodeManager. Both are enabled with the same recovery properties listed under "Work-preserving restarts" above.
Another value that can be tuned, but is not usually necessary:
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms = 10000 (10,000 milliseconds = 10 seconds)