
Sunday, May 22, 2016

yarn-site.xml: important configurations



yarn-site.xml: overrides default parameters contained in yarn-default.xml, which is
embedded in the hadoop-yarn-common-<version number>.jar file.
Contains most of the YARN-specific parameters for the ResourceManager, NodeManager, and Timeline Server components.
Settings not listed here fall back to their yarn-default.xml values.

yarn-env.(sh/cmd): the .sh file is used on Linux installations, the .cmd file on Windows installations. Sets
YARN environment variables and Java heap size settings.

capacity-scheduler.xml – sets Capacity Scheduler parameters.
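
Every yarn-site.xml setting discussed below takes the same shape: a <property> element with a <name> and a <value> inside the file's <configuration> root. A minimal sketch (the value shown is a placeholder, not a recommendation):

<configuration>
  <!-- overrides the yarn-default.xml value for this property -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>

These files are read at daemon startup, so the ResourceManager or NodeManager must be restarted for most changes to take effect.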

1) yarn.scheduler.minimum-allocation-mb: The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.

2) yarn.scheduler.maximum-allocation-mb: The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.

3) yarn.nodemanager.resource.memory-mb: Amount of physical memory, in MB, that can be allocated for containers.

4) yarn.app.mapreduce.am.resource.mb: The amount of memory the MR ApplicationMaster needs. Set in mapred-site.xml.

5) yarn.app.mapreduce.am.command-opts: Java opts for the MR ApplicationMaster process. Set in mapred-site.xml.
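
Taken together, a hypothetical sizing for a worker node offering 8 GB to containers might look like the sketch below. The 8192/1024/8192/1536 numbers and the -Xmx value (kept below the AM container size) are illustrative assumptions, not defaults to copy.

In yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value> <!-- assumed: total MB this NodeManager offers to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- assumed: smallest container the RM will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- requests above this are capped to this value -->
</property>

In mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1536</value> <!-- assumed: container size for the MR ApplicationMaster -->
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1228m</value> <!-- JVM heap kept below the 1536 MB container -->
</property>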

5b) YARN_RESOURCEMANAGER_HEAPSIZE=4096: ResourceManager Java heap size, in MB. Set in yarn-env.sh.
5c) YARN_HEAPSIZE: Java heap size for the YARN daemons, in MB. Set in yarn-env.sh.
6) yarn.acl.enable: Whether ACLs are enabled. Defaults to false.
7) yarn.admin.acl: ACL of who can be an administrator of the YARN cluster. Defaults to * (everyone).
8) yarn.log-aggregation-enable: Whether to enable log aggregation (see the log-aggregation section below).
NodeManager

yarn.nodemanager.resource.cpu-vcores: Number of virtual CPU cores that can be allocated for containers. Default: 8
yarn.nodemanager.resource.memory-mb: Amount of physical memory, in MB, that can be allocated for containers. Default: 8192 (8 GB)

9) yarn.nodemanager.vmem-pmem-ratio: Ratio of virtual memory to physical memory used when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed the allocation by this ratio. The default is 2.1.
10) yarn.nodemanager.log-dirs: Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_${contid}. Each container directory will contain the files stderr, stdout, and syslog generated by that container.
11) yarn.nodemanager.remote-app-log-dir: Where to aggregate logs to (an HDFS path; see the log-aggregation section below).
12) yarn.nodemanager.remote-app-log-dir-suffix: The remote log directory for each application is created at ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}.
13) yarn.nodemanager.aux-services: A comma-separated list of auxiliary services; a valid service name should only contain a-zA-Z0-9_ and cannot start with a number.
14) yarn.nodemanager.log.retain-seconds: Time in seconds to retain user logs. Only applicable if log aggregation is disabled.
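
As a sketch, items 9 through 14 rendered in yarn-site.xml; the values are the stock defaults except the paths, which are example locations:

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- containers may use 2.1x their physical allocation as virtual memory -->
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/yarn/log</value> <!-- example local log root -->
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value> <!-- example HDFS aggregation target -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value> <!-- name restricted to a-zA-Z0-9_, no leading digit -->
</property>
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value> <!-- 3 hours; only honored when log aggregation is disabled -->
</property>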
Application Timeline Server: heap size, whether the timeline service is enabled, heartbeat intervals, timeline
service webapp settings, and the state directory.

yarn.timeline-service.address = ${yarn.timeline-service.hostname}:10200 (the default address on which the timeline server starts its RPC server)
yarn.timeline-service.webapp.address = ${yarn.timeline-service.hostname}:8188 (the HTTP address of the timeline service web application)
yarn.timeline-service.webapp.https.address = ${yarn.timeline-service.hostname}:8190 (the HTTPS address of the timeline service web application)

15) AppTimelineServer Java heap size: heap size for the timeline server daemon, set in yarn-env.sh.
16) yarn.timeline-service.enabled: Indicates to clients whether the timeline service is enabled. If enabled, clients will put entities and events to the timeline server.
17) yarn.timeline-service.leveldb-timeline-store.path: Store file name for the LevelDB timeline store.
18) yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms: Length of time to wait between deletion cycles of the LevelDB timeline store, in milliseconds.
19) yarn.timeline-service.store-class: Store class name for the timeline store; defaults to org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore.
20) yarn.timeline-service.ttl-enable: Enable age-off of timeline store data.
21) yarn.timeline-service.ttl-ms: Time to live for timeline store data, in milliseconds.
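
A minimal sketch for turning the timeline service on with the LevelDB store; the hostname and path are assumptions for illustration, and the ttl values are the stock defaults:

<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.hostname</name>
  <value>timeline.example.com</value> <!-- assumed host; resolves the :10200/:8188/:8190 addresses above -->
</property>
<property>
  <name>yarn.timeline-service.store-class</name>
  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore</value>
</property>
<property>
  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
  <value>/hadoop/yarn/timeline</value> <!-- assumed local directory for the LevelDB files -->
</property>
<property>
  <name>yarn.timeline-service.ttl-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.ttl-ms</name>
  <value>604800000</value> <!-- 7 days -->
</property>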
Fault-Tolerance
22) yarn.nodemanager.recovery.enabled: Enable the NodeManager to recover its state after a restart.
23) yarn.resourcemanager.recovery.enabled: Enable the ResourceManager to recover its state after a restart.
24) yarn.resourcemanager.work-preserving-recovery.enabled: Enable work-preserving recovery, so running containers survive a ResourceManager restart.
25) yarn.resourcemanager.zk-address: Comma-separated list of ZooKeeper host:port pairs used for the ResourceManager state store and HA.
26) yarn.resourcemanager.connect.retry-interval.ms = 30000: How often to try connecting to the ResourceManager.
27) yarn.resourcemanager.connect.max-wait.ms = 900000: Maximum time to wait to establish a connection to the ResourceManager.
28) yarn.resourcemanager.ha.enabled: Enable ResourceManager high availability (see the Resource Manager HA section below).
ResourceManager Check Properties
yarn.resourcemanager.amliveliness-monitor.interval-ms = 1000
(1,000 milliseconds = 1 second)
How often the ResourceManager checks that the ApplicationMaster is still alive.

NodeManagers send a heartbeat to the ResourceManager with the following property:
yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000
(1,000 milliseconds = 1 second)
This can be edited under Services > YARN > Configs > Advanced > Custom yarn-site (Add
Property …)


29) yarn.am.liveness-monitor.expiry-interval-ms = 600000
(600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from an ApplicationMaster before it is
considered dead.

yarn.resourcemanager.container.liveness-monitor.interval-ms = 600000
(600,000 milliseconds = 10 minutes)
Defines how often the ResourceManager checks that containers are still alive.

yarn.nm.liveness-monitor.expiry-interval-ms = 600000
(600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from a NodeManager before it is considered dead.

NodeManager Check Properties
yarn.nodemanager.container-monitor.interval-ms = 3000
(3,000 milliseconds = 3 seconds)
How often NodeManager checks on containers.

yarn.nodemanager.health-checker.interval-ms = 600000
(600,000 milliseconds = 10 minutes)
How often the NodeManager runs its health-check script.

yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
Sets the minimum percentage of healthy disks threshold to 25%.

yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
Sets the maximum disk utilization per available disk threshold to 90%.

yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
Sets the minimum amount of free disk space per disk threshold to 1,000 MB.

All of the examples above can be edited under Services > YARN > Configs > Advanced >
Advanced yarn-site.
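
As a worked example, the three disk-health thresholds above expressed as yarn-site.xml properties (the values simply restate the ones just listed):

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value> <!-- NM reports itself unhealthy when fewer than 25% of disks are good -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90</value> <!-- a disk more than 90% full counts as bad -->
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>1000</value> <!-- a disk with under 1000 MB free counts as bad -->
</property>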


log-aggregation:

Log aggregation is enabled under Services > YARN > Configs > Advanced in the Ambari Web UI. All other settings are configurable under the Advanced yarn-site section.
NOTE: for the retention and roll-monitoring settings below, -1 = never.
yarn.log-aggregation-enable = true

yarn.log-aggregation.retain-seconds = 2592000 (30 days)

yarn.log.server.url = http://<HistoryServer>:19888/jobhistory/logs

yarn.nodemanager.log-aggregation.compression-type = gz

yarn.nodemanager.log-aggregation.debug-enabled = false

yarn.nodemanager.log-aggregation.num-log-files-per-app = 30

yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = -1

yarn.nodemanager.remote-app-log-dir = /app-logs

yarn.nodemanager.remote-app-log-dir-suffix = logs
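
In yarn-site.xml form, the core of the listing above; the HistoryServer hostname is a placeholder:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>2592000</value> <!-- 30 days; -1 means keep forever -->
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value> <!-- HDFS directory receiving aggregated logs -->
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value> <!-- logs land in /app-logs/${user}/logs -->
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://historyserver.example.com:19888/jobhistory/logs</value> <!-- placeholder host -->
</property>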

decommission:

Ambari updates the /etc/hadoop/conf/yarn.exclude file with the hostname of the NodeManager
when an administrator decommissions a NodeManager. This file is defined by the
yarn.resourcemanager.nodes.exclude-path property in the yarn-site.xml file.
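
A sketch of that wiring, using the Ambari default location; the exclude file itself is just a list of NodeManager hostnames, one per line, and the ResourceManager rereads it on yarn rmadmin -refreshNodes:

<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>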
User mapping and group mapping in YARN Queue Manager
u:support01:Support,u:support02:Support,u:support13:Support,g:promo:Marketing,g:sales:Marketing,g:dev:Dev,g:qa:QA

u:%user:%user assigns each user to a queue that matches their user name
u:%user:%primary_group assigns each user to a queue that matches their primary group name
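
These mappings are carried by the yarn.scheduler.capacity.queue-mappings property in capacity-scheduler.xml; a sketch using the example string above (the first rule that matches a user wins):

<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:support01:Support,u:support02:Support,u:support13:Support,g:promo:Marketing,g:sales:Marketing,g:dev:Dev,g:qa:QA</value>
  <!-- u:<user>:<queue> maps a single user, g:<group>:<queue> maps a group -->
</property>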

Resource Manager HA

The yarn.resourcemanager.cluster-id property names the cluster, for example yarn-cluster.

The yarn.resourcemanager.ha.rm-ids property contains a comma-separated list of logical ResourceManager names, here rm1 and rm2:

yarn.resourcemanager.ha.rm-ids = "rm1,rm2" (logical names used to reach each ResourceManager)

yarn.resourcemanager.hostname.rm1 = "node1"

yarn.resourcemanager.webapp.address.rm1 = "node1:8088"

yarn.resourcemanager.hostname.rm2 = "node2"

yarn.resourcemanager.webapp.address.rm2 = "node2:8088"

yarn.resourcemanager.zk-address = "zk1:2181,zk2:2181,zk3:2181"

yarn.client.failover-proxy-provider = "org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider" (the Java class that clients, ApplicationMasters, and NodeManagers use to determine which ResourceManager is currently the Active ResourceManager)

yarn.resourcemanager.store.class = "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore" (the Java class used by the ResourceManagers to work with the ZooKeeper-based state store)

yarn.resourcemanager.ha.automatic-failover.zk-base-path = "/yarn-leader-election"


The yarn.resourcemanager.zk-address property contains a comma-separated list of ZooKeeper
hostnames and port numbers. It is used by the ResourceManagers to connect to the ZooKeeper-based
state store.
The yarn.resourcemanager.store.class property determines the Java class that is used by the
ResourceManagers to connect with and use the ZooKeeper-based ZKRMStateStore.
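
Pulling the HA settings together, a minimal yarn-site.xml sketch for the two-node example above (node1, node2, and the zk1-zk3 quorum are the example hosts already used):

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>node1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>node2</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>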

YARN Work-Preserving Restarts
There are two types of work-preserving restarts: ResourceManager and NodeManager.


yarn.resourcemanager.recovery.enabled=true

yarn.resourcemanager.work-preserving-recovery.enabled = true

yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore

yarn.resourcemanager.am.max-attempts = 2

yarn.resourcemanager.zk-address = <host>:2181

yarn.resourcemanager.zk-state-store.parent-path = /rmstore

yarn.resourcemanager.zk-num-retries = 1000

yarn.resourcemanager.zk-retry-interval-ms = 1000

yarn.resourcemanager.zk-timeout-ms = 10000

yarn.resourcemanager.zk-acl = world:anyone:rwcda

yarn.nodemanager.recovery.enabled=true

yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state

yarn.nodemanager.address= 0.0.0.0:45454

Another value that can be tuned, but is not usually necessary:
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms = 10000
(10,000 milliseconds = 10 seconds)
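
Rendered as yarn-site.xml, the enabling switches plus the ZooKeeper store (the remaining zk-* values above keep their defaults unless overridden):

<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
</property>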