yarn-site.xml: overrides the default parameters contained in yarn-default.xml, which is embedded in the hadoop-yarn-common-<version number>.jar file. It contains most of the YARN-specific parameters for the ResourceManager, NodeManager, and Timeline Server components. Settings not listed here will default to their yarn-default.xml values.
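An override in yarn-site.xml uses the standard Hadoop property format. A minimal sketch (the hostname value is a placeholder):

  <configuration>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>rm-host.example.com</value> <!-- placeholder hostname -->
    </property>
  </configuration>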
yarn-env.(sh/cmd) - the .sh file is used on Linux installations, the .cmd file on Windows installations. Sets YARN environment variables and Java heap size configuration settings.
capacity-scheduler.xml – sets Capacity Scheduler parameters.
1) yarn.scheduler.minimum-allocation-mb: The minimum allocation for every container request at the RM, in MB. Memory requests lower than this won't take effect, and the specified value will be allocated at minimum.
2) yarn.scheduler.maximum-allocation-mb: The maximum allocation for every container request at the RM, in MB. Memory requests higher than this won't take effect, and will be capped to this value.
3) yarn.nodemanager.resource.memory-mb: Amount of physical memory, in MB, that can be allocated for containers.
4) yarn.app.mapreduce.am.resource.mb: The amount of memory the MR AppMaster needs. Set in mapred-site.xml.
5) yarn.app.mapreduce.am.command-opts: Java opts for the MR App Master processes. Set in mapred-site.xml. (A combined sketch of items 1-5 follows item 8 below.)
5b) YARN_RESOURCEMANAGER_HEAPSIZE=4096, set in yarn-env.sh (ResourceManager daemon heap size, in MB).
5c) YARN_HEAPSIZE, also set in yarn-env.sh (global Java heap size for the YARN daemons).
6) yarn.acl.enable: Whether ACLs are enabled.
7) yarn.admin.acl: ACL specifying who can act as a YARN administrator.
8) yarn.log-aggregation-enable: Whether to enable log aggregation.
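As a sketch, the memory-related settings in items 1-5 might be set as follows; the values are illustrative examples, not recommendations:

In yarn-site.xml:
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value> <!-- example: 1 GB minimum container -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value> <!-- example: 8 GB maximum container -->
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value> <!-- example: memory on this node usable by containers -->
  </property>

In mapred-site.xml:
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value> <!-- example AM container size -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1228m</value> <!-- AM heap is typically set below am.resource.mb -->
  </property>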
NodeManager
yarn.nodemanager.resource.cpu-vcores – Number of virtual CPU cores that can be allocated for containers. Default: 8
yarn.nodemanager.resource.memory-mb – Amount of physical memory, in MB, that can be allocated for containers. Default: 8192 MB (8 GB)
9) yarn.nodemanager.vmem-pmem-ratio: Ratio of virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.
10) yarn.nodemanager.log-dirs: Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_${contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
11) yarn.nodemanager.remote-app-log-dir: Where to aggregate logs to (used when log aggregation is enabled).
12) yarn.nodemanager.remote-app-log-dir-suffix: The remote log directory is created at ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}.
13) yarn.nodemanager.aux-services: A valid service name should only contain a-zA-Z0-9_ and cannot start with a number (see the shuffle example below).
14) yarn.nodemanager.log.retain-seconds: Time in seconds to retain user logs. Only applicable if log aggregation is disabled.
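For example, the MapReduce shuffle is registered as a NodeManager auxiliary service; a standard sketch in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value> <!-- service name: letters, digits, underscore -->
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>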
Application Timeline Server settings cover: heap size, whether the timeline service is enabled, heartbeat intervals, timeline service webapp settings, and the state directory.
yarn.timeline-service.address – ${yarn.timeline-service.hostname}:10200 – Default address for the timeline server to start the RPC server.
yarn.timeline-service.webapp.address – ${yarn.timeline-service.hostname}:8188 – The http address of the timeline service web application.
yarn.timeline-service.webapp.https.address – ${yarn.timeline-service.hostname}:8190 – The https address of the timeline service web application.
15) AppTimelineServer Java heap size (set through yarn-env).
16) yarn.timeline-service.enabled: Indicates to clients whether the timeline service is enabled or not. If enabled, clients will put entities and events to the timeline server.
17) yarn.timeline-service.leveldb-timeline-store.path: Store file name for the leveldb timeline store.
18) yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms: Length of time to wait between deletion cycles of the leveldb timeline store, in milliseconds.
19) yarn.timeline-service.store-class: Store class name for the timeline store, e.g. org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore.
20) yarn.timeline-service.ttl-enable: Enable age-off of timeline store data.
21) yarn.timeline-service.ttl-ms: Time to live for timeline store data, in milliseconds.
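A sketch of the timeline service settings above in yarn-site.xml; the store path and TTL shown are illustrative values, not defaults:

  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.leveldb-timeline-store.path</name>
    <value>/var/log/hadoop-yarn/timeline</value> <!-- illustrative path -->
  </property>
  <property>
    <name>yarn.timeline-service.ttl-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.ttl-ms</name>
    <value>604800000</value> <!-- illustrative: 7 days -->
  </property>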
Fault-Tolerance
22) yarn.nodemanager.recovery.enabled
23) yarn.resourcemanager.recovery.enabled
24) yarn.resourcemanager.work-preserving-recovery.enabled
25) yarn.resourcemanager.zk-address
26) yarn.resourcemanager.connect.retry-interval.ms = 30000: How often to try connecting to the ResourceManager.
27) yarn.resourcemanager.connect.max-wait.ms = 900000: Maximum time to wait to establish a connection to the ResourceManager.
28) yarn.resourcemanager.ha.enabled
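In yarn-site.xml form, the client retry values from items 26-27 look like this (a sketch using the values quoted above):

  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>30000</value> <!-- retry every 30 seconds -->
  </property>
  <property>
    <name>yarn.resourcemanager.connect.max-wait.ms</name>
    <value>900000</value> <!-- give up after 15 minutes -->
  </property>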
ResourceManager Check Properties
1000 (milliseconds) – How often the RM should check that the AM is still alive.
NodeManagers send a heartbeat to the ResourceManager with the following property:
yarn.resourcemanager.nodemanagers.heartbeat-interval-ms = 1000 (1,000 milliseconds = 1 second)
This can be edited under Services > YARN > Configs > Advanced > Custom yarn-site (Add Property ...).
29) yarn.am.liveness-monitor.expiry-interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from an ApplicationMaster before it is considered dead.
yarn.resourcemanager.container.liveness-monitor.interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how often the ResourceManager checks that containers are still alive.
yarn.nm.liveness-monitor.expiry-interval-ms = 600000 (600,000 milliseconds = 10 minutes)
Defines how long the ResourceManager waits to hear from a NodeManager before it is considered dead.
NodeManager Check Properties
yarn.nodemanager.container-monitor.interval-ms = 3000 (3,000 milliseconds = 3 seconds)
How often the NodeManager checks on its containers.
yarn.nodemanager.health-checker.interval-ms = 600000 (600,000 milliseconds = 10 minutes)
How often the NodeManager runs its health script.
yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
Sets the minimum fraction of healthy disks threshold to 25%.
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
Sets the maximum disk utilization per available disk threshold to 90%.
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
Sets the minimum amount of free disk space per disk threshold to 1,000 MB.
All of the examples above can be edited under Services > YARN > Configs > Advanced > Advanced yarn-site, as in the sketch below.
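The disk-health thresholds above, rendered as yarn-site.xml entries (same values as listed):

  <property>
    <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
    <value>0.25</value> <!-- at least 25% of log/local dirs must be healthy -->
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>90</value> <!-- mark a disk bad above 90% utilization -->
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
    <value>1000</value> <!-- mark a disk bad below 1,000 MB free -->
  </property>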
Work-preserving restarts
yarn.resourcemanager.recovery.enabled = true
yarn.resourcemanager.work-preserving-recovery.enabled = true
yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
yarn.resourcemanager.am.max-attempts = 2
yarn.resourcemanager.zk-address = <host>:2181
yarn.resourcemanager.zk-state-store.parent-path = /rmstore
yarn.resourcemanager.zk-num-retries = 1000
yarn.resourcemanager.zk-retry-interval-ms = 1000
yarn.resourcemanager.zk-timeout-ms = 10000
yarn.resourcemanager.zk-acl = world:anyone:rwcda
yarn.nodemanager.recovery.enabled = true
yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state
yarn.nodemanager.address = 0.0.0.0:45454
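The core recovery settings from the list above as yarn-site.xml entries; the ZooKeeper quorum is a placeholder:

  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value> <!-- placeholder quorum -->
  </property>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
  </property>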
Log aggregation:
Configured under Services > YARN > Configs > Advanced in the Ambari Web UI. All other settings are configurable under the Advanced yarn-site section.
NOTE: a value of -1 means never.
yarn.log-aggregation-enable = true
yarn.log-aggregation.retain-seconds = 2592000 (30 days)
yarn.log.server.url = http://<HistoryServer>:19888/jobhistory/logs
yarn.nodemanager.log-aggregation.compression-type = gz
yarn.nodemanager.log-aggregation.debug-enabled = false
yarn.nodemanager.log-aggregation.num-log-files-per-app = 30
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = -1
yarn.nodemanager.remote-app-log-dir = /app-logs
yarn.nodemanager.remote-app-log-dir-suffix = logs
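The key log-aggregation properties above as yarn-site.xml entries (values as listed):

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>2592000</value> <!-- 30 days -->
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value> <!-- logs land in /app-logs/${user}/logs -->
  </property>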
Decommission:
Ambari updates the /etc/hadoop/conf/yarn.exclude file with the hostname of the NodeManager when an administrator decommissions a NodeManager. This file is defined by the yarn.resourcemanager.nodes.exclude-path property in the yarn-site.xml file.
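A sketch of the exclude-path property; the file itself is plain text with one NodeManager hostname per line (e.g. nm-host03.example.com, a placeholder name):

  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/etc/hadoop/conf/yarn.exclude</value>
  </property>

Outside of Ambari, the ResourceManager can be told to reread the file with yarn rmadmin -refreshNodes.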
User mapping and group mapping in YARN Queue Manager
u:support01:Support,u:support02:Support,u:support13:Support,g:promo:Marketing,g:sales:Marketing,g:dev:Dev,g:qa:QA
u:%user:%user – assigns each user to a queue that matches their user name
u:%user:%primary_group – assigns each user to a queue that matches their primary group name
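In capacity-scheduler.xml these mappings go into a single comma-separated property; a sketch using part of the example list above (mappings are evaluated left to right, and the first match wins):

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:support01:Support,u:support02:Support,g:promo:Marketing,u:%user:%primary_group</value>
  </property>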
Resource Manager HA
The yarn.resourcemanager.cluster-id property names the cluster, e.g. yarn-cluster.
The yarn.resourcemanager.ha.rm-ids property contains a comma-separated list of logical ResourceManager names, e.g. the strings rm1 and rm2:
yarn.resourcemanager.ha.rm-ids = rm1,rm2 (logical names used to reach each ResourceManager)
yarn.resourcemanager.hostname.rm1 = node1
yarn.resourcemanager.webapp.address.rm1 = node1:8088
yarn.resourcemanager.hostname.rm2 = node2
yarn.resourcemanager.webapp.address.rm2 = node2:8088
yarn.resourcemanager.zk-address = zk1:2181,zk2:2181,zk3:2181
yarn.client.failover-proxy-provider = org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider (the Java class used by clients to determine which ResourceManager is currently the Active ResourceManager)
yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore (the Java class used by the ResourceManagers to work with the ZooKeeper-based state store)
yarn.resourcemanager.ha.automatic-failover.zk-base-path = /yarn-leader-election
The yarn.resourcemanager.zk-address property contains a comma-separated list of ZooKeeper hostnames and port numbers. It is used by the ResourceManagers to connect to the ZooKeeper-based state store.
The yarn.resourcemanager.store.class property determines the Java class that is used by the ResourceManagers to connect with and use the ZooKeeper-based ZKRMStateStore.
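Putting the HA properties above together, a yarn-site.xml sketch (node1, node2, and the ZooKeeper quorum are the example hostnames from above):

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1:2181,zk2:2181,zk3:2181</value>
  </property>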
YARN Work-Preserving Restarts
There are two types of work-preserving restarts: ResourceManager and NodeManager. Both are enabled with the same recovery properties listed under "Work-preserving restarts" above.
Another value that can be tuned, but is not usually necessary:
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms = 10000 (10,000 milliseconds = 10 seconds)