Sunday, May 22, 2016

HDFS Important Commands



List a directory on the local filesystem:

hdfs dfs -ls file:///bin

List a directory in HDFS:

hdfs dfs -ls hdfs:///root

The URI scheme selects the filesystem; besides hdfs://, schemes such as file://, swift://, or s3:// can be used.

Display help for the HDFS shell commands:

hdfs dfs -help

Create a directory:

hdfs dfs -mkdir /user/steve/dir1

List the contents of a directory recursively:

hdfs dfs -ls -R /user/steve

Copy a file from the local filesystem to HDFS:

hdfs dfs -put <local_source> <hdfs_destination>

Append local file content to a file in HDFS:

hdfs dfs -appendToFile <local_source> <hdfs_file>

Display the contents of a file:

hdfs dfs -cat fileA

View only the last 1 KB of a file:

hdfs dfs -tail fileB

Copy a file from HDFS to the local filesystem:

hdfs dfs -get /user/steve/fileB /home/steve/fileB

Move (rename) a file within HDFS:

hdfs dfs -mv /user/steve/fileC /user/steve/dir1/fileC



Merge the contents of an HDFS path into a single file on the local filesystem:

hdfs dfs -getmerge fileF fileG



Delete files:

hdfs dfs -rm fileB fileC



Delete an empty directory:

hdfs dfs -rmdir /user/steve/dir1/dir2



Trash is configured by two properties, fs.trash.interval (how many minutes a deleted file is retained in trash) and fs.trash.checkpoint.interval (how often trash checkpoints are created); their defaults live in core-default.xml and are overridden in core-site.xml.
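When trash is enabled, hdfs dfs -rm moves files into the user's .Trash directory rather than deleting them. To bypass trash and delete immediately:

hdfs dfs -rm -skipTrash fileB

To force an immediate trash checkpoint:

hdfs dfs -expunge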



Override the block size and replication factor when writing a file, using the -D generic option:

hdfs dfs -D dfs.blocksize=<N> -D dfs.replication=5 -put <local_source> <hdfs_destination>
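For example, a hypothetical upload with a 128 MB block size and five replicas (the paths are illustrative):

hdfs dfs -D dfs.blocksize=134217728 -D dfs.replication=5 -put /home/steve/data.log /user/steve/data.log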



Change the owner of a file:

hdfs dfs -chown danielle /data/weblogs/fileA



Change the group of a file:

hdfs dfs -chgrp hdfs fileB



Change owner and group membership simultaneously:

hdfs dfs -chown hcat:hdfs /data/weblogs/fileC

The command requires HDFS superuser privileges.

Using setfacl on Files

Set (remove existing and replace) both permissions and ACL entries using a single command:

$ hdfs dfs -setfacl --set user::rw-,group::r--,other::---,user:steve:rw-,user:jason:rw- fileA

This sets the owner, group, and other permissions and adds ACL entries for the users steve and jason.



Modify an existing ACL by adding a new entry for the group eng:

$ hdfs dfs -setfacl -m group:eng:rw- fileA



Remove the specific ACL entry for the user jason:

$ hdfs dfs -setfacl -x user:jason fileA



Remove all ACL entries, leaving only the base owner, group, and other permissions:

$ hdfs dfs -setfacl -b fileA

The ACL entries for the user steve and the group eng are removed.



Using setfacl on Directories

The setfacl command can also be used on directories, with a few differences. For example, the -R recursive option can be used to set, modify, or remove permissions and ACL entries across an entire directory hierarchy:

hdfs dfs -setfacl -R … dir1

Directories also support default ACL entries, which are inherited by child files and directories created after the default entries are set. Add a default entry for the user jason:

hdfs dfs -setfacl -m default:user:jason:rw- dir1

Any directory with default ACL entries must include default entries for the owner, group, and other user classes.



Explicitly set a default mask:

hdfs dfs -setfacl -m default:mask::r-- dir1

A default mask on a directory limits the permissions and ACL entries inherited by child files and directories. When a mask is not set explicitly, it is calculated as the union of the permissions of the entries it governs: the unnamed group plus any named users or named groups listed in the ACL.
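The resulting ACL can be inspected with getfacl:

$ hdfs dfs -getfacl dir1

The output lists the owner, the group, each ACL entry, and the mask; entries restricted by the mask are annotated with their #effective: permissions.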



The other type of mask is the access mask on a file or directory.

Explicitly set an access mask:

hdfs dfs -setfacl -m mask::r-- fileA

The purpose of an access mask is to provide a mechanism to quickly limit or restore the effective permissions for multiple users and groups using a single command. An access mask affects any named users, named groups, and the unnamed group.
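For example, reusing fileA from above, one command can lock out every governed entry and another can restore them:

$ hdfs dfs -setfacl -m mask::--- fileA

$ hdfs dfs -setfacl -m mask::rw- fileA

The first leaves each named user, named group, and the unnamed group with no effective permissions; the second restores each entry up to its own granted permissions.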



Report basic filesystem statistics and the state of each DataNode:

hdfs dfsadmin -report



Display disk usage in human-readable units:

hdfs dfs -du -h



Check filesystem health, displaying files, blocks, block locations, and racks:

hdfs fsck <path> -files -blocks -locations -racks



List files currently open for writing:

hdfs fsck <path> -openforwrite



Move corrupted files to /lost+found:

hdfs fsck <path> -move



Delete corrupted files:

hdfs fsck <path> -delete



For example, a full report on /user/root:

hdfs fsck /user/root -files -blocks -locations -racks



Display all dfsadmin options:

hdfs dfsadmin -help



To transition a NameNode into safemode:

hdfs dfsadmin -safemode enter



To force a NameNode checkpoint operation that creates both a new fsimage and edits file (the NameNode must be in safemode first):

hdfs dfsadmin -saveNamespace



To create only a new edits file:

hdfs dfsadmin -rollEdits



To exit NameNode safemode:

hdfs dfsadmin -safemode leave
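Putting those together, a typical manual checkpoint sequence is:

hdfs dfsadmin -safemode enter

hdfs dfsadmin -saveNamespace

hdfs dfsadmin -safemode leave

The current safemode status can be checked at any time with hdfs dfsadmin -safemode get.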



To download the latest fsimage file (useful for doing remote backups):

hdfs dfsadmin -fetchImage






Configuring Quotas

You must be an HDFS superuser to administer quotas.

Setting a name quota on one or more directories:

hdfs dfsadmin -setQuota <n> <directory> [<directory>] …



Issue the command again to modify a name quota.

Removing a name quota on one or more directories:

hdfs dfsadmin -clrQuota <directory> [<directory>] …



Setting a space quota on one or more directories:

hdfs dfsadmin -setSpaceQuota <n> <directory> [<directory>] …

Issue the command again to modify a space quota.



Removing a space quota on one or more directories:

hdfs dfsadmin -clrSpaceQuota <directory> [<directory>] …

An attempt to set a name or space quota will still succeed even if the directory would be in immediate violation of the new quota.



Any user may view current quota information using the HDFS shell count command:

hdfs dfs -count -v -q <directory_name>
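As a quick illustrative sketch (the quota values and path are hypothetical):

hdfs dfsadmin -setQuota 1000 /user/steve

hdfs dfsadmin -setSpaceQuota 10g /user/steve

hdfs dfs -count -v -q /user/steve

With -v, the count output includes a header row covering QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME.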



Rebalance block storage across DataNodes (the default threshold is 10 percent):

hdfs balancer



Changing the threshold to 5 percent:

hdfs balancer -threshold 5



Display other options:

hdfs balancer -help



After updating the include/exclude host files to commission or decommission nodes, refresh the node lists with hdfs dfsadmin -refreshNodes and yarn rmadmin -refreshNodes.
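A sketch of a DataNode decommission, assuming dfs.hosts.exclude points to /etc/hadoop/conf/dfs.exclude (that path and the hostname are assumptions; use whatever your configuration specifies):

echo "node4.example.com" >> /etc/hadoop/conf/dfs.exclude

hdfs dfsadmin -refreshNodes

The node is then reported as Decommission In Progress by hdfs dfsadmin -report until its blocks have been re-replicated.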



The hdfs fsck <path> -racks report displays the number of racks of which the NameNode is aware.



Display whether a NameNode is active or standby:

hdfs haadmin -getServiceState <serviceId>



Manually initiate a NameNode failover:

hdfs haadmin -failover <from_serviceId> <to_serviceId>



Enable snapshots on a directory:

hdfs dfsadmin -allowSnapshot <directory_path>



Disable snapshots on a directory:

hdfs dfsadmin -disallowSnapshot <directory_path>



The hdfs lsSnapshottableDir command lists the snapshottable directories.



Rename a snapshot:

hdfs dfs -renameSnapshot <directory_path> <old_name> <new_name>



Create a snapshot:

hdfs dfs -createSnapshot <directory_path> [<snapshot_name>]



Delete a snapshot:

hdfs dfs -deleteSnapshot <directory_path> <snapshot_name>
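A minimal end-to-end sketch (the directory and snapshot name are illustrative):

hdfs dfsadmin -allowSnapshot /data/weblogs

hdfs dfs -createSnapshot /data/weblogs snap1

hdfs dfs -ls /data/weblogs/.snapshot/snap1

hdfs dfs -deleteSnapshot /data/weblogs snap1

Snapshot contents are read-only and are reached through the hidden .snapshot path inside the snapshottable directory.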



Display DistCp options:

hadoop distcp -help

Copy data between clusters:

hadoop distcp hdfs://<namenode1>:8020/<source> hdfs://<namenode2>:8020/<destination>



Copy a list of sources, read from the file <source_list>, to a destination:

hadoop distcp -f hdfs://<namenode1>:8020/<source_list> hdfs://<namenode2>:8020/<destination>



The DistCp -update option copies only the source files that are missing from the target or that differ from the target copy, while -overwrite overwrites target files unconditionally. Both options change the path behavior: the contents of the source directories, rather than the source directories themselves, are copied to the target.



The distcp command includes a -m <n> option that caps the number of simultaneous map tasks used for the copy.



DistCp divides work between mappers using one of two strategies: static (uniformsize) or dynamic. In static mode, each mapper is assigned a fixed set of files up front; mappers that finish early are not assigned any more work, while slower mappers must still process all of the files assigned to them. This is the default strategy, although it can be specified explicitly by adding the -strategy uniformsize option. In dynamic mode, the file list is split into chunks that mappers claim as they finish, so faster mappers end up processing more of the files.



hadoop distcp -m 20 -strategy uniformsize <source> <destination>



hadoop distcp -m 20 -strategy dynamic <source> <destination>



-async                Should distcp execution be blocking
-atomic               Commit all changes or none
-bandwidth <arg>      Specify bandwidth per map in MB
-delete               Delete from target, files missing in source
-f <arg>              List of files that need to be copied
-filelimit <arg>      (Deprecated!) Limit number of files copied to <= n
-i                    Ignore failures during copy
-log <arg>            Folder on DFS where distcp execution logs are saved
-m <arg>              Max number of concurrent maps to use for copy
-mapredSslConf <arg>  Configuration for ssl config file, to use with hftps://
-overwrite            Choose to overwrite target files unconditionally, even if they exist
-p <arg>              Preserve status (rbugp: replication, block-size, user, group, permission)
-sizelimit <arg>      (Deprecated!) Limit total size of files copied to <= n bytes
-skipcrccheck         Whether to skip CRC checks between source and target paths
-strategy <arg>       Copy strategy to use. Default is dividing work based on file sizes
-tmp <arg>            Intermediate work path to be used for atomic commit
-update               Update target, copying only missing files or directories

NameNode HA Setup



A six-node cluster is used for this example:

NODE 1: HDFS master component, ZooKeeper Server, Ambari Agent, JournalNode, ResourceManager, App Timeline Server, History Server, HiveServer2

NODE 2: HDFS master component, ZooKeeper Server, Ambari Agent, JournalNode

NODE 3: Ambari Server, ZooKeeper Server, JournalNode, Clients, Hive Metastore, WebHCat Server, HiveServer2, Metrics Collector

NODE 4: Ambari Agent, HDFS worker component, NodeManager, Hive Client, Pig

NODE 5: Ambari Agent, HDFS worker component, NodeManager, Hive Client, Pig

NODE 6: Ambari Agent, HDFS worker component, NodeManager, Hive Client, Pig


1)     If necessary, use Ambari Web UI > Services > ZooKeeper > Service Actions > Add ZooKeeper Server to add more ZooKeeper servers (a minimum of three is required for NameNode HA).
2)     In Ambari, click Services > HDFS > Service Actions > Enable NameNode HA. This opens a configuration wizard.
3)     In the Getting Started window, type the Nameservice ID, the logical name of the HDFS cluster. The wizard then walks through GUI screens where the following properties are entered:

-- Logical Name (dfs.nameservices)

-- fs.defaultFS (in core-site.xml, the default path prefix used by the Hadoop FS client when none is given)

-- Installation: the current NameNode on NODE 1 and the additional NameNode on NODE 2

-- JournalNodes: one on the current NameNode (NODE 1), a second on the additional NameNode (NODE 2), and a third on NODE 3

-- On each JournalNode host, set the directory where the edit logs are stored in hdfs-site.xml:
dfs.journalnode.edits.dir = "/path/to/edits/info/data"

-- The JournalNode quorum is located by a property in hdfs-site.xml:
dfs.namenode.shared.edits.dir = "qjournal://jn1:8485;jn2:8485;jn3:8485"

-- dfs.nameservices = "haclustersetup" (the logical HDFS cluster name that points to the two NameNodes)

-- dfs.ha.namenodes.haclustersetup = "nn1,nn2" (the IDs of the NameNodes)

-- dfs.namenode.http-address.<logical_cluster_name>.<namenode_id>

Ex: dfs.namenode.http-address.haclustersetup.nn1 = "node1:50070"
dfs.namenode.http-address.haclustersetup.nn2 = "node2:50070"


-- dfs.namenode.rpc-address.<logical_cluster_name>.<namenode_id>

Ex: dfs.namenode.rpc-address.haclustersetup.nn1 = "node1:8020"
dfs.namenode.rpc-address.haclustersetup.nn2 = "node2:8020"


-- dfs.ha.fencing.methods (values: shell or sshfence)

-- dfs.client.failover.proxy.provider.haclustersetup determines the Java class used by the client to find the currently Active NameNode: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
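Once HA is enabled, clients address the cluster by its nameservice ID instead of a single NameNode host, for example (using the names defined above):

hdfs dfs -ls hdfs://haclustersetup/user/steve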

4)     Manually create a checkpoint:
--sudo su hdfs -l -c 'hdfs dfsadmin -safemode enter'
--sudo su hdfs -l -c 'hdfs dfsadmin -saveNamespace'

5)     Manually initialize the JournalNodes:
--sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'

6)     Manually initialize the metadata for NameNode automatic failover:
--sudo su hdfs -l -c 'hdfs zkfc -formatZK'

7)     Manually initialize the metadata for the additional NameNode:
--sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'

8)     hdfs haadmin -getServiceState <serviceId> (to get the service state)

9)     hdfs haadmin -failover <from_serviceId> <to_serviceId> (to manually initiate a failover)
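For example, with the NameNode IDs defined above (nn1 and nn2):

hdfs haadmin -getServiceState nn1

hdfs haadmin -failover nn1 nn2

The failover command transitions nn1 to standby and nn2 to active, fencing the first NameNode if necessary.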