
Sunday, May 22, 2016

HDFS Important Commands



hdfs dfs -ls file:///bin

hdfs dfs -ls hdfs:///root

HDFS shell commands can also reference other filesystems by URI scheme, such as file://, swift://, or s3://.

hdfs dfs -help

hdfs dfs -mkdir /user/steve/dir1

List the contents of a directory recursively:

hdfs dfs -ls -R /user/steve

hdfs dfs -put

hdfs dfs -appendToFile

hdfs dfs -cat fileA

View only the last 1 KB of a file:

hdfs dfs -tail fileB

hdfs dfs -get /user/steve/fileB /home/steve/fileB

hdfs dfs -mv /user/steve/fileC /user/steve/dir1/fileC






hdfs dfs -getmerge fileF fileG



hdfs dfs -rm fileB fileC



hdfs dfs -rmdir /user/steve/dir1/dir2



Trash is configured by two properties, fs.trash.interval and fs.trash.checkpoint.interval. Their defaults live in core-default.xml and are typically overridden in core-site.xml.
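
A minimal core-site.xml sketch; the 360 and 60 minute values below are illustrative examples, not recommendations:

<property>
  <name>fs.trash.interval</name>
  <value>360</value>  <!-- minutes a deleted file is kept in .Trash; 0 disables trash -->
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>60</value>   <!-- minutes between trash checkpoints; should not exceed fs.trash.interval -->
</property>

When trash is enabled, hdfs dfs -rm moves files into the user's .Trash directory instead of deleting them immediately; the -skipTrash flag bypasses trash entirely.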



hdfs dfs -D dfs.blocksize=<N> -D dfs.replication=5 -put
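
A concrete sketch of the same override (the file name, destination path, and 256 MB block size are illustrative):

hdfs dfs -D dfs.blocksize=268435456 -D dfs.replication=5 -put largefile.dat /user/steve/largefile.dat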



hdfs dfs -chown danielle /data/weblogs/fileA

hdfs dfs -chgrp hdfs fileB



Change owner and group membership simultaneously:

hdfs dfs -chown hcat:hdfs /data/weblogs/fileC

The command requires HDFS superuser privileges.

Using setfacl on Files

Set (remove existing and replace) both permissions and ACL entries using a single command:

$ hdfs dfs -setfacl --set user::rw-,group::r--,other::---,user:steve:rw-,user:jason:rw- fileA

Sets the owner, group, and other permissions and adds ACL entries for the users steve and jason.



Modify an existing ACL by adding a new entry for the group eng:

$ hdfs dfs -setfacl -m group:eng:rw- fileA



Remove the specific ACL entry for the user jason:

$ hdfs dfs -setfacl -x user:jason fileA



Remove all ACL entries, leaving only the base owner, group, and other permissions:

$ hdfs dfs -setfacl -b fileA

The user steve and the group eng are removed.
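
To verify the effect of any of the ACL changes above, getfacl prints the current entries (fileA is the same example file used throughout):

$ hdfs dfs -getfacl fileA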



Using setfacl on Directories

The setfacl command can also be used on directories, but with a few differences. For example, the -R (recursive) option can be used to set, modify, or remove permissions and ACL entries across an entire directory hierarchy:

hdfs dfs -setfacl -R … dir1

hdfs dfs -setfacl -m default:user:jason:rw- dir1

Any directory with default ACL entries must include default entries for the owner, group, and other user classes.



hdfs dfs -setfacl -m default:mask::r-- dir1 explicitly sets a default mask. A default mask on a directory helps define the permissions and ACL entries that child files and directories inherit. When the default mask is not set explicitly, it is computed as the union of the permissions of the unnamed group and of any named users or named groups listed in the ACL.



The other type of mask is the access mask on a file or directory, set with hdfs dfs -setfacl -m mask::r-- fileA. The purpose of an access mask is to give a user a mechanism to quickly limit or restore the effective permissions of multiple users and groups with a single command. An access mask affects any named users, named groups, and the unnamed group.
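
A minimal sketch of that pattern, reusing the example fileA: the first command caps the effective permissions of every named user, named group, and the unnamed group at read-only, and the second restores read-write:

hdfs dfs -setfacl -m mask::r-- fileA
hdfs dfs -setfacl -m mask::rw- fileA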



hdfs dfsadmin -report



hdfs dfs -du -h
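
For example (the path reuses the earlier /user/steve example), adding -s collapses the per-file report into a single summarized total for the directory:

hdfs dfs -du -s -h /user/steve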



hdfs fsck <path> -files -blocks -locations -racks

hdfs fsck <path> -openforwrite

hdfs fsck <path> -move

hdfs fsck <path> -delete



hdfs fsck /user/root -files -blocks -locations -racks
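
A quick health-check pattern, assuming fsck is run against the root path; the exact report wording can vary slightly between Hadoop versions:

hdfs fsck / | grep -iE 'Status|Corrupt blocks|Missing replicas'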



hdfs dfsadmin -help



To transition a NameNode into safemode:

hdfs dfsadmin -safemode enter



To force a NameNode checkpoint operation that creates both a new fsimage and edits file:

hdfs dfsadmin -saveNamespace



To create only a new edits file:

hdfs dfsadmin -rollEdits



To exit NameNode safemode:

hdfs dfsadmin -safemode leave
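
Two related safemode subcommands that are often useful alongside enter and leave: get reports the current safemode state, and wait blocks until the NameNode leaves safemode:

hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode wait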



To download the latest fsimage file (useful for doing remote backups):

hdfs dfsadmin -fetchImage
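
For example (the local destination directory is hypothetical and must already exist):

hdfs dfsadmin -fetchImage /tmp/nn-backup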



hdfs dfsadmin -report



Configuring Quotas

You must be an HDFS superuser to administer quotas.

Setting a name quota on one or more directories:

hdfs dfsadmin -setQuota <n> <directory> [<directory>] …



Issue the command again to modify a name quota.

Removing a name quota on one or more directories:

hdfs dfsadmin -clrQuota <directory> [<directory>] …



Setting a space quota on one or more directories:

hdfs dfsadmin -setSpaceQuota <n> <directory> [<directory>] …

Issue the command again to modify a space quota.



Removing a space quota on one or more directories:

hdfs dfsadmin -clrSpaceQuota <directory> [<directory>] …

An attempt to set a name or space quota will still succeed even if the directory would be in immediate violation of the new quota.



Any user may view current quota information using the HDFS shell count command: hdfs dfs -count -q -v <directory_name>
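
A minimal end-to-end sketch (the directory name and limits are illustrative): set a name quota of 1,000 objects and a 10 GB space quota, then check usage. Note that the space quota is charged against replicated bytes, so at a replication factor of 3 a 10 GB quota holds roughly 3.3 GB of data:

hdfs dfsadmin -setQuota 1000 /data/project1
hdfs dfsadmin -setSpaceQuota 10g /data/project1
hdfs dfs -count -q -v /data/project1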



hdfs balancer



Changing the threshold to 5 percent, meaning each DataNode's utilization should end up within 5 percentage points of the cluster-wide average (the default threshold is 10 percent):

hdfs balancer -threshold 5



Display other options:

hdfs balancer -help



hdfs dfsadmin -refreshNodes and yarn rmadmin -refreshNodes (run after updating the include/exclude host files)



The hdfs fsck <path> -racks utility displays the number of racks of which the NameNode is aware.



hdfs haadmin -getServiceState <serviceId>

hdfs haadmin -failover <serviceId_from> <serviceId_to>



hdfs dfsadmin -allowSnapshot <directory_path>

hdfs dfsadmin -disallowSnapshot <directory_path>



The hdfs lsSnapshottableDir command lists any snapshottable directories.



hdfs dfs -renameSnapshot <directory_path> <old_name> <new_name>

hdfs dfs -createSnapshot <directory_path> [<snapshot_name>]

hdfs dfs -deleteSnapshot <directory_path> <snapshot_name>
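
A minimal sketch tying the snapshot commands together (the path and snapshot names are hypothetical):

hdfs dfsadmin -allowSnapshot /data/weblogs
hdfs dfs -createSnapshot /data/weblogs before-cleanup
hdfs dfs -ls /data/weblogs/.snapshot/before-cleanup   # snapshots appear under the hidden .snapshot directory
hdfs dfs -renameSnapshot /data/weblogs before-cleanup snap-2016-05-22
hdfs dfs -deleteSnapshot /data/weblogs snap-2016-05-22
hdfs lsSnapshottableDir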



hadoop distcp -help

hadoop distcp hdfs://<namenode1>:8020/<source1> hdfs://<namenode1>:8020/<source2>



hadoop distcp -f hdfs://<namenode1>:8020/<source_list> hdfs://<namenode2>:8020/<destination>



The DistCp -update option copies only those source files that are missing at the target or that differ from the existing target copies.



With the -update and -overwrite options, the contents of the source directories are copied into the target directory, rather than the source directories themselves being created beneath it.



The distcp command also includes a -m <n> option, which limits the maximum number of map tasks used for the copy.



DistCp runs in one of two modes, static and dynamic. In static mode, each mapper is assigned a fixed share of the files up front, based on total size; mappers that finish early are not assigned any more work, while slower mappers must still process everything originally assigned to them. Static mode is the default, although it can be specified explicitly by adding the -strategy uniformsize option. In dynamic mode (-strategy dynamic), faster mappers pick up additional chunks of work as they finish, so slow nodes do not hold up the copy.



hadoop distcp -m 20 -strategy uniformsize <source> <destination>

hadoop distcp -m 20 -strategy dynamic <source> <destination>



 -async                 Should distcp execution be blocking
 -atomic                Commit all changes or none
 -bandwidth <arg>       Specify bandwidth per map in MB
 -delete                Delete from target, files missing in source
 -f <arg>               List of files that need to be copied
 -filelimit <arg>       (Deprecated!) Limit number of files copied to <= n
 -i                     Ignore failures during copy
 -log <arg>             Folder on DFS where distcp execution logs are saved
 -m <arg>               Max number of concurrent maps to use for copy
 -mapredSslConf <arg>   Configuration for ssl config file, to use with hftps://
 -overwrite             Choose to overwrite target files unconditionally, even if they exist
 -p <arg>               Preserve status (rbugp): replication, block-size, user, group, permission
 -sizelimit <arg>       (Deprecated!) Limit number of files copied to <= n bytes
 -skipcrccheck          Whether to skip CRC checks between source and target paths
 -strategy <arg>        Copy strategy to use. Default is dividing work based on file sizes
 -tmp <arg>             Intermediate work path to be used for atomic commit
 -update                Update target, copying only missing files or directories