hdfs dfs -ls file:///bin
hdfs dfs -ls hdfs:///root
Besides hdfs://, other file system URI schemes such as file://,
swift://, or s3:// are also supported.
hdfs dfs -help
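Here is a simplified Python model of how a path argument such as file:///bin, hdfs:///root, or a bare name like fileA might be resolved. It is a sketch under stated assumptions, not Hadoop code:

- default_fs stands in for the fs.defaultFS setting.
- An empty authority on the default scheme falls back to the default NameNode.
- Relative names on the default file system resolve under /user/<user>.

The function name resolve_path and the address nn1:8020 are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse
import posixpath


def resolve_path(path: str, default_fs: str, user: str) -> Tuple[str, str, str]:
    """Model how a shell path argument maps to (scheme, authority, path).

    Assumptions for this sketch: default_fs stands in for fs.defaultFS, and
    relative paths on the default file system resolve under /user/<user>.
    """
    default = urlparse(default_fs)
    parsed = urlparse(path)
    scheme = parsed.scheme or default.scheme
    authority = parsed.netloc
    # hdfs:///root has an empty authority; on the default scheme it falls
    # back to the authority (NameNode host:port) of fs.defaultFS.
    if not authority and scheme == default.scheme:
        authority = default.netloc
    p = parsed.path
    if not p.startswith("/") and scheme == default.scheme:
        # A bare name such as fileA is relative to the home directory.
        p = posixpath.join("/user", user, p)
    return scheme, authority, posixpath.normpath(p)


print(resolve_path("hdfs:///root", "hdfs://nn1:8020", "steve"))  # ('hdfs', 'nn1:8020', '/root')
print(resolve_path("file:///bin", "hdfs://nn1:8020", "steve"))   # ('file', '', '/bin')
print(resolve_path("fileA", "hdfs://nn1:8020", "steve"))         # ('hdfs', 'nn1:8020', '/user/steve/fileA')
```

A name like fileA has no scheme, so it is treated as a path on the default file system under the current user's home directory. That is why commands such as -cat fileA work without a full path.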
hdfs dfs -mkdir /user/steve/dir1
List the contents of a directory recursively:
hdfs dfs -ls -R /user/steve
hdfs dfs -put
hdfs dfs -appendToFile
hdfs dfs -cat fileA
View only the last 1 KB of a file:
hdfs dfs -tail fileB
hdfs dfs -get /user/steve/fileB /home/steve/fileB
hdfs dfs -mv /user/steve/fileC /user/steve/dir1/fileC
hdfs dfs -getmerge fileF fileG
hdfs dfs -rm fileB fileC
hdfs dfs -rmdir /user/steve/dir1/dir2
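Trash is configured by two properties: fs.trash.checkpoint.interval, whose
default is defined in core-default.xml, and fs.trash.interval, which is set
in core-site.xml. Both values are in minutes. fs.trash.interval sets how long
deleted files are kept in trash (0 disables trash), and
fs.trash.checkpoint.interval sets the time between trash checkpoints.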
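The way the two trash properties interact can be summarized as a small Python function. This is a hedged sketch of the documented semantics, not Hadoop code. A fs.trash.interval of 0 disables trash, and a fs.trash.checkpoint.interval of 0 falls back to fs.trash.interval. The same fallback for a checkpoint interval larger than fs.trash.interval is our reading of the NameNode's trash emptier and is worth confirming for your Hadoop version. The function name is hypothetical.

```python
from typing import Optional


def effective_checkpoint_interval(trash_interval: int,
                                  checkpoint_interval: int) -> Optional[int]:
    """Return the minutes between trash checkpoints, or None if trash is off.

    trash_interval models fs.trash.interval and checkpoint_interval models
    fs.trash.checkpoint.interval; both are in minutes.
    """
    if trash_interval <= 0:
        # fs.trash.interval = 0 disables the trash feature.
        return None
    if checkpoint_interval <= 0 or checkpoint_interval > trash_interval:
        # Zero (and, in this model, a value larger than fs.trash.interval)
        # falls back to fs.trash.interval.
        return trash_interval
    return checkpoint_interval


print(effective_checkpoint_interval(360, 0))   # 360
print(effective_checkpoint_interval(360, 60))  # 60
print(effective_checkpoint_interval(0, 60))    # None
```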
hdfs dfs -D dfs.blocksize=<N> -D dfs.replication=5 -put
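The -D options override dfs.blocksize and dfs.replication for that single put. The Python sketch below counts how many blocks and block replicas a file would occupy. It assumes a hypothetical 300 MiB file and a 128 MiB block size, both chosen only for the example. The helper name block_layout is hypothetical.

```python
from typing import Tuple

MiB = 1024 * 1024


def block_layout(file_size: int, block_size: int, replication: int) -> Tuple[int, int, int]:
    """Return (blocks, block_replicas, raw_bytes) for a single file.

    Simplified model: every block except possibly the last is block_size
    bytes, and each block is stored `replication` times.
    """
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("file_size must be >= 0; block_size and replication must be > 0")
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks, blocks * replication, file_size * replication


# A 300 MiB file put with -D dfs.blocksize=134217728 -D dfs.replication=5
print(block_layout(300 * MiB, 128 * MiB, 5))  # (3, 15, 1572864000)
```

The final block holds only the remaining bytes. Raw disk usage is therefore the file size times the replication factor, not the number of blocks times the block size.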
hdfs dfs -chown danielle /data/weblogs/fileA
hdfs dfs -chgrp hdfs fileB
Change owner and group membership simultaneously:
hdfs dfs -chown hcat:hdfs /data/weblogs/fileC
The chown command requires HDFS superuser privileges.
Using setfacl on Files
Set (remove existing and replace) both permissions and ACL entries using a single command:
$ hdfs dfs -setfacl --set user::rw-,group::r--,other::---,user:steve:rw-,user:jason:rw- fileA
Sets the owner, group, and other permissions and
adds ACL entries for steve and jason.
Modify an existing ACL by adding a new entry for the group eng:
$ hdfs dfs -setfacl -m group:eng:rw- fileA
Remove the specific ACL entry for the user jason:
$ hdfs dfs -setfacl -x user:jason fileA
Remove all ACL entries, leaving only the base owner, group, and other permissions:
$ hdfs dfs -setfacl -b fileA
The ACL entries for the user steve and the group eng are removed.
Using setfacl on Directories
The setfacl command can also be used on directories, but with a few
differences. For example, the -R (recursive) option can be used to set,
modify, or remove permissions and ACL entries for an entire directory
hierarchy.
hdfs dfs -setfacl -R … dir1 recursively sets, modifies, or removes ACL entries.
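hdfs dfs -setfacl -m default:user:jason:rw- dir1 adds a default ACL entry for
the user jason, which is inherited by new files and subdirectories created in dir1.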
Any directory with default ACL entries must include
default entries for the owner, group, and other user classes.
hdfs dfs -setfacl -m default:mask::r-- dir1 explicitly sets a default mask.
A default mask on a directory helps to define the permissions and ACLs
inherited by child files and directories. When a mask is not set explicitly,
it is calculated automatically as the union of the permissions of all entries
that the mask filters, which includes the unnamed group and any named users
or named groups listed in the ACL.
The other type of mask is the access mask on a file or directory, for example:
hdfs dfs -setfacl -m mask::r-- fileA
The purpose of an access mask is to provide a user with a mechanism to quickly
limit or restore the effective permissions for multiple users and groups using a
single command. An access mask affects any named users, named groups, or the
unnamed group.
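The mask rules above can be modeled in a few lines. The Python sketch below is a simplified model, not the HDFS implementation. It represents each ACL entry as a hypothetical (type, name, permission) tuple, and it computes the automatic mask as the union of the entries the mask filters. It applies a mask by intersecting it with a named user, named group, or unnamed group entry, while leaving the owner and other entries untouched. All helper names are hypothetical.

```python
from typing import List, Optional, Tuple

BITS = (("r", 4), ("w", 2), ("x", 1))

# An ACL entry is (type, name, perm): ("user", "", "rw-") is the owner,
# ("user", "jason", "rw-") a named user, ("group", "", "r--") the unnamed
# group, ("group", "eng", "rw-") a named group, ("other", "", "---") other.
Entry = Tuple[str, str, str]


def perm_bits(perm: str) -> int:
    """Convert an rwx string such as 'rw-' to its 3-bit value (6)."""
    if len(perm) != 3:
        raise ValueError(f"bad permission string: {perm!r}")
    return sum(bit for ch, (letter, bit) in zip(perm, BITS) if ch == letter)


def bits_perm(value: int) -> str:
    """Convert a 3-bit value back to an rwx string."""
    return "".join(letter if value & bit else "-" for letter, bit in BITS)


def is_masked(entry_type: str, name: str) -> bool:
    # The mask filters named users, named groups, and the unnamed group,
    # but never the owner (user::) or other (other::) entries.
    return entry_type == "group" or (entry_type == "user" and name != "")


def computed_mask(entries: List[Entry]) -> str:
    """Union of the permissions of every entry the mask filters."""
    value = 0
    for entry_type, name, perm in entries:
        if is_masked(entry_type, name):
            value |= perm_bits(perm)
    return bits_perm(value)


def effective_perm(entry_type: str, name: str, perm: str, mask: Optional[str]) -> str:
    """Permission actually granted by one entry once the mask is applied."""
    if mask is None or not is_masked(entry_type, name):
        return perm
    return bits_perm(perm_bits(perm) & perm_bits(mask))


acl = [("user", "", "rw-"), ("group", "", "r--"), ("other", "", "---"),
       ("user", "steve", "rw-"), ("user", "jason", "rw-")]
print(computed_mask(acl))                              # rw-
print(effective_perm("user", "jason", "rw-", "r--"))   # r--
print(effective_perm("user", "", "rw-", "r--"))        # rw-
```

Setting mask::r-- therefore cuts jason's effective access to read-only with one command, while his stored entry remains rw- and can be restored by widening the mask.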
hdfs dfsadmin -report
hdfs dfs -du -h
hdfs fsck -files -blocks -locations -racks
hdfs fsck -openforwrite
hdfs fsck -move
hdfs fsck -delete
hdfs fsck /user/root -files -blocks -locations -racks
hdfs dfsadmin -help
To transition a NameNode into safemode:
hdfs dfsadmin -safemode enter
To force a NameNode checkpoint operation that creates both a new fsimage and
a new edits file (the NameNode must be in safemode):
hdfs dfsadmin -saveNamespace
To create only a new edits file:
hdfs dfsadmin -rollEdits
To exit NameNode safemode:
hdfs dfsadmin -safemode leave
To download the latest fsimage file (useful for doing remote backups):
hdfs dfsadmin -fetchImage <local_directory>
hdfs dfsadmin -report
Configuring Quotas
You must be an HDFS superuser to administer quotas.
Setting a name quota on one or more directories:
hdfs dfsadmin -setQuota <n> <directory> [<directory>] …
Issue the command again to modify a name quota.
Removing a name quota on one or more directories:
hdfs dfsadmin -clrQuota <directory> [<directory>] …
Setting a space quota on one or more directories:
hdfs dfsadmin -setSpaceQuota <n> <directory> [<directory>] …
Issue the command again to modify a space quota.
Removing a space quota on one or more directories:
hdfs dfsadmin -clrSpaceQuota <directory> [<directory>] …
An attempt to set a name or space quota will still
succeed even if the directory would be in immediate
violation of the new quota.
hdfs dfs -count -v -q <directory_name>
Any user may view current quota information using the HDFS Shell count
command.
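The two quota types can be expressed as two small checks. This Python sketch is a simplified model. It assumes a name quota counts every file and directory name in the tree, including the quota'd directory itself, and that a space quota counts raw bytes across all replicas. It ignores how the NameNode accounts for blocks that are still being written. All names are hypothetical.

```python
from typing import List, Tuple


def names_used(num_files: int, num_subdirs: int) -> int:
    """Names charged against a directory's name quota.

    The quota'd directory itself counts, so a name quota of 1 keeps the
    directory empty.
    """
    return num_files + num_subdirs + 1


def fits_name_quota(num_files: int, num_subdirs: int, quota: int) -> bool:
    return names_used(num_files, num_subdirs) <= quota


def space_used(files: List[Tuple[int, int]]) -> int:
    """Raw bytes charged against a space quota for (size, replication) pairs.

    Every replica counts: 1 GiB stored with replication 3 uses 3 GiB of quota.
    """
    return sum(size * replication for size, replication in files)


GiB = 1024 ** 3
print(fits_name_quota(0, 0, 1))           # True: only the directory itself
print(fits_name_quota(1, 0, 1))           # False: one file makes 2 names
print(space_used([(GiB, 3)]) == 3 * GiB)  # True
```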
hdfs balancer
Changing the threshold to 5 percent:
hdfs balancer -threshold 5
Display other options:
hdfs balancer -help
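The balancer considers the cluster balanced when every DataNode's utilization is within the threshold of the overall cluster utilization, measured in percentage points. Utilization is used space divided by capacity, and the default threshold is 10 percent. The Python sketch below models only that test. The real balancer also plans and moves blocks, which is not shown. The node names and sizes are made up for illustration.

```python
from typing import Dict, List, Tuple


def unbalanced_nodes(nodes: Dict[str, Tuple[int, int]],
                     threshold_pct: float = 10.0) -> List[str]:
    """DataNodes whose utilization differs from the cluster's by more than threshold_pct.

    nodes maps a DataNode name to (used_bytes, capacity_bytes).
    """
    used = sum(u for u, _ in nodes.values())
    capacity = sum(c for _, c in nodes.values())
    cluster_pct = 100.0 * used / capacity
    return sorted(
        name for name, (u, c) in nodes.items()
        if abs(100.0 * u / c - cluster_pct) > threshold_pct
    )


nodes = {"dn1": (30, 100), "dn2": (50, 100), "dn3": (70, 100)}
print(unbalanced_nodes(nodes))                    # ['dn1', 'dn3']
print(unbalanced_nodes(nodes, threshold_pct=25))  # []
```

Lowering the threshold, as in -threshold 5, flags more nodes and so makes the balancer move more data.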
hdfs dfsadmin -refreshNodes and yarn rmadmin -refreshNodes
The hdfs fsck -racks utility displays the number of racks of which the
NameNode is aware.
hdfs haadmin -getServiceState
hdfs haadmin -failover
hdfs dfsadmin -allowSnapshot <directory_path>
hdfs dfsadmin -disallowSnapshot <directory_path>
The hdfs lsSnapshottableDir command lists any snapshottable
directories.
hdfs dfs -renameSnapshot <directory_path> <old_name> <new_name>
hdfs dfs -createSnapshot <directory_path> [<snapshot_name>]
hdfs dfs -deleteSnapshot <directory_path> <snapshot_name>
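When no snapshot name is given to -createSnapshot, HDFS generates a default name from a timestamp of the form 's'yyyyMMdd-HHmmss.SSS. A snapshot's contents are readable under the directory's .snapshot subdirectory. The Python sketch below reproduces both conventions for illustration. The helper names and the /data/weblogs example path are hypothetical, and the timestamp is passed in rather than taken from the NameNode.

```python
from datetime import datetime
import posixpath


def default_snapshot_name(ts: datetime) -> str:
    """Timestamp-based default name in the form 's'yyyyMMdd-HHmmss.SSS."""
    return ts.strftime("s%Y%m%d-%H%M%S.") + f"{ts.microsecond // 1000:03d}"


def snapshot_path(directory: str, snapshot_name: str) -> str:
    """Read-only path under which a snapshot's contents are visible."""
    return posixpath.join(directory, ".snapshot", snapshot_name)


name = default_snapshot_name(datetime(2013, 4, 12, 15, 10, 29, 33000))
print(name)                                  # s20130412-151029.033
print(snapshot_path("/data/weblogs", name))  # /data/weblogs/.snapshot/s20130412-151029.033
```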
hadoop distcp -help
hadoop distcp hdfs://<namenode1>:8020/<source1> hdfs://<namenode1>:8020/<source2>
hadoop distcp -f hdfs://<namenode1>:8020/<source_list> hdfs://<namenode2>:8020/<destination>
Hadoop DistCp -update option
The -update and -overwrite options
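The difference between the default behavior, -update and -overwrite comes down to a per-file decision. The Python sketch below is a simplified model:

- By default, files that already exist at the target are skipped.
- -update copies a file when the target is missing or differs.
- -overwrite copies unconditionally.

For brevity the model compares only length and checksum (DistCp's documentation also lists block size). It honors -skipcrccheck by ignoring the checksum. FileInfo and should_copy are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the file attributes DistCp compares."""
    size: int
    checksum: str


def should_copy(src: FileInfo, dst: Optional[FileInfo], mode: str = "default",
                skip_crc: bool = False) -> bool:
    """Decide whether one source file is copied; dst is None if the target is missing.

    mode is "default", "update", or "overwrite".
    """
    if mode not in ("default", "update", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")
    if dst is None:
        return True                  # missing targets are always copied
    if mode == "overwrite":
        return True                  # -overwrite: copy unconditionally
    if mode == "update":
        if src.size != dst.size:
            return True              # -update: lengths differ
        return not skip_crc and src.checksum != dst.checksum
    return False                     # default: existing targets are skipped


a, b = FileInfo(10, "crc-a"), FileInfo(10, "crc-b")
print(should_copy(a, b))                           # False
print(should_copy(a, b, "update"))                 # True
print(should_copy(a, b, "update", skip_crc=True))  # False
print(should_copy(a, a, "overwrite"))              # True
```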
The distcp command includes a -m <n> option that sets the maximum number of
concurrent map tasks used for the copy.
DistCp can run in one of two modes, static or dynamic. In static mode, each
mapper is assigned a fixed group of files before the copy starts. Mappers that
are faster finish early and are not assigned any more groups to process, while
mappers that are slower must still process all of the files assigned to them.
This is the default mode, although it can be explicitly specified by adding
the -strategy uniformsize option. In dynamic mode (-strategy dynamic), the work
is split into smaller chunks that mappers pick up as they become free, so
faster mappers process more of the data.
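The effect of the two strategies can be seen in a toy simulation. The Python sketch below assumes each mapper copies at a constant, known rate and that files are listed in a fixed order. static_makespan models uniformsize by splitting the list into contiguous groups of roughly equal bytes. dynamic_makespan models dynamic by letting whichever mapper is free claim the next chunk. It illustrates the scheduling idea only; it is not DistCp's planning code, and all names are hypothetical.

```python
import heapq
from typing import List


def static_makespan(sizes: List[int], speeds: List[float]) -> float:
    """uniformsize-style model: contiguous groups of roughly equal bytes, fixed up front."""
    m = len(speeds)
    target = sum(sizes) / m
    groups: List[List[int]] = [[] for _ in range(m)]
    i, copied = 0, 0
    for size in sizes:
        # Move on to the next mapper once this one has its share of bytes.
        if i < m - 1 and copied >= target * (i + 1):
            i += 1
        groups[i].append(size)
        copied += size
    # Each mapper copies only its own group; the job ends with the slowest one.
    return max(sum(g) / v for g, v in zip(groups, speeds))


def dynamic_makespan(sizes: List[int], speeds: List[float], files_per_chunk: int = 1) -> float:
    """dynamic-style model: a free mapper claims the next chunk of files."""
    chunks = [sum(sizes[k:k + files_per_chunk]) for k in range(0, len(sizes), files_per_chunk)]
    free = [(0.0, j) for j in range(len(speeds))]  # (time mapper is free, mapper id)
    heapq.heapify(free)
    end = 0.0
    for chunk in chunks:
        t, j = heapq.heappop(free)
        t += chunk / speeds[j]
        end = max(end, t)
        heapq.heappush(free, (t, j))
    return end


sizes = [10] * 8     # eight equal files (arbitrary units)
speeds = [1.0, 4.0]  # one slow mapper, one fast mapper
print(static_makespan(sizes, speeds))   # 40.0
print(dynamic_makespan(sizes, speeds))  # 20.0
```

Here one mapper is four times faster than the other. The static plan leaves the fast mapper idle after 10 time units while the slow one needs 40. Handing out chunks dynamically finishes in 20.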
hadoop distcp -m 20 -strategy uniformsize
hadoop distcp -m 20 -strategy dynamic
-async                 Should distcp execution be blocking
-atomic                Commit all changes or none
-bandwidth <arg>       Specify bandwidth per map in MB
-delete                Delete from target, files missing in source
-f <arg>               List of files that need to be copied
-filelimit <arg>       (Deprecated!) Limit number of files copied to <= n
-i                     Ignore failures during copy
-log <arg>             Folder on DFS where distcp execution logs are saved
-m <arg>               Max number of concurrent maps to use for copy
-mapredSslConf <arg>   Configuration for ssl config file, to use with hftps://
-overwrite             Choose to overwrite target files unconditionally, even if they exist.
-p <arg>               preserve status (rbugp)(replication, block-size, user, group, permission)
-sizelimit <arg>       (Deprecated!) Limit number of files copied to <= n bytes
-skipcrccheck          Whether to skip CRC checks between source and target paths.
-strategy <arg>        Copy strategy to use. Default is dividing work based on file sizes
-tmp <arg>             Intermediate work path to be used for atomic commit
-update                Update target, copying only missing files or directories