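1) The Java native API libraries use RPC over port 8020, while
the WebHDFS REST API uses port 50070 to connect to the NameNode and port 50075
to connect to a DataNode. These are the Hadoop 2.x defaults; Hadoop 3.x moved the
NameNode and DataNode HTTP ports to 9870 and 9864.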
2) WebHDFS uses HTTP methods such as GET, POST, PUT, and
DELETE for file access and administration.
3) WebHDFS supports Kerberos authentication. It uses the
Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), which extends
Kerberos to web applications.
4) Writing a file is a two-step process.
First, submit an HTTP PUT to the NameNode without following redirects and
without sending the file data:
curl -i -X PUT "http://<NameNode>:50070/webhdfs/v1/web/mydata/largefile.json?op=CREATE&user.name=root"
The NameNode answers with a 307 TEMPORARY_REDIRECT; the Location header in the
output is the DataNode URL used to write data to the file.
Second, write the file by sending its data to that DataNode URL:
curl -i -X PUT -T largefile.json "http://<DataNode>:50075/webhdfs/v1/web/mydata/largefile.json?op=CREATE&user.name=root&namenoderpcaddress=node1:8020&overwrite=false"
curl can also perform both steps with a single command: -L follows the
redirect, and -T uploads the file to the DataNode:
curl -i -X PUT -T largefile.json -L "http://<NameNode>:50070/webhdfs/v1/web/mydata/largefile.json?op=CREATE&user.name=root"
5) If Kerberos is enabled, WebHDFS requires two additional hdfs-site.xml
properties:
dfs.web.authentication.kerberos.principal = HTTP/<FQDN>@<REALM_NAME>
dfs.web.authentication.kerberos.keytab = /etc/security/spnego.service.keytab
The principal must start with HTTP/, as SPNEGO requires, and the keytab file
holds that principal's credentials.
6) Reading a file named webdata (-L follows the NameNode's redirect to the
DataNode that serves the data):
curl -i -L "http://<NameNode>:50070/webhdfs/v1/web/mydata/webdata?op=OPEN&user.name=jason"
7) Creating a directory named mydata:
curl -i -X PUT "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=MKDIRS&user.name=jason"
Listing a directory named mydata:
curl -i "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=LISTSTATUS&user.name=jason"
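The operations in items 1, 2, 4, 6, and 7 can be gathered into a few shell functions. This is a hypothetical sketch, not part of Hadoop: the function names and the NN_HOST, NN_PORT, and HDFS_USER variables (defaulting to node1, 50070, and jason) are assumptions. It assumes Hadoop 2.x ports with security off, so identity travels in user.name, and the op-to-method table covers only a subset of WebHDFS operations. JSON is parsed with grep and sed rather than jq, so file names are assumed to contain no double quotes.

```shell
#!/usr/bin/env bash
# Minimal WebHDFS helpers (hypothetical names). Assumes Hadoop 2.x ports and
# security off, so the caller's identity is sent as user.name.

NN_HOST="${NN_HOST:-node1}"      # hypothetical NameNode host
NN_PORT="${NN_PORT:-50070}"      # NameNode WebHDFS (HTTP) port
HDFS_USER="${HDFS_USER:-jason}"  # value sent as user.name

# webhdfs_url <hdfs-path> <op> [extra-query]: print a NameNode WebHDFS URL.
webhdfs_url() {
  local path="$1" op="$2" extra="${3:-}"
  local url="http://${NN_HOST}:${NN_PORT}/webhdfs/v1${path}?op=${op}&user.name=${HDFS_USER}"
  if [ -n "$extra" ]; then
    url="${url}&${extra}"
  fi
  printf '%s\n' "$url"
}

# webhdfs_method <op>: print the HTTP method for a WebHDFS operation (subset).
webhdfs_method() {
  case "$1" in
    OPEN|GETFILESTATUS|LISTSTATUS|GETCONTENTSUMMARY) echo GET ;;
    CREATE|MKDIRS|RENAME|SETPERMISSION|SETOWNER)     echo PUT ;;
    APPEND|CONCAT)                                   echo POST ;;
    DELETE)                                          echo DELETE ;;
    *) echo "unsupported op: $1" >&2; return 1 ;;
  esac
}

# extract_location: read a raw HTTP response on stdin and print the
# Location header (the DataNode URL in the NameNode's 307 redirect).
extract_location() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# list_names: print each pathSuffix in a LISTSTATUS JSON response.
# grep/sed instead of jq; assumes file names contain no double quotes.
list_names() {
  grep -o '"pathSuffix" *: *"[^"]*"' | sed 's/^"pathSuffix" *: *"//; s/"$//'
}

# webhdfs_put <local-file> <hdfs-path>: the two-step CREATE done by hand.
webhdfs_put() {
  local local_file="$1" hdfs_path="$2" location
  # Step 1: PUT to the NameNode with no -L and no data; expect a 307 redirect.
  location="$(curl -sS -i -X PUT "$(webhdfs_url "$hdfs_path" CREATE overwrite=false)" | extract_location)"
  if [ -z "$location" ]; then
    echo "no Location header from NameNode" >&2
    return 1
  fi
  # Step 2: upload the file to the DataNode URL taken from the redirect.
  curl -sS -i -X PUT -T "$local_file" "$location"
}

# Reads are redirected to a DataNode, so OPEN needs -L.
webhdfs_cat()   { curl -sS -L "$(webhdfs_url "$1" OPEN)"; }
webhdfs_mkdir() { curl -sS -X PUT "$(webhdfs_url "$1" MKDIRS)"; }
webhdfs_ls()    { curl -sS "$(webhdfs_url "$1" LISTSTATUS)" | list_names; }
```

For example, webhdfs_put largefile.json /web/mydata/largefile.json issues the two requests of item 4 by hand, and webhdfs_ls /web/mydata prints one entry name per line. Running step 1 without -L is what exposes the Location header; the single-command form in item 4 lets curl follow it instead.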
WebHDFS Authentication
When security is off (Kerberos not enabled), the authenticated user is the one
named in the user.name=<name> query parameter of the URL. If user.name is
omitted, the server either sets the authenticated user to a default web user,
if one is configured, or returns an error response.
When security is on (Kerberos enabled), authentication is performed by either a
Hadoop delegation token or Kerberos SPNEGO. If the URL includes a
delegation=<token> query parameter, the user encoded in the token is
authenticated; otherwise, the user is authenticated by SPNEGO.
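The authentication rules above can be sketched as a small request helper. The names auth_url, uses_spnego, extract_token, and webhdfs_request, and the WEBHDFS_DELEGATION, KERBEROS, and WEBHDFS_USER variables, are hypothetical. The sketch assumes a Kerberos ticket from kinit for SPNEGO, which curl enables with --negotiate -u :, and a delegation token obtained earlier, for example from the GETDELEGATIONTOKEN operation, whose JSON reply carries the token in a urlString field.

```shell
#!/usr/bin/env bash
# Sketch: choose how one WebHDFS request authenticates (hypothetical names).
#   WEBHDFS_DELEGATION set -> delegation=<token>; the token carries the user
#   KERBEROS=1             -> SPNEGO via curl --negotiate (needs a kinit ticket)
#   otherwise              -> security off; user.name=<name> if WEBHDFS_USER is set

# auth_url <url>: append the query parameter for the active auth mode.
# Assumes <url> already carries a ?op=... query string.
auth_url() {
  if [ -n "${WEBHDFS_DELEGATION:-}" ]; then
    printf '%s&delegation=%s\n' "$1" "$WEBHDFS_DELEGATION"
  elif [ "${KERBEROS:-0}" != 1 ] && [ -n "${WEBHDFS_USER:-}" ]; then
    printf '%s&user.name=%s\n' "$1" "$WEBHDFS_USER"
  else
    # Kerberos on: the user comes from SPNEGO. Security off with no user.name:
    # the server may use a default web user or return an error.
    printf '%s\n' "$1"
  fi
}

# uses_spnego: succeed when the request should authenticate with SPNEGO.
uses_spnego() {
  [ -z "${WEBHDFS_DELEGATION:-}" ] && [ "${KERBEROS:-0}" = 1 ]
}

# extract_token: print the urlString field of a GETDELEGATIONTOKEN JSON reply.
extract_token() {
  grep -o '"urlString" *: *"[^"]*"' | sed 's/^"urlString" *: *"//; s/"$//'
}

# webhdfs_request <url>: send a GET with the chosen authentication.
webhdfs_request() {
  if uses_spnego; then
    curl -sS --negotiate -u : "$(auth_url "$1")"
  else
    curl -sS "$(auth_url "$1")"
  fi
}

# Example with Kerberos on: obtain a token once via SPNEGO, then reuse it.
#   kinit jason
#   KERBEROS=1
#   WEBHDFS_DELEGATION="$(webhdfs_request \
#     'http://<NameNode>:50070/webhdfs/v1/?op=GETDELEGATIONTOKEN' | extract_token)"
```

Checking for a token first follows the rule above: when delegation=<token> is present, the token decides the user, and SPNEGO is used only without one.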