
Sunday, May 22, 2016

WebHDFS information



1) The Java native API libraries use RPC over port 8020, while the WebHDFS REST API uses HTTP, by default on port 50070 to connect to the NameNode and on port 50075 to connect to a DataNode.



2) WebHDFS uses HTTP operations like GET, POST, PUT, and DELETE for file access and administration.
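As a rough illustration of how operations map onto HTTP methods, the sketch below pairs a few common WebHDFS operations with the method each one expects. The function name webhdfs_method is hypothetical and only a handful of operations are covered; the operation names themselves come from the WebHDFS REST API.

```shell
# Hypothetical helper: print the HTTP method that WebHDFS expects
# for a given operation name. Unknown names produce an error.
webhdfs_method() {
    case "$1" in
        OPEN|LISTSTATUS|GETFILESTATUS) echo GET ;;
        CREATE|MKDIRS|RENAME)          echo PUT ;;
        APPEND)                        echo POST ;;
        DELETE)                        echo DELETE ;;
        *) echo "unknown operation: $1" >&2; return 1 ;;
    esac
}
```

It could be combined with curl as, for example, curl -i -X "$(webhdfs_method MKDIRS)" followed by the quoted MKDIRS URL.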



3) WebHDFS is compatible with Kerberos authentication. It uses the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), which extends Kerberos to Web applications.
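On a Kerberized cluster, curl can perform the SPNEGO handshake itself, provided it was built with GSS-API support and a ticket has already been obtained with kinit. The wrapper below is a minimal sketch under those assumptions; the name webhdfs_spnego and the DRY_RUN switch are hypothetical conveniences, while --negotiate and -u : are standard curl options.

```shell
# Hypothetical wrapper: call curl with SPNEGO enabled.
# --negotiate turns on SPNEGO; "-u :" passes empty credentials so that
# curl takes the identity from the Kerberos ticket cache.
# With DRY_RUN set, the command is printed instead of executed.
webhdfs_spnego() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'curl -i --negotiate -u : %s\n' "$*"
    else
        curl -i --negotiate -u : "$@"
    fi
}
```

After running kinit, a call such as webhdfs_spnego "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=LISTSTATUS" needs no user.name parameter, because the user comes from the Kerberos ticket.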



4) Writing a file is a two-step process.

• Create a file by creating a file name on the NameNode:

curl -i -X PUT "http://<NameNode>:50070/webhdfs/v1/web/mydata/largefile.json?op=CREATE"

No file data is sent in this step. The response is an HTTP 307 redirect whose Location header contains the DataNode URL used to write data to the file.



• Write to the file by sending data to the DataNodes:

curl -i -X PUT -T largefile.json "http://<DataNode>:50075/webhdfs/v1/web/mydata/largefile.json?op=CREATE&user.name=root&namenoderpcaddress=node1:8020&overwrite=false"



• The curl command can perform a write operation using a single command that performs both steps. The -L option makes curl follow the redirect from the NameNode to the DataNode:

curl -i -X PUT -T largefile.json -L "http://<NameNode>:50070/webhdfs/v1/web/mydata/largefile.json?op=CREATE&user.name=root"
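The two-step write can also be scripted by capturing the Location header from the first response and reusing it for the upload. The following is a minimal sketch, not a hardened tool: the function names extract_location and webhdfs_put are hypothetical, and the port and URL layout are taken from the examples in this post.

```shell
# Hypothetical helper: read an HTTP response on stdin and print the
# value of its Location header. The header name is matched
# case-insensitively and carriage returns are stripped.
extract_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Sketch of the two-step write.
# Arguments: NameNode host, HDFS path, local file, user name.
webhdfs_put() {
    nn_url="http://$1:50070/webhdfs/v1$2?op=CREATE&user.name=$4"
    # Step 1: ask the NameNode where to write; no data is sent yet.
    dn_url=$(curl -s -i -X PUT "$nn_url" | extract_location)
    [ -n "$dn_url" ] || { echo "no redirect received" >&2; return 1; }
    # Step 2: upload the file body to the DataNode URL from step 1.
    curl -s -i -X PUT -T "$3" "$dn_url"
}
```

A call would look like webhdfs_put <NameNode> /web/mydata/largefile.json largefile.json root.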



5) If Kerberos is enabled, WebHDFS requires the configuration of two additional hdfs-site.xml properties. The properties and their values are

dfs.web.authentication.kerberos.principal="HTTP/$<FQDN>@$<REALM_NAME>.com" and

dfs.web.authentication.kerberos.keytab="/etc/security/spnego.service.keytab"

6) Reading a file named webdata:

curl -i -L "http://<NameNode>:50070/webhdfs/v1/web/mydata/webdata?op=OPEN&user.name=jason"
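The OPEN operation also accepts optional offset and length parameters, which can be handy for sampling a large file. The sketch below only builds the URL; the function name webhdfs_open_url is hypothetical, and the port and path layout follow the examples in this post.

```shell
# Hypothetical helper: build an OPEN URL that reads a byte range.
# Arguments: NameNode host, HDFS path, user name, offset, length.
# offset and length are optional parameters of the OPEN operation.
webhdfs_open_url() {
    printf 'http://%s:50070/webhdfs/v1%s?op=OPEN&user.name=%s&offset=%s&length=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

For instance, curl -L "$(webhdfs_open_url <NameNode> /web/mydata/webdata jason 0 1024)" would request the first 1024 bytes. The -L option is still needed because the NameNode redirects the read to a DataNode.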

7) Creating a directory named mydata:

curl -i -X PUT "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=MKDIRS&user.name=jason"

8) Listing a directory named mydata:

curl -i "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=LISTSTATUS&user.name=jason"
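LISTSTATUS returns a JSON document in which each entry is a FileStatus object with a pathSuffix field holding the entry name. To pull out just the names without extra tools, something like the following may be enough. It is a sketch that uses plain text matching rather than a JSON parser, the function name list_names is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Hypothetical helper: print the pathSuffix (entry name) of every
# FileStatus object in a LISTSTATUS response read from stdin.
# Plain text matching only: names containing escaped quotes are
# not handled.
list_names() {
    grep -o '"pathSuffix" *: *"[^"]*"' |
        sed 's/^"pathSuffix" *: *"//; s/"$//'
}
```

It would be used by piping the response body into it, running curl with -s and without -i so that HTTP headers are not mixed into the JSON: curl -s "<LISTSTATUS URL>" | list_names.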

WebHDFS Authentication

When security is off (Kerberos not enabled), the authenticated user is the one named in the user.name=<name> parameter included in the URL. If user.name is not included in the URL, the server may either set the authenticated user to a default Web user, if there is one, or return an error response.

When security is on (Kerberos is enabled), authentication is performed by either a Hadoop delegation token or Kerberos SPNEGO. If a delegation=<token> argument is included in the URL, the user encoded in the token is authenticated; otherwise the user is authenticated by SPNEGO.
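The choice between the two URL parameters can be sketched as a small helper. The function name webhdfs_auth_param is hypothetical; it simply prefers a delegation token when one is supplied and falls back to user.name otherwise.

```shell
# Hypothetical helper: choose the authentication query parameter.
# Argument 1: user name (used when security is off).
# Argument 2: optional delegation token (used when security is on).
# With neither, nothing is printed and SPNEGO would have to be used.
webhdfs_auth_param() {
    if [ -n "${2:-}" ]; then
        printf 'delegation=%s\n' "$2"
    elif [ -n "${1:-}" ]; then
        printf 'user.name=%s\n' "$1"
    fi
}
```

The result can be appended to a request URL, for example "http://<NameNode>:50070/webhdfs/v1/web/mydata?op=LISTSTATUS&$(webhdfs_auth_param jason)".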