Contents:
- class pywebhdfs.webhdfs.PyWebHdfsClient(host='localhost', port='50070', user_name=None)
PyWebHdfsClient is a Python wrapper for the Hadoop WebHDFS REST API
To use this client:
>>> from pywebhdfs.webhdfs import PyWebHdfsClient
- __init__(host='localhost', port='50070', user_name=None)
Create a new client for interacting with WebHDFS
Parameters:
- host – the ip address or hostname of the HDFS namenode
- port – the port number for WebHDFS on the namenode
- user_name – WebHDFS user.name used for authentication
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
- append_file(path, file_data, **kwargs)
Appends to an existing file on HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
- file_data – data to append to existing file
The function wraps the WebHDFS REST call:
POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND
[&buffersize=<INT>]
The function accepts all WebHDFS optional arguments shown above
Example:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_data = '01010101010101010101010101010101'
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.append_file(my_file, my_data)
Example with optional args:
>>> hdfs.append_file(my_file, my_data, buffersize=4096)
Note: The append_file function does not follow automatic redirects but instead uses a two-step call to the API, as required by the WebHDFS documentation; a sketch of that exchange follows below
Append is not supported in Hadoop 1.x
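For reference, here is a minimal sketch of the two-step exchange mentioned in the note above, written directly against the REST API with the requests library. The function and variable names are illustrative only, not part of pywebhdfs:

import requests

def append_two_step(host, port, path, data, user_name):
    # Step 1: ask the namenode for a datanode to write to. The namenode
    # answers with a 307 redirect whose Location header points at the
    # datanode; redirects must not be followed automatically because the
    # file data is not sent in this first request.
    init_url = ('http://{0}:{1}/webhdfs/v1/{2}'
                '?op=APPEND&user.name={3}').format(host, port, path, user_name)
    init = requests.post(init_url, allow_redirects=False)
    # Step 2: send the actual bytes to the datanode address.
    resp = requests.post(init.headers['location'], data=data)
    resp.raise_for_status()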
- create_file(path, file_data, **kwargs)
Creates a new file on HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
- file_data – the initial data to write to the new file
The function wraps the WebHDFS REST call:
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
[&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>]
[&permission=<OCTAL>][&buffersize=<INT>]
The function accepts all WebHDFS optional arguments shown above
Example:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_data = '01010101010101010101010101010101'
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.create_file(my_file, my_data)
Example with optional args:
>>> hdfs.create_file(my_file, my_data, overwrite=True, blocksize=64)
Or, for sending data from file-like objects:
>>> with open('file.data') as file_data:
...     hdfs.create_file(hdfs_path, file_data)
Note: The create_file function does not follow automatic redirects but instead uses a two-step call to the API, as required by the WebHDFS documentation
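A short round-trip sketch (placeholder host and paths) that pairs create_file with get_file_dir_status, documented further down this page, to confirm a write:

from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
my_file = 'user/hdfs/data/myfile.txt'
hdfs.create_file(my_file, 'example data', overwrite=True)

# GETFILESTATUS reports the stored length, confirming the write landed.
status = hdfs.get_file_dir_status(my_file)
print(status['FileStatus']['length'])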
- delete_file_dir(path, recursive=False)
Delete an existing file or directory from HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
- recursive – set to True to delete a directory and its contents
The function wraps the WebHDFS REST call:
DELETE http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE
[&recursive=<true|false>]
Example for deleting a file:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.delete_file_dir(my_file)
Example for deleting a directory:
>>> my_dir = 'user/hdfs/data'
>>> hdfs.delete_file_dir(my_dir, recursive=True)
- get_file_dir_status(path)
Get the file_status of a single file or directory on HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
The function wraps the WebHDFS REST call:
GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS
Example for getting file status:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.get_file_dir_status(my_file)
{
    "FileStatus":{
        "accessTime":1371737704282,
        "blockSize":134217728,
        "group":"hdfs",
        "length":90,
        "modificationTime":1371737704595,
        "owner":"hdfs",
        "pathSuffix":"",
        "permission":"755",
        "replication":3,
        "type":"FILE"
    }
}
Example for getting directory status:
>>> my_dir = 'user/hdfs/data/'
>>> hdfs.get_file_dir_status(my_dir)
{
    "FileStatus":{
        "accessTime":0,
        "blockSize":0,
        "group":"hdfs",
        "length":0,
        "modificationTime":1371737704208,
        "owner":"hdfs",
        "pathSuffix":"",
        "permission":"755",
        "replication":0,
        "type":"DIRECTORY"
    }
}
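One common use of this call is an existence check, sketched below; this assumes pywebhdfs raises pywebhdfs.errors.FileNotFound when the path does not exist:

from pywebhdfs import errors
from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')

def path_exists(client, path):
    # GETFILESTATUS raises on a missing path rather than returning None.
    try:
        client.get_file_dir_status(path)
        return True
    except errors.FileNotFound:
        return False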
- list_dir(path)
Get a list of file_status for all files and directories inside an HDFS directory
Parameters:
- path – the HDFS file path without a leading ‘/’
The function wraps the WebHDFS REST call:
GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS
Example for listing a directory:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_dir = 'user/hdfs'
>>> hdfs.list_dir(my_dir)
{
    "FileStatuses":{
        "FileStatus":[
            {
                "accessTime":1371737704282,
                "blockSize":134217728,
                "group":"hdfs",
                "length":90,
                "modificationTime":1371737704595,
                "owner":"hdfs",
                "pathSuffix":"example3.txt",
                "permission":"755",
                "replication":3,
                "type":"FILE"
            },
            {
                "accessTime":1371678467205,
                "blockSize":134217728,
                "group":"hdfs",
                "length":1057,
                "modificationTime":1371678467394,
                "owner":"hdfs",
                "pathSuffix":"example2.txt",
                "permission":"700",
                "replication":3,
                "type":"FILE"
            }
        ]
    }
}
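A sketch that walks the response shown above to build full child paths; only the documented response keys are used:

from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
my_dir = 'user/hdfs'
listing = hdfs.list_dir(my_dir)
for status in listing['FileStatuses']['FileStatus']:
    # pathSuffix holds the child name relative to the listed directory.
    child_path = '{0}/{1}'.format(my_dir, status['pathSuffix'])
    print(child_path, status['type'], status['length'])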
- make_dir(path, **kwargs)
Create a new directory on HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
The function wraps the WebHDFS REST call:
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS
[&permission=<OCTAL>]
Example:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_dir = 'user/hdfs/data/new_dir'
>>> hdfs.make_dir(my_dir)
Example with optional args:
>>> hdfs.make_dir(my_dir, permission=755)
- read_file(path, **kwargs)
Reads from a file on HDFS and returns the content
Parameters:
- path – the HDFS file path without a leading ‘/’
The function wraps the WebHDFS REST call:
GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]
Note: this function follows automatic redirects
Example:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.read_file(my_file)
01010101010101010101010101010101
01010101010101010101010101010101
01010101010101010101010101010101
01010101010101010101010101010101
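A sketch of reading a large file in pieces, assuming read_file forwards the offset and length optional arguments shown in the REST call above; the chunk size here is arbitrary:

from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
my_file = 'user/hdfs/data/myfile.txt'
# GETFILESTATUS (documented above) reports the total length to read.
size = hdfs.get_file_dir_status(my_file)['FileStatus']['length']
chunk_size = 1024 * 1024
chunks = []
for offset in range(0, size, chunk_size):
    chunks.append(hdfs.read_file(my_file, offset=offset, length=chunk_size))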
- rename_file_dir(path, destination_path)
Rename an existing directory or file on HDFS
Parameters:
- path – the HDFS file path without a leading ‘/’
- destination_path – the new file path name
The function wraps the WebHDFS REST call:
PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>
Example:
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> current_dir = 'user/hdfs/data/my_dir'
>>> destination_dir = 'user/hdfs/data/renamed_dir'
>>> hdfs.rename_file_dir(current_dir, destination_dir)
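Finally, an end-to-end sketch (placeholder host and paths) chaining the calls documented on this page:

from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
hdfs.make_dir('user/hdfs/data/my_dir', permission=755)
hdfs.create_file('user/hdfs/data/my_dir/sample.txt', 'sample data')
hdfs.rename_file_dir('user/hdfs/data/my_dir', 'user/hdfs/data/renamed_dir')
hdfs.delete_file_dir('user/hdfs/data/renamed_dir', recursive=True)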