pywebhdfs 0.2.2 documentation

Contents:

class pywebhdfs.webhdfs.PyWebHdfsClient(host='localhost', port='50070', user_name=None)

PyWebHdfsClient is a Python wrapper for the Hadoop WebHDFS REST API

To use this client:

>>> from pywebhdfs.webhdfs import PyWebHdfsClient

__init__(host='localhost', port='50070', user_name=None)

Create a new client for interacting with WebHDFS

Parameters:
  • host – the ip address or hostname of the HDFS namenode
  • port – the port number for WebHDFS on the namenode
  • user_name – WebHDFS user.name used for authentication
>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')

append_file(path, file_data, **kwargs)

Appends to an existing file on HDFS

Parameters:
  • path – the HDFS file path without a leading ‘/’
  • file_data – data to append to existing file

The function wraps the WebHDFS REST call:

POST http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND

[&buffersize=<INT>]

The function accepts all WebHDFS optional arguments shown above

Example:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_data = '01010101010101010101010101010101'
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.append_file(my_file, my_data)

Example with optional args:

>>> hdfs.append_file(my_file, my_data, buffersize=4096)

Note: The append_file function does not follow automatic redirects; instead it uses the two-step call sequence required by the WebHDFS documentation

Append is not supported in Hadoop 1.x
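
For illustration, the same two-step exchange can be reproduced by hand with the requests library. This is a minimal sketch of what the client does internally; the host, port, path, and data are the placeholder values from the examples above:

>>> import requests
>>> # Step 1: POST to the namenode with no message body. WebHDFS answers
>>> # with a 307 redirect whose Location header names the target datanode.
>>> url = ('http://host:50070/webhdfs/v1/user/hdfs/data/myfile.txt'
...        '?op=APPEND&user.name=hdfs')
>>> init = requests.post(url, allow_redirects=False)
>>> # Step 2: POST the file data to the datanode address from step 1.
>>> requests.post(init.headers['Location'], data=my_data)
<Response [200]>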

create_file(path, file_data, **kwargs)

Creates a new file on HDFS

Parameters:
  • path – the HDFS file path without a leading ‘/’
  • file_data – the initial data to write to the new file

The function wraps the WebHDFS REST call:

PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE

[&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>] [&permission=<OCTAL>][&buffersize=<INT>]

The function accepts all WebHDFS optional arguments shown above

Example:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_data = '01010101010101010101010101010101'
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.create_file(my_file, my_data)

Example with optional args (note that blocksize is given in bytes):

>>> hdfs.create_file(my_file, my_data, overwrite=True, blocksize=67108864)

Or, for sending data from file-like objects:

>>> with open('file.data', 'rb') as file_data:
...     hdfs.create_file(my_file, file_data)

Note: The create_file function does not follow automatic redirects; instead it uses the two-step call sequence required by the WebHDFS documentation
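
Errors are raised as exceptions rather than returned. Below is a minimal sketch of guarding against an already-existing file, assuming pywebhdfs exposes its exception types in pywebhdfs.errors:

>>> from pywebhdfs.errors import PyWebHdfsException
>>> try:
...     hdfs.create_file(my_file, my_data)
... except PyWebHdfsException as e:
...     # WebHDFS rejects CREATE on an existing file unless overwrite=true
...     print('create failed: %s' % e)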

delete_file_dir(path, recursive=False)

Delete an existing file or directory from HDFS

Parameters:
  • path – the HDFS file path without a leading ‘/’
  • recursive – if True, delete the directory and all of its contents; a non-empty directory cannot otherwise be deleted

The function wraps the WebHDFS REST call:

DELETE http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE

[&recursive=<true|false>]

Example for deleting a file:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.delete_file_dir(my_file)

Example for deleting a directory:

>>> my_dir = 'user/hdfs/data/'
>>> hdfs.delete_file_dir(my_dir, recursive=True)

get_file_dir_status(path)

Get the file_status of a single file or directory on HDFS

Parameters: path – the HDFS file path without a leading ‘/’

The function wraps the WebHDFS REST call:

GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS

Example for getting file status:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.get_file_dir_status(my_file)
{
    "FileStatus":{
        "accessTime":1371737704282,
        "blockSize":134217728,
        "group":"hdfs",
        "length":90,
        "modificationTime":1371737704595,
        "owner":"hdfs",
        "pathSuffix":"",
        "permission":"755",
        "replication":3,
        "type":"FILE"
    }
}

Example for getting directory status:

>>> my_dir = 'user/hdfs/data/'
>>> hdfs.get_file_dir_status(my_dir)
{
    "FileStatus":{
        "accessTime":0,
        "blockSize":0,
        "group":"hdfs",
        "length":0,
        "modificationTime":1371737704208,
        "owner":"hdfs",
        "pathSuffix":"",
        "permission":"755",
        "replication":0,
        "type":"DIRECTORY"
    }
}
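
The status is returned as a Python dict with the shape shown above, so individual fields can be read directly, e.g. to branch on the entry type:

>>> status = hdfs.get_file_dir_status(my_file)
>>> if status['FileStatus']['type'] == 'FILE':
...     print(status['FileStatus']['length'])
90
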
list_dir(path)

Get a list of file_status for all files and directories inside an HDFS directory

Parameters: path – the HDFS file path without a leading ‘/’

The function wraps the WebHDFS REST call:

GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS

Example for listing a directory:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_dir = 'user/hdfs'
>>> hdfs.list_dir(my_dir)
{
    "FileStatuses":{
        "FileStatus":[
            {
                "accessTime":1371737704282,
                "blockSize":134217728,
                "group":"hdfs",
                "length":90,
                "modificationTime":1371737704595,
                "owner":"hdfs",
                "pathSuffix":"example3.txt",
                "permission":"755",
                "replication":3,
                "type":"FILE"
            },
            {
                "accessTime":1371678467205,
                "blockSize":134217728,
                "group":"hdfs","length":1057,
                "modificationTime":1371678467394,
                "owner":"hdfs",
                "pathSuffix":"example2.txt",
                "permission":"700",
                "replication":3,
                "type":"FILE"
            }
        ]
    }
}
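
The listing is returned as a dict with the same nested shape, so enumerating names and sizes is a short loop over the FileStatus array:

>>> listing = hdfs.list_dir(my_dir)
>>> for entry in listing['FileStatuses']['FileStatus']:
...     print('%s %d' % (entry['pathSuffix'], entry['length']))
example3.txt 90
example2.txt 1057
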
make_dir(path, **kwargs)

Create a new directory on HDFS

Parameters: path – the HDFS file path without a leading ‘/’

The function wraps the WebHDFS REST call:

PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS

[&permission=<OCTAL>]

Example:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_dir = 'user/hdfs/data/new_dir'
>>> hdfs.make_dir(my_dir)

Example with optional args:

>>> hdfs.make_dir(my_dir, permission=755)
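
MKDIRS creates any missing parent directories along the path (like mkdir -p), so a nested directory can be made in a single call:

>>> hdfs.make_dir('user/hdfs/data/new_dir/nested/child')
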
read_file(path, **kwargs)

Reads from a file on HDFS and returns the content

Parameters: path – the HDFS file path without a leading ‘/’

The function wraps the WebHDFS REST call:

GET http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN

[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]

Note: this function follows automatic redirects

Example:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> my_file = 'user/hdfs/data/myfile.txt'
>>> hdfs.read_file(my_file)
01010101010101010101010101010101
01010101010101010101010101010101
01010101010101010101010101010101
01010101010101010101010101010101
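
The optional offset and length arguments request a byte range instead of the whole file; here only the first 32 bytes are read:

>>> hdfs.read_file(my_file, offset=0, length=32)
01010101010101010101010101010101
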
rename_file_dir(path, destination_path)

Rename an existing directory or file on HDFS

Parameters:
  • path – the HDFS file path without a leading ‘/’
  • destination_path – the new path name for the file or directory

The function wraps the WebHDFS REST call:

PUT http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>

Example:

>>> hdfs = PyWebHdfsClient(host='host', port='50070', user_name='hdfs')
>>> current_dir = 'user/hdfs/data/my_dir'
>>> destination_dir = 'user/hdfs/data/renamed_dir'
>>> hdfs.rename_file_dir(current_dir, destination_dir)
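
Afterwards the old path no longer resolves; a status call on the new path (using the dict shape shown under get_file_dir_status) is a quick way to confirm the move:

>>> hdfs.get_file_dir_status(destination_dir)['FileStatus']['type']
'DIRECTORY'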
