Python wrapper around the Hadoop streaming jar.
HadoopManager is the central object for managing Hadoop jobs and HDFS.
To ensure proper cleanup of temporary directories, use HadoopManager in a `with` statement:

    with HadoopManager(...) as manager:
        pass
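The cleanup contract above can be sketched with a minimal stand-in class (the real constructor arguments are not shown; the class name and the `tmp_dir` attribute here are hypothetical, used only to illustrate the context-manager behaviour):

```python
import os
import shutil
import tempfile

class ManagerSketch:
    """Minimal stand-in for HadoopManager's context-manager contract:
    a temporary directory is created on entry and removed on exit."""

    def __enter__(self):
        self.tmp_dir = tempfile.mkdtemp(prefix="hadoop_job_")
        return self

    def __exit__(self, exc_type, exc, tb):
        shutil.rmtree(self.tmp_dir, ignore_errors=True)
        return False  # never suppress exceptions raised in the body

with ManagerSketch() as manager:
    tmp = manager.tmp_dir
    assert os.path.isdir(tmp)   # temp dir exists inside the block

assert not os.path.exists(tmp)  # cleaned up on exit
```

Because `__exit__` runs even when the body raises, the temporary directory is removed whether the job succeeds or fails.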
Create a HadoopJob object.
HadoopFs object for managing HDFS.
HadoopJob object for managing MapReduce jobs. Create it with the HadoopManager.create_job method.
Returns a generator over the MapReduce output.
Returns the path to the output file. Useful when a temporary directory is used.
Remove the output directory.
Run a MapReduce job.
Run a MapReduce job in the background. Returns a HadoopCmdPromise object.
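The difference between the blocking and background variants can be illustrated with a small stand-in for the promise pattern. The class below and its `wait()` method are assumptions; the actual HadoopCmdPromise interface is not documented here:

```python
import threading

class PromiseSketch:
    """Stand-in for the run-in-background pattern: the job starts in a
    background thread immediately, and wait() blocks until it finishes."""

    def __init__(self, fn):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fn,))
        self._thread.start()

    def _run(self, fn):
        self._result = fn()

    def wait(self):
        self._thread.join()
        return self._result

# Pretend the lambda is a MapReduce job running in the background.
promise = PromiseSketch(lambda: sum(range(10)))
print(promise.wait())  # -> 45
```

The caller can launch several jobs this way and collect their results later, instead of blocking on each one in turn.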
Returns a generator over the files defined by the path.
Check whether a file exists at the path.
| Parameters: | path – path to the file |
|---|---|
List the files at the path.
Recursively remove all files at the path.
| Parameters: | path – path to the files |
|---|---|
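The exists/ls/rm trio mirrors the usual filesystem verbs. A local-filesystem sketch of the same semantics, with `pathlib` standing in for HDFS (the analogy is ours; none of these calls touch the actual HadoopFs API):

```python
import pathlib
import shutil
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "part-00000").write_text("one\n")
(root / "part-00001").write_text("two\n")

# exists: check whether a file is present at the path
assert (root / "part-00000").exists()

# ls: list the files at the path
names = sorted(p.name for p in root.iterdir())
print(names)  # -> ['part-00000', 'part-00001']

# rm: recursively remove everything at the path
shutil.rmtree(root)
assert not root.exists()
```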
Override this method to return a compiled regex, a string, or a list of strings (matched with OR) that each mapped line must match.
Override this method to map the input line. The output can be either returned or yielded as a (key, value) pair.
| Parameters: | line – one line of the input file serialized by the input serializer |
|---|---|
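As a sketch, a word-count job overriding both hooks. The class name is hypothetical and the base class is omitted; only the filter and map contracts (a regex that lines must match, and returned or yielded (key, value) pairs) come from the documentation above:

```python
import re

class WordCountSketch:
    """Hypothetical job subclass illustrating the filter and map hooks."""

    def filter(self):
        # every mapped line must match this regex (contain a word char)
        return re.compile(r"\w")

    def map(self, line):
        # emit one (word, 1) pair per word on the line
        for word in line.split():
            yield word.lower(), 1

job = WordCountSketch()
pairs = list(job.map("Hadoop streaming Hadoop"))
print(pairs)  # -> [('hadoop', 1), ('streaming', 1), ('hadoop', 1)]
```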
Override this method to reduce the input. The output can be either returned or yielded as a (key, value) pair.
Override this method to reduce the input. The output can be either returned or yielded as a (key, value) pair.
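A matching word-count reducer sketch. The parameter names `key` and `values` are assumptions, since the parameter table above is empty; only the yielded (key, value) shape comes from the documented contract:

```python
class WordCountReduceSketch:
    """Hypothetical job subclass illustrating the reduce hook."""

    def reduce(self, key, values):
        # collapse all counts for one key into a single (key, total) pair
        yield key, sum(values)

job = WordCountReduceSketch()
print(list(job.reduce("hadoop", [1, 1, 1])))  # -> [('hadoop', 3)]
```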