mrjob.hadoop - run on your Hadoop cluster

class mrjob.hadoop.HadoopJobRunner(**kwargs)

Runs an MRJob on your Hadoop cluster.

Input and support files can be either local or on HDFS; use hdfs://... URLs to refer to files on HDFS.
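
To make the local-versus-HDFS distinction concrete, here is a minimal sketch that sorts a list of inputs by whether they use the hdfs:// scheme. The helper split_inputs and both example paths are hypothetical and are not part of mrjob; the sketch only illustrates the naming convention described above.

```python
def split_inputs(paths):
    """Separate hdfs:// URLs from everything else (treated as local paths).

    Hypothetical helper for illustration only; not part of mrjob.
    """
    on_hdfs = [p for p in paths if p.startswith('hdfs://')]
    local = [p for p in paths if not p.startswith('hdfs://')]
    return local, on_hdfs


# Both paths below are made up for the example.
local, on_hdfs = split_inputs([
    'data/words.txt',
    'hdfs:///user/alice/logs/part-00000',
])
print(local)    # ['data/words.txt']
print(on_hdfs)  # ['hdfs:///user/alice/logs/part-00000']
```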

HadoopJobRunner.__init__(**kwargs)

HadoopJobRunner takes the same arguments as MRJobRunner, plus some additional options whose defaults can be set in mrjob.conf.

Utilities

mrjob.hadoop.hadoop_log_dir(hadoop_home=None)

Return the path where Hadoop stores logs.

Parameters: hadoop_home – putative value of HADOOP_HOME, or None to use the actual value of HADOOP_HOME from the environment. It is consulted only if HADOOP_LOG_DIR is not defined.
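
The lookup order described above can be sketched roughly as follows. This is a standalone illustration, not mrjob's source: the function name sketch_hadoop_log_dir and its environ parameter are hypothetical, and the fallback to a logs subdirectory under the Hadoop home is an assumption based on Hadoop's usual layout.

```python
import os


def sketch_hadoop_log_dir(hadoop_home=None, environ=None):
    """Illustrative lookup: HADOOP_LOG_DIR first, then <hadoop home>/logs.

    `environ` is a hypothetical parameter that lets the sketch run
    without touching the real environment.
    """
    environ = os.environ if environ is None else environ

    # An explicit HADOOP_LOG_DIR always wins.
    if 'HADOOP_LOG_DIR' in environ:
        return environ['HADOOP_LOG_DIR']

    # Otherwise use the supplied home, or the real HADOOP_HOME.
    if hadoop_home is None:
        hadoop_home = environ['HADOOP_HOME']  # raises KeyError if unset

    # Assumption: logs live in a "logs" directory under the Hadoop home.
    return os.path.join(hadoop_home, 'logs')


print(sketch_hadoop_log_dir(environ={'HADOOP_LOG_DIR': '/var/log/hadoop'}))
print(sketch_hadoop_log_dir('/opt/hadoop', environ={}))
```

In this sketch, hadoop_home is ignored whenever HADOOP_LOG_DIR is set, which matches the note in the parameter description.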

mrjob.hadoop.find_hadoop_streaming_jar(path)

Return the path of the Hadoop streaming jar inside the given directory tree, or None if it cannot be found.
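
A directory-tree search of this kind might look like the sketch below. It is not mrjob's implementation: the function name, the filename pattern, and the jar name used in the demonstration are all assumptions chosen for illustration.

```python
import os
import re
import tempfile

# Assumed pattern: a file named like hadoop...streaming...jar
STREAMING_JAR_RE = re.compile(r'^hadoop.*streaming.*\.jar$')


def sketch_find_streaming_jar(path):
    """Walk the tree under `path`; return the first matching jar, or None."""
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in sorted(filenames):
            if STREAMING_JAR_RE.match(filename):
                return os.path.join(dirpath, filename)
    return None


# Demonstration on a throwaway tree with a made-up jar name.
with tempfile.TemporaryDirectory() as root:
    jar_dir = os.path.join(root, 'contrib', 'streaming')
    os.makedirs(jar_dir)
    open(os.path.join(jar_dir, 'hadoop-streaming-1.0.jar'), 'w').close()

    found = sketch_find_streaming_jar(root)
    found_name = os.path.basename(found)
    # A directory that does not exist yields no matches.
    missing = sketch_find_streaming_jar(jar_dir + '-nope')

print(found_name)  # hadoop-streaming-1.0.jar
print(missing)     # None
```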

mrjob.hadoop.fully_qualify_hdfs_path(path)

If path isn’t an hdfs:// URL, turn it into one.
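
As a rough illustration of this kind of normalization, the sketch below leaves hdfs:// URLs alone and prefixes everything else. It is a hypothetical stand-in rather than mrjob's code: the function name and the user parameter are invented here, and resolving relative paths against /user/<name> is an assumption based on the usual HDFS home-directory convention.

```python
import getpass


def sketch_fully_qualify_hdfs_path(path, user=None):
    """Return `path` as an hdfs:// URL (illustrative only)."""
    # Already an HDFS URL: nothing to do.
    if path.startswith('hdfs://'):
        return path

    # Absolute path: prefix with the scheme and an empty authority.
    if path.startswith('/'):
        return 'hdfs://' + path

    # Assumption: relative paths live under the user's HDFS home directory.
    user = user or getpass.getuser()
    return 'hdfs:///user/%s/%s' % (user, path)


print(sketch_fully_qualify_hdfs_path('hdfs://namenode/data/in'))  # hdfs://namenode/data/in
print(sketch_fully_qualify_hdfs_path('/tmp/out'))                 # hdfs:///tmp/out
print(sketch_fully_qualify_hdfs_path('out', user='alice'))        # hdfs:///user/alice/out
```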
