mrjob.compat - Hadoop version compatibility

Utility functions for compatibility with different versions of Hadoop.

mrjob.compat.add_translated_jobconf_for_hadoop_version(jobconf, hadoop_version)

Translates each configuration property name in jobconf to the name accepted by hadoop_version. Prints a warning message for any property whose name does not match the name used in that Hadoop version. Combines the original jobconf with the translated jobconf.

Returns: a map consisting of the original and translated configuration property names and values.
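The translate-then-combine behavior described above can be sketched roughly as follows. This is not mrjob's implementation: the function name add_translated_sketch, the OLD_TO_NEW table, and the choice to let translated entries overwrite originals on a name clash are all assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical translation table for one target Hadoop version; the single
# entry is the map.input.file example used elsewhere on this page.
OLD_TO_NEW = {'map.input.file': 'mapreduce.map.input.file'}


def add_translated_sketch(jobconf, old_to_new=OLD_TO_NEW):
    """Return a new dict holding both the original and the translated names."""
    translated = {}
    for name, value in jobconf.items():
        # Names missing from the table are assumed to need no translation.
        new_name = old_to_new.get(name, name)
        if new_name != name:
            log.warning('%s is named %s in the target version', name, new_name)
            translated[new_name] = value
    combined = dict(jobconf)      # leave the caller's dict untouched
    combined.update(translated)   # assumption: translated entries win on clash
    return combined


print(add_translated_sketch({'map.input.file': 'in.txt', 'foo': '1'}))
```

Keeping both names in the result means the same jobconf can be handed to either an older or a newer Hadoop without knowing in advance which spelling it expects.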

mrjob.compat.get_jobconf_value(variable, default=None)

Get the value of a jobconf variable from the runtime environment.

For example, an MRJob could use jobconf_from_env('map.input.file') to get the name of the file a mapper is reading input from.

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 0.21, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.

mrjob.compat.jobconf_from_dict(jobconf, name, default=None)

Get the value of a jobconf variable from the given dictionary.

Parameters:
  • jobconf (dict) – jobconf dictionary
  • name (string) – name of the jobconf variable (e.g. 'user.name')
  • default – fallback value

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 0.21, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
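A minimal sketch of the "try all variants" lookup, assuming a small hand-written table of equivalent names. The VARIANTS table and the helper names are hypothetical and only illustrate the idea; mrjob's real table and lookup order may differ.

```python
# Hypothetical table: each tuple lists the names one property has had.
VARIANTS = [
    ('map.input.file', 'mapreduce.map.input.file'),
]


def variants_of(name):
    """Return every known spelling of name, or just name if it is unknown."""
    for group in VARIANTS:
        if name in group:
            return list(group)
    return [name]


def jobconf_from_dict_sketch(jobconf, name, default=None):
    """Look up name in jobconf, falling back to its variants, then default."""
    if name in jobconf:
        return jobconf[name]
    for variant in variants_of(name):
        if variant in jobconf:
            return jobconf[variant]
    return default


conf = {'mapreduce.map.input.file': 'part-00000'}
print(jobconf_from_dict_sketch(conf, 'map.input.file'))
print(jobconf_from_dict_sketch(conf, 'user.name', default='nobody'))
```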

mrjob.compat.jobconf_from_env(variable, default=None)

Get the value of a jobconf variable from the runtime environment.

For example, an MRJob could use jobconf_from_env('map.input.file') to get the name of the file a mapper is reading input from.

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 0.21, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
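Hadoop Streaming passes jobconf properties to tasks as environment variables, with the dots in each name replaced by underscores. The sketch below shows how a lookup that tries every variant might work under that convention. It is an illustration only: the function name, the VARIANTS tuple, and the environ parameter (added so the example can run without a real Hadoop task) are assumptions, not mrjob's code.

```python
import os

# Hypothetical group of equivalent names; the pair is the example from this page.
VARIANTS = ('map.input.file', 'mapreduce.map.input.file')


def jobconf_from_env_sketch(variable, default=None, environ=None):
    """Look up a jobconf variable in the environment, trying known variants."""
    if environ is None:
        environ = os.environ
    # Try the requested name first, then any other known spellings.
    names = [variable]
    if variable in VARIANTS:
        names += [v for v in VARIANTS if v != variable]
    for name in names:
        # Hadoop Streaming replaces the dots in property names with underscores.
        value = environ.get(name.replace('.', '_'))
        if value is not None:
            return value
    return default


# Simulate the environment of a task running on a newer Hadoop.
fake_env = {'mapreduce_map_input_file': 'hdfs:///data/part-00000'}
print(jobconf_from_env_sketch('map.input.file', environ=fake_env))
```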

mrjob.compat.supports_combiners_in_hadoop_streaming(version)

Return True if this version of Hadoop Streaming supports combiners (i.e. >= 0.20.203), otherwise False.

mrjob.compat.supports_new_distributed_cache_options(version)

Return True if this version of Hadoop uses -files and -archives instead of -cacheFile and -cacheArchive.

mrjob.compat.translate_jobconf(variable, version)

Translate the jobconf variable name variable to the name used in Hadoop version version. If it’s not a variable we recognize, leave it as-is.

mrjob.compat.uses_generic_jobconf(version)

Return True if this version of Hadoop uses -D instead of -jobconf.
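To show where such a check might be used, here is a hedged sketch of building Hadoop Streaming arguments from a jobconf dictionary. The helper jobconf_args_sketch and its use_generic flag are hypothetical; in practice the flag would come from a version check like the one described above.

```python
def jobconf_args_sketch(jobconf, use_generic):
    """Build command-line arguments that set each jobconf property."""
    switch = '-D' if use_generic else '-jobconf'
    args = []
    # Sort so the argument order is stable from run to run.
    for name, value in sorted(jobconf.items()):
        args.extend([switch, '%s=%s' % (name, value)])
    return args


conf = {'mapred.reduce.tasks': '2'}
print(jobconf_args_sketch(conf, use_generic=True))
print(jobconf_args_sketch(conf, use_generic=False))
```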

mrjob.compat.version_gte(version, cmp_version_str)

Return True if version >= cmp_version_str.
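Version strings need to be compared numerically, component by component, since a plain string comparison would rank '0.9' above '0.20'. One way to do this, assuming purely numeric dotted versions, is sketched below. The names parse_version and version_gte_sketch are hypothetical, and versions with non-numeric suffixes are not handled.

```python
def parse_version(version_str):
    """Turn a string such as '0.20.203' into the tuple (0, 20, 203)."""
    return tuple(int(part) for part in version_str.split('.'))


def version_gte_sketch(version, cmp_version_str):
    """Return True if version >= cmp_version_str, comparing numerically."""
    return parse_version(version) >= parse_version(cmp_version_str)


# A check in the style of the combiner test described on this page.
for v in ('0.20', '0.20.203', '1.0.3'):
    print(v, version_gte_sketch(v, '0.20.203'))
```

Tuples compare element by element, so a shorter version such as '0.20' sorts before '0.20.203'.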
