mrjob.compat - Hadoop version compatibility

Utility functions for compatibility with different versions of Hadoop.

mrjob.compat.add_translated_jobconf_for_hadoop_version(jobconf, hadoop_version)

Translates the configuration property names in jobconf to the names accepted by hadoop_version. Prints a warning for each property whose name differs from the name used in that Hadoop version. Combines the original jobconf with the translated jobconf.

Returns: a map consisting of the original and translated configuration property names and values.

mrjob.compat.get_jobconf_value(variable, default=None)

Get the value of a jobconf variable from the runtime environment.

For example, an MRJob could use get_jobconf_value('map.input.file') to get the name of the file a mapper is reading input from.

If the name of the jobconf variable is different in different versions of Hadoop (e.g. in Hadoop 0.21, map.input.file is mapreduce.map.input.file), we’ll automatically try all variants before giving up.

Return default if that jobconf variable isn’t set.
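As a rough sketch of the lookup described above: Hadoop Streaming exposes jobconf properties to tasks as environment variables, with the dots in each name replaced by underscores. The function name jobconf_from_env, the environ parameter, and the tiny variants table below are hypothetical and stand in for whatever mrjob does internally.

```python
import os

# Hypothetical table of equivalent names across Hadoop versions.
# Only the pair mentioned in the text above is listed.
_VARIANTS = {
    'map.input.file': ['map.input.file', 'mapreduce.map.input.file'],
    'mapreduce.map.input.file': ['mapreduce.map.input.file',
                                 'map.input.file'],
}


def jobconf_from_env(variable, default=None, environ=None):
    """Look up a jobconf variable, trying every known variant of its name."""
    if environ is None:
        environ = os.environ
    for name in _VARIANTS.get(variable, [variable]):
        # Hadoop Streaming turns the dots in property names into underscores.
        key = name.replace('.', '_')
        if key in environ:
            return environ[key]
    return default


# Simulated task environment from a Hadoop version that uses the newer name.
fake_env = {'mapreduce_map_input_file': 'hdfs:///data/part-00000'}
input_file = jobconf_from_env('map.input.file', environ=fake_env)
```

Passing the environment as a parameter is only a convenience for trying the sketch outside a running Hadoop task.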

mrjob.compat.supports_combiners_in_hadoop_streaming(version)

Return True if this version of Hadoop Streaming supports combiners (i.e. >= 0.20.203), otherwise False.

mrjob.compat.supports_new_distributed_cache_options(version)

Return True if this version of Hadoop Streaming should be given -files and -archives instead of -cacheFile and -cacheArchive, otherwise False.

mrjob.compat.translate_jobconf(variable, version)

Translate variable to Hadoop version version. If it’s not a variable we recognize, leave as-is.
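One way such a translation could work is a table that records, for each variable, which name applies from which Hadoop version onward. The sketch below assumes that design; translate_sketch, the table layout, and the '0.18' starting version are hypothetical, and only the map.input.file pair is taken from this page.

```python
# Hypothetical table: each entry maps the Hadoop version from which a name
# applies to that name. '0.18' is a placeholder, not a documented version.
_NAMES_BY_VERSION = [
    {'0.18': 'map.input.file', '0.21': 'mapreduce.map.input.file'},
]


def _as_tuple(version):
    # Assumes purely numeric, dot-separated version strings.
    return tuple(int(part) for part in version.split('.'))


def translate_sketch(variable, version):
    """Return the name `variable` goes by in Hadoop `version`."""
    target = _as_tuple(version)
    for names in _NAMES_BY_VERSION:
        if variable in names.values():
            # Keep only names that apply at or before the target version,
            # then pick the one with the highest starting version.
            candidates = [(_as_tuple(v), name)
                          for v, name in names.items()
                          if _as_tuple(v) <= target]
            if candidates:
                return max(candidates)[1]
            return variable
    # Unrecognized variables are left as-is.
    return variable


new_name = translate_sketch('map.input.file', '0.21')
old_name = translate_sketch('mapreduce.map.input.file', '0.20.2')
```

With this table, new_name is the mapreduce.* form and old_name is the older form, and a variable that appears nowhere in the table passes through unchanged.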

mrjob.compat.uses_generic_jobconf(version)

Return True if this version of Hadoop should be given configuration properties with -D instead of -jobconf, otherwise False.

mrjob.compat.version_gte(version, cmp_version_str)

Return True if version >= cmp_version_str.
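Version strings have to be compared component by component, not as plain strings, since '0.9' sorts after '0.20' alphabetically. The sketch below shows one way to do that. It is not necessarily how mrjob implements version_gte, and it assumes purely numeric, dot-separated versions; version_gte_sketch is a hypothetical name.

```python
def _parse(version_str):
    # Assumes every component is an integer, e.g. '0.20.203'.
    return [int(part) for part in version_str.split('.')]


def version_gte_sketch(version, cmp_version_str):
    """Return True if version >= cmp_version_str, comparing numerically."""
    a, b = _parse(version), _parse(cmp_version_str)
    # Pad the shorter list with zeros so '0.20' and '0.20.0' compare equal.
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


newer = version_gte_sketch('0.20.203', '0.20.2')
older = version_gte_sketch('0.9', '0.20')
```

Here newer is True and older is False, whereas the plain string comparison '0.9' >= '0.20' would give True.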
