| Option | Default | Switches |
|---|---|---|
| conf_path | (automatic; see find_mrjob_conf()) | -c, --conf-path, --no-conf |
| extra_args | [] | (see add_passthrough_option()) |
| file_upload_args | [] | (see add_file_option()) |
| hadoop_input_format | None | (see hadoop_input_format()) |
| hadoop_output_format | None | (see hadoop_output_format()) |
| output_dir | (automatic) | -o, --output-dir |
| no_output | False | --no-output |
| partitioner | None | --partitioner (see also partitioner()) |

See mrjob.runner.MRJobRunner.__init__() for details.

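
The switches in the table above behave like ordinary command-line options. The sketch below uses Python's argparse purely for illustration; mrjob builds its own option parser, so make_parser() is a hypothetical helper and not mrjob's API. It assumes that --no-conf stores False into the same conf_path destination as -c/--conf-path, which keeps "load no config file" (False) distinct from "find one automatically" (None).

```python
import argparse


def make_parser():
    """Build a hypothetical parser mirroring the job-level switches above."""
    parser = argparse.ArgumentParser()
    # -c/--conf-path names an explicit mrjob.conf; None means "find one automatically".
    parser.add_argument("-c", "--conf-path", dest="conf_path", default=None)
    # Assumption: --no-conf writes False into the same destination as --conf-path.
    parser.add_argument("--no-conf", dest="conf_path", action="store_false", default=None)
    parser.add_argument("-o", "--output-dir", dest="output_dir", default=None)
    parser.add_argument("--no-output", dest="no_output", action="store_true", default=False)
    parser.add_argument("--partitioner", dest="partitioner", default=None)
    return parser
```

For example, make_parser().parse_args(["--no-conf", "-o", "out/"]) yields conf_path=False and output_dir="out/", while an empty argument list leaves every option at the default shown in the table.
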
| Option | Default | Combined by | Switches |
|---|---|---|---|
| base_tmp_dir | (automatic) | combine_paths() | (set TMPDIR) |
| bootstrap_mrjob | True | combine_values() | --bootstrap-mrjob, --no-bootstrap-mrjob |
| cleanup | 'ALL' | combine_values() | --cleanup |
| cleanup_on_failure | 'NONE' | combine_values() | --cleanup-on-failure |
| cmdenv | {} | combine_envs() | --cmdenv |
| hadoop_extra_args | [] | combine_lists() | --hadoop-arg |
| hadoop_streaming_jar | (automatic) | combine_values() | --hadoop-streaming-jar |
| interpreter | (value of python_bin) | combine_cmds() | --interpreter |
| jobconf | {} | combine_dicts() | --jobconf (see also jobconf()) |
| label | (automatic) | combine_values() | --label |
| owner | (automatic) | combine_values() | --owner |
| python_archives | [] | combine_path_lists() | --python-archive |
| python_bin | python | combine_cmds() | --python-bin |
| setup_cmds | [] | combine_lists() | --setup-cmd |
| setup_scripts | [] | combine_path_lists() | --setup-script |
| steps_python_bin | (current Python interpreter) | combine_cmds() | --steps-python-bin |
| upload_archives | [] | combine_path_lists() | --archive |
| upload_files | [] | combine_path_lists() | --file |

See mrjob.runner.MRJobRunner.__init__() for details.

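
The Combined by column names the function that merges an option's value when it is set in more than one place. The sketch below is a hypothetical re-implementation of four of these combiners, not mrjob's own code. It assumes the sources are passed in order of increasing precedence (for example runner defaults, then mrjob.conf, then command-line switches), that None means "not set here", and that combine_envs() joins variables whose names end in PATH with ':'.

```python
def combine_values(*values):
    """Return the last value that is not None (later sources win)."""
    for value in reversed(values):
        if value is not None:
            return value
    return None


def combine_lists(*seqs):
    """Concatenate lists from every source, skipping None."""
    result = []
    for seq in seqs:
        if seq is not None:
            result.extend(seq)
    return result


def combine_dicts(*dicts):
    """Merge dicts; keys from later sources override earlier ones."""
    result = {}
    for d in dicts:
        if d is not None:
            result.update(d)
    return result


def combine_envs(*envs):
    """Like combine_dicts(), except that a later value for a variable whose
    name ends in PATH is prepended to the earlier value.

    Assumption of this sketch: the separator is always ':'.
    """
    result = {}
    for env in envs:
        if env is None:
            continue
        for key, value in env.items():
            if key.endswith("PATH") and key in result:
                result[key] = value + ":" + result[key]
            else:
                result[key] = value
    return result
```

Under these assumptions, a cleanup default of 'ALL' overridden on the command line by 'NONE' resolves to 'NONE', while hadoop_extra_args from mrjob.conf and from --hadoop-arg are both kept, in that order.
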
LocalMRJobRunner takes no additional options, but:

In addition, it ignores hadoop_input_format, hadoop_output_format, hadoop_streaming_jar, and jobconf.

InlineMRJobRunner works like LocalMRJobRunner, except that it also ignores bootstrap_mrjob, cmdenv, python_bin, setup_cmds, setup_scripts, steps_python_bin, upload_archives, and upload_files.

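
The ignored-option rules for the local and inline runners amount to two sets, one a superset of the other. In this sketch, LOCAL_IGNORED, INLINE_IGNORED, and effective_opts() are hypothetical names used only for illustration; the option names themselves come from the two paragraphs above.

```python
# Options the local runner ignores (taken from the text above).
LOCAL_IGNORED = frozenset([
    "hadoop_input_format", "hadoop_output_format",
    "hadoop_streaming_jar", "jobconf",
])

# The inline runner ignores everything the local runner does, plus these.
INLINE_IGNORED = LOCAL_IGNORED | frozenset([
    "bootstrap_mrjob", "cmdenv", "python_bin", "setup_cmds",
    "setup_scripts", "steps_python_bin", "upload_archives", "upload_files",
])


def effective_opts(opts, ignored):
    """Hypothetical helper: drop the options a given runner ignores."""
    return {name: value for name, value in opts.items() if name not in ignored}
```

Building INLINE_IGNORED from LOCAL_IGNORED mirrors the wording "works like LocalMRJobRunner, except that it also ignores", so the two lists cannot drift apart.
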
| Option | Default | Combined by | Switches |
|---|---|---|---|
| additional_emr_info | None | combine_values() | --additional-emr-info |
| ami_version | None | combine_values() | --ami-version |
| aws_access_key_id | (automatic) | combine_values() | (set AWS_ACCESS_KEY_ID) |
| aws_availability_zone | (automatic) | combine_values() | --aws-availability-zone |
| aws_region | (automatic) | combine_values() | --aws-region |
| aws_secret_access_key | (automatic) | combine_values() | (set AWS_SECRET_ACCESS_KEY) |
| bootstrap_actions | [] | combine_lists() | --bootstrap-action |
| bootstrap_cmds | [] | combine_lists() | --bootstrap-cmd |
| bootstrap_files | [] | combine_path_lists() | --bootstrap-file |
| bootstrap_python_packages | [] | combine_path_lists() | --bootstrap-python-package |
| bootstrap_scripts | [] | combine_lists() | --bootstrap-script |
| check_emr_status_every | 30 | combine_values() | --check-emr-status-every |
| ec2_core_instance_bid_price | None | combine_values() | --ec2-core-instance-bid-price |
| ec2_core_instance_type | 'm1.small' | combine_values() | --ec2-core-instance-type |
| ec2_instance_type | (effectively 'm1.small') | combine_values() | --ec2-instance-type |
| ec2_key_pair | None | combine_values() | --ec2-key-pair |
| ec2_key_pair_file | None | combine_paths() | --ec2-key-pair-file |
| ec2_master_instance_bid_price | None | combine_values() | --ec2-master-instance-bid-price |
| ec2_master_instance_type | 'm1.small' | combine_values() | --ec2-master-instance-type |
| ec2_slave_instance_type | (see ec2_core_instance_type) | combine_values() | --ec2-slave-instance-type |
| ec2_task_instance_bid_price | None | combine_values() | --ec2-task-instance-bid-price |
| ec2_task_instance_type | (effectively 'm1.small') | combine_values() | --ec2-task-instance-type |
| emr_endpoint | (automatic) | combine_values() | --emr-endpoint |
| emr_job_flow_id | (create our own job flow) | combine_values() | --emr-job-flow-id |
| emr_job_flow_pool_name | 'default' | combine_values() | --pool-name |
| enable_emr_debugging | False | combine_values() | --enable-emr-debugging, --disable-emr-debugging |
| hadoop_streaming_jar_on_emr | None | combine_values() | --hadoop-streaming-jar-on-emr |
| hadoop_version | '0.20' | combine_values() | --hadoop-version |
| num_ec2_core_instances | None | combine_values() | --num-ec2-core-instances |
| num_ec2_instances | 1 | combine_values() | --num-ec2-instances |
| num_ec2_task_instances | None | combine_values() | --num-ec2-task-instances |
| pool_emr_job_flows | False | combine_values() | --pool-emr-job-flows, --no-pool-emr-job-flows |
| pool_wait_minutes | 0 | combine_values() | --pool-wait-minutes |
| s3_endpoint | (automatic) | combine_paths() | --s3-endpoint |
| s3_log_uri | (automatic) | combine_paths() | --s3-log-uri |
| s3_scratch_uri | (automatic) | combine_values() | --s3-scratch-uri |
| s3_sync_wait_time | 5.0 | combine_values() | --s3-sync-wait-time |
| ssh_bin | ssh | combine_cmds() | --ssh-bin |
| ssh_bind_ports | range(40001, 40841) | combine_values() | --ssh-bind-ports |
| ssh_tunnel_is_open | False | combine_values() | --ssh-tunnel-is-open, --ssh-tunnel-is-closed |
| ssh_tunnel_to_job_tracker | False | combine_values() | --ssh-tunnel-to-job-tracker |

See mrjob.emr.EMRJobRunner.__init__() for details.

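
Several EMR options rely on the path- and command-aware combiners: ec2_key_pair_file and s3_log_uri use combine_paths(), bootstrap_files and bootstrap_python_packages use combine_path_lists(), and ssh_bin uses combine_cmds(). The following is a hypothetical sketch of what those combiners do, assuming they expand ~ and environment variables, expand glob patterns in path lists, and split string commands with shell-like quoting. mrjob's real helpers may differ in detail.

```python
import glob
import os
import shlex


def expand_path(path):
    """Expand ~ and $VARIABLES in a path; leave None alone."""
    if path is None:
        return None
    return os.path.expanduser(os.path.expandvars(path))


def combine_paths(*paths):
    """Like combine_values(), but expand the winning path."""
    for path in reversed(paths):
        if path is not None:
            return expand_path(path)
    return None


def combine_path_lists(*path_seqs):
    """Concatenate path lists, expanding each entry and any glob pattern."""
    results = []
    for seq in path_seqs:
        if seq is None:
            continue
        for path in seq:
            expanded = expand_path(path)
            # If the glob matches nothing locally (e.g. an s3:// URI), keep the entry as-is.
            results.extend(sorted(glob.glob(expanded)) or [expanded])
    return results


def combine_cmds(*cmds):
    """Take the last non-None command; split a string into an argv list."""
    for cmd in reversed(cmds):
        if cmd is not None:
            return shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    return None
```

With this sketch, an ssh_bin of 'ssh -v' on the command line overrides the default 'ssh' and becomes the argv list ['ssh', '-v'].
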
| Option | Default | Combined by | Switches |
|---|---|---|---|
| hadoop_bin | (automatic) | combine_cmds() | --hadoop-bin |
| hadoop_home | (value of HADOOP_HOME) | combine_values() | (set HADOOP_HOME) |
| hdfs_scratch_dir | tmp/mrjob (in HDFS) | combine_paths() | --hdfs-scratch-dir |

See mrjob.hadoop.HadoopJobRunner.__init__() for details.
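
The hadoop_bin default is listed only as (automatic). One plausible reading, assumed here rather than confirmed by the table, is that it falls back to bin/hadoop under hadoop_home, which in turn defaults to the HADOOP_HOME environment variable. resolve_hadoop_bin() is a hypothetical helper illustrating that chain.

```python
import os
import shlex


def resolve_hadoop_bin(hadoop_bin=None, hadoop_home=None, environ=None):
    """Return the hadoop command as an argv list (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    if hadoop_bin:
        # An explicit hadoop_bin wins; a string is split like a shell command.
        return shlex.split(hadoop_bin) if isinstance(hadoop_bin, str) else list(hadoop_bin)
    # Assumption: hadoop_home defaults to $HADOOP_HOME, and hadoop_bin
    # defaults to <hadoop_home>/bin/hadoop.
    home = hadoop_home or environ.get("HADOOP_HOME")
    if not home:
        raise ValueError("set hadoop_home or the HADOOP_HOME environment variable")
    return [os.path.join(home, "bin", "hadoop")]
```

Passing environ explicitly keeps the helper easy to test without touching the real process environment.
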