mrjob.parse - log parsing

Utilities for parsing errors, counters, and status messages.

mrjob.parse.is_s3_uri(uri)

Return True if uri can be parsed into an S3 URI, False otherwise.
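The description defines is_s3_uri by whether the string can be parsed. A minimal sketch of that idea follows. It uses the standard library's urllib.parse.urlparse and assumes only the s3://bucket/key form is accepted. The helper _split_s3 and the function name are hypothetical, and this is not mrjob's actual source:

```python
from urllib.parse import urlparse


def _split_s3(uri):
    """Hypothetical helper: split s3://bucket/key or raise ValueError."""
    parts = urlparse(uri)
    # Assumption: only the "s3" scheme counts, and a "/" after the bucket is required.
    if parts.scheme != 's3' or '/' not in parts.path:
        raise ValueError('not an S3 URI: %r' % (uri,))
    return parts.netloc, parts.path[1:]


def is_s3_uri_sketch(uri):
    """True if uri can be parsed into (bucket, key), False otherwise."""
    try:
        _split_s3(uri)
        return True
    except ValueError:
        return False


print(is_s3_uri_sketch('s3://walrus/tmp/'))  # True
print(is_s3_uri_sketch('/tmp/walrus'))       # False
```

Building the predicate on top of the parser keeps the two in agreement: anything the parser accepts is, by construction, an S3 URI.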

mrjob.parse.is_uri(uri)

Return True if uri is a URI and contains :// (we only care about URIs that can describe files).

Changed in version 0.5.7: previously, anything containing a colon was recognized as a URI unless it was a Windows path (C:\...).
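The :// requirement is what separates URIs from Windows paths and from scheme-only strings. The following hypothetical sketch shows that rule using urllib.parse.urlparse. It assumes "is a URI" means "has a non-empty scheme" and is not mrjob's source:

```python
from urllib.parse import urlparse


def is_uri_sketch(uri):
    """True only for strings with a scheme followed by '://'."""
    # Requiring '://' rules out Windows paths such as C:\temp and
    # scheme-only strings such as mailto:someone@example.com.
    return '://' in uri and bool(urlparse(uri).scheme)


print(is_uri_sketch('hdfs://namenode/data'))        # True
print(is_uri_sketch('C:\\temp\\input.txt'))          # False
print(is_uri_sketch('mailto:someone@example.com'))  # False
print(is_uri_sketch('/tmp/input'))                  # False
```

Under the pre-0.5.7 behavior, the mailto: string would have counted as a URI because it contains a colon.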

mrjob.parse.is_windows_path(uri)

Return True if uri is a Windows path.

Deprecated since version 0.5.7.
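A check of this kind can be written as a regular expression for a drive letter followed by a colon and a backslash. This is a hypothetical sketch, not mrjob's implementation. It assumes only backslash-style drive paths count:

```python
import re

# Assumption: a Windows path starts with a drive letter, a colon, and a backslash.
_WINDOWS_PATH_RE = re.compile(r'^[A-Za-z]:\\')


def is_windows_path_sketch(path):
    """True if path looks like C:\\..., False otherwise."""
    return bool(_WINDOWS_PATH_RE.match(path))


print(is_windows_path_sketch('C:\\Users\\hadoop\\input.txt'))  # True
print(is_windows_path_sketch('s3://walrus/tmp/'))             # False
```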

mrjob.parse.parse_key_value_list(kv_string_list, error_fmt, error_func)

Parse a list of strings like KEY=VALUE into a dictionary.

Parameters:
  • kv_string_list ([str]) – List of strings of the form KEY=VALUE.
  • error_fmt (str) – Format string accepting one %s argument, which is the malformed (i.e. not KEY=VALUE) string.
  • error_func (function(str)) – Function to call when a malformed string is encountered.

Deprecated since version 0.5.7.
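The three parameters work together as follows: each malformed string is formatted into error_fmt and passed to error_func, and parsing continues. A hypothetical sketch of that behavior appears below. It assumes the string is split on the first = only, so values may contain =. It is not mrjob's source:

```python
def parse_key_value_list_sketch(kv_string_list, error_fmt, error_func):
    """Build a dict from KEY=VALUE strings; report malformed ones via error_func."""
    result = {}
    for kv_string in kv_string_list:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = kv_string.partition('=')
        if not sep:
            error_func(error_fmt % (kv_string,))
            continue
        result[key] = value
    return result


errors = []
env = parse_key_value_list_sketch(
    ['HADOOP_HOME=/opt/hadoop', 'OPTS=-Da=b', 'oops'],
    'malformed KEY=VALUE string: %s',
    errors.append,
)
print(env)     # {'HADOOP_HOME': '/opt/hadoop', 'OPTS': '-Da=b'}
print(errors)  # ['malformed KEY=VALUE string: oops']
```

Passing errors.append as error_func collects the messages. A caller could instead pass a function that logs or raises.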

mrjob.parse.parse_mr_job_stderr(stderr, counters=None)

Parse counters and status messages out of MRJob output.

Parameters:
  • stderr – a filehandle, a list of lines (bytes), or bytes
  • counters – Counters so far, to update; a map from group (string) to counter name (string) to count.

Returns a dictionary with the keys counters, statuses, and other:

  • counters: counters so far; same format as above
  • statuses: a list of status messages encountered
  • other: lines (strings) that are neither counters nor status messages

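Counter and status lines follow Hadoop Streaming's reporter protocol, in which tasks write reporter:counter:<group>,<counter>,<amount> or reporter:status:<message> to stderr. The hypothetical sketch below shows the parsing described above. It assumes exactly those two line formats and UTF-8 input, and it strips line endings from "other" lines. It is not mrjob's source:

```python
import re

# Assumed Hadoop Streaming reporter formats:
#   reporter:counter:<group>,<counter>,<amount>
#   reporter:status:<message>
_COUNTER_RE = re.compile(r'^reporter:counter:([^,]*),([^,]*),(-?\d+)$')
_STATUS_RE = re.compile(r'^reporter:status:(.*)$')


def parse_mr_job_stderr_sketch(stderr, counters=None):
    """Hypothetical re-implementation; returns counters, statuses, other."""
    if isinstance(stderr, bytes):
        stderr = stderr.splitlines()  # bytes -> list of byte lines
    if counters is None:
        counters = {}
    statuses = []
    other = []
    for line in stderr:  # works for lists and binary filehandles alike
        if isinstance(line, bytes):
            line = line.decode('utf-8', 'replace')
        line = line.rstrip('\r\n')
        m = _COUNTER_RE.match(line)
        if m:
            group, name, amount = m.group(1), m.group(2), int(m.group(3))
            # Update the caller's counters in place, adding to any prior count.
            group_counters = counters.setdefault(group, {})
            group_counters[name] = group_counters.get(name, 0) + amount
            continue
        m = _STATUS_RE.match(line)
        if m:
            statuses.append(m.group(1))
        else:
            other.append(line)
    return {'counters': counters, 'statuses': statuses, 'other': other}


result = parse_mr_job_stderr_sketch(
    b'reporter:counter:Words,Total,2\n'
    b'reporter:status:halfway\n'
    b'Traceback (most recent call last):\n'
    b'reporter:counter:Words,Total,3\n',
    counters={'Words': {'Total': 10}},
)
print(result)
```

Because the counters argument is updated rather than replaced, the same dictionary can be passed across several calls to accumulate totals.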
mrjob.parse.parse_port_range_list(range_list_str)

Parse a port range list of the form (start[:end])(,(start[:end]))*

Deprecated since version 0.5.7.
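The grammar above describes comma-separated items, each either a single port or a start:end range. The description does not state the return type. This hypothetical sketch assumes a flat list of integer ports with the end of each range included. It is not mrjob's source:

```python
def parse_port_range_list_sketch(range_list_str):
    """Expand "start[:end],..." into a flat list of ports (end inclusive)."""
    ports = []
    for item in range_list_str.split(','):
        if ':' in item:
            start, end = (int(part) for part in item.split(':'))
            ports.extend(range(start, end + 1))  # +1 makes end inclusive
        else:
            ports.append(int(item))
    return ports


print(parse_port_range_list_sketch('40001:40003,40010'))
# [40001, 40002, 40003, 40010]
```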

mrjob.parse.parse_s3_uri(uri)

Parse an S3 URI into (bucket, key)

>>> parse_s3_uri('s3://walrus/tmp/')
('walrus', 'tmp/')

If uri is not an S3 URI, raises a ValueError.
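The doctest above can be reproduced with the standard library's urllib.parse.urlparse, which puts the bucket in netloc and "/" plus the key in path. The following is a hypothetical sketch, not mrjob's source. It assumes only the s3 scheme is accepted and a "/" must follow the bucket:

```python
from urllib.parse import urlparse


def parse_s3_uri_sketch(uri):
    """Split s3://bucket/key into (bucket, key); raise ValueError otherwise."""
    parts = urlparse(uri)
    # Assumption: only the "s3" scheme is accepted, and the bucket must be
    # followed by "/" (an empty key, as in s3://walrus/, is allowed).
    if parts.scheme != 's3' or '/' not in parts.path:
        raise ValueError('Invalid S3 URI: %s' % uri)
    return parts.netloc, parts.path[1:]


print(parse_s3_uri_sketch('s3://walrus/tmp/'))  # ('walrus', 'tmp/')
try:
    parse_s3_uri_sketch('hdfs://walrus/tmp/')
except ValueError as e:
    print(e)  # Invalid S3 URI: hdfs://walrus/tmp/
```

Dropping the leading "/" from path is what turns '/tmp/' into the key 'tmp/' shown in the doctest.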