Command-line utilities

This module implements a set of utilities for generating revert datasets from the command line. When the mwreverts Python package is installed, a mwreverts utility should be available on the command line. Run mwreverts -h for more information:

mwreverts dump2reverts

$ mwreverts dump2reverts -h

Extracts reverts from an XML dump.

Usage:
    dump2reverts (-h|--help)
    dump2reverts [<input-file>...] [--radius=<revs>] [--use-sha1]
                 [--threads=<num>] [--output=<path>] [--compress=<type>]
                 [--verbose] [--debug]

Options:
    -h|--help           Print this documentation
    <input-file>        The path to file containing MediaWiki XML
                        [default: <stdin>]
    --radius=<revs>     The maximum number of revisions that a revert can
                        reference. [default: 15]
    --use-sha1          Use the sha1 field even if a text field is
                        available.
    --threads=<num>     If a collection of files are provided, how many
                        processor threads? [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path.  [default: <stdout>]
    --compress=<type>   If set, output written to the output-dir will be
                        compressed in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Print debug logs.
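
To make the --radius and --use-sha1 options more concrete, here is a minimal, self-contained sketch of identity-revert detection over a single page's history. It is not the mwreverts implementation: the function name detect_reverts, the shape of the revision dicts, and the choice of a window of radius + 1 revisions are all assumptions made for illustration, and the real utility may treat the boundary differently.

```python
import hashlib
from collections import deque


def detect_reverts(revisions, radius=15, use_sha1=False):
    """Yield (reverting_id, reverted_ids, reverted_to_id) tuples.

    `revisions` is one page's history in chronological order, given as
    dicts with an "id" plus either "text" or "sha1" (hypothetical shape).
    """
    # Keep the last radius + 1 revisions: the restored revision plus up to
    # `radius` revisions that were undone (an assumption of this sketch).
    window = deque(maxlen=radius + 1)
    for rev in revisions:
        # Use one checksum source for the whole run. A hex digest computed
        # here is not comparable with a sha1 field taken from a dump.
        if use_sha1:
            checksum = rev["sha1"]
        else:
            checksum = hashlib.sha1(rev["text"].encode("utf-8")).hexdigest()

        # Search backwards for the most recent revision with equal content.
        entries = list(window)
        for i in range(len(entries) - 1, -1, -1):
            if entries[i][0] == checksum:
                reverted = [rev_id for _, rev_id in entries[i + 1:]]
                if reverted:  # identical consecutive revisions undo nothing
                    yield rev["id"], reverted, entries[i][1]
                break
        window.append((checksum, rev["id"]))


history = [
    {"id": 1, "text": "foo"},
    {"id": 2, "text": "foo bar"},
    {"id": 3, "text": "vandalism"},
    {"id": 4, "text": "foo bar"},
]
reverts = list(detect_reverts(history))
print(reverts)  # [(4, [3], 2)]
```

In this sketch, revision 4 restores the content of revision 2 and therefore reverts revision 3. Lowering radius shrinks the window, so a restore that reaches further back than the window goes undetected.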

mwreverts revdocs2reverts

$ mwreverts revdocs2reverts -h

Extracts reverts from a page-partitioned sequence of JSON revision
documents.

Usage:
    revdocs2reverts (-h|--help)
    revdocs2reverts [<input-file>...] [--radius=<revs>] [--use-sha1]
                    [--threads=<num>] [--output=<path>] [--compress=<type>]
                    [--verbose] [--debug]

Options:
    -h|--help           Print this documentation
    <input-file>        The path to file containing page-partitioned
                        JSON revision documents. [default: <stdin>]
    --radius=<revs>     The maximum number of revisions that a revert can
                        reference. [default: 15]
    --use-sha1          Use the sha1 field even if a text field is
                        available.
    --threads=<num>     If a collection of files are provided, how many
                        processor threads? [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path.  [default: <stdout>]
    --compress=<type>   If set, output written to the output-dir will be
                        compressed in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Print debug logs.
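
"Page-partitioned" means that all revision documents belonging to one page appear next to each other in the input. The sketch below shows one way to consume such input. It assumes one JSON document per line and a page identifier stored under doc["page"]["id"]. Both are assumptions about the input layout, and read_pages is a hypothetical helper, not part of mwreverts.

```python
import io
import json
from itertools import groupby


def read_pages(lines):
    """Yield (page_id, [revision docs]) from page-partitioned JSON lines."""
    docs = (json.loads(line) for line in lines if line.strip())
    # groupby only merges adjacent items, which is exactly why the input
    # must be partitioned by page for this to work.
    for page_id, page_docs in groupby(docs, key=lambda d: d["page"]["id"]):
        yield page_id, list(page_docs)


# Stand-in for a file or stdin; field names are assumed for illustration.
sample = io.StringIO(
    '{"id": 10, "page": {"id": 1}, "sha1": "aaa"}\n'
    '{"id": 11, "page": {"id": 1}, "sha1": "bbb"}\n'
    '{"id": 20, "page": {"id": 2}, "sha1": "ccc"}\n'
)
pages = list(read_pages(sample))
for page_id, docs in pages:
    print(page_id, [doc["id"] for doc in docs])
```

Because only one page's revisions are held in memory at a time, this approach scales to large inputs, but it would split a page into several groups if the input were not actually partitioned by page.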