Command-line utilities
This module implements a set of utilities for generating revert datasets from the command line. When the mwreverts Python package is installed, an mwreverts utility should be available on the command line. Run mwreverts -h for more information:
mwreverts dump2reverts
$ mwreverts dump2reverts -h
Extracts reverts from an XML dump.
Usage:
    dump2reverts (-h|--help)
    dump2reverts [<input-file>...] [--radius=<revs>] [--use-sha1]
                 [--threads=<num>] [--output=<path>] [--compress=<type>]
                 [--verbose] [--debug]
Options:
    -h|--help           Print this documentation.
    <input-file>        The path to a file containing MediaWiki XML.
                        [default: <stdin>]
    --radius=<revs>     The maximum number of revisions that a revert can
                        reference. [default: 15]
    --use-sha1          Use the sha1 field even if a text field is
                        available.
    --threads=<num>     If a collection of files is provided, the number of
                        processor threads to use. [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path. [default: <stdout>]
    --compress=<type>   Compress files written to the --output directory
                        in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Print debug logs.
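To make the --radius option concrete, here is a minimal Python sketch of checksum-based identity revert detection over a single page's history. It is an illustration under stated assumptions, not the mwreverts implementation: the names detect_reverts and Revert are hypothetical, and reading the radius as "a revert can undo at most radius intervening revisions" is an assumption about how the window is counted.

```python
from collections import deque
from typing import Deque, Hashable, Iterable, List, NamedTuple, Tuple


class Revert(NamedTuple):
    """Hypothetical record of one identity revert."""
    reverting: int         # revision that restored an earlier state
    reverteds: List[int]   # revisions whose changes were undone
    reverted_to: int       # earlier revision whose content was restored


def detect_reverts(revisions: Iterable[Tuple[int, Hashable]],
                   radius: int = 15) -> List[Revert]:
    """Detect identity reverts in one page's (rev_id, checksum) pairs, oldest first.

    Assumption: only the last radius + 1 revisions are remembered, so a
    single revert can undo at most `radius` intervening revisions.
    """
    if radius < 1:
        raise ValueError("radius must be at least 1")
    history: Deque[Tuple[int, Hashable]] = deque(maxlen=radius + 1)
    reverts: List[Revert] = []
    for rev_id, checksum in revisions:
        recent = list(history)
        # Scan from newest to oldest so the most recent identical state wins.
        for i in range(len(recent) - 1, -1, -1):
            old_id, old_checksum = recent[i]
            if old_checksum == checksum:
                undone = [r for r, _ in recent[i + 1:]]
                # An empty list means nothing changed (a null edit), not a revert.
                if undone:
                    reverts.append(Revert(rev_id, undone, old_id))
                break
        history.append((rev_id, checksum))
    return reverts


if __name__ == "__main__":
    page = [(1, "aaa"), (2, "bbb"), (3, "aaa"), (4, "ccc")]
    print(detect_reverts(page, radius=15))
```

Running the file prints [Revert(reverting=3, reverteds=[2], reverted_to=1)]: revision 3 restores the content of revision 1 and so undoes revision 2. With radius=1, a revision that restores content from two revisions back would not be detected, because that older state has already fallen out of the window.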
mwreverts revdocs2reverts
$ mwreverts revdocs2reverts -h
Extracts reverts from a page-partitioned sequence of JSON revision
documents.
Usage:
    revdocs2reverts (-h|--help)
    revdocs2reverts [<input-file>...] [--radius=<revs>] [--use-sha1]
                    [--threads=<num>] [--output=<path>] [--compress=<type>]
                    [--verbose] [--debug]
Options:
    -h|--help           Print this documentation.
    <input-file>        The path to a file containing page-partitioned
                        JSON revision documents. [default: <stdin>]
    --radius=<revs>     The maximum number of revisions that a revert can
                        reference. [default: 15]
    --use-sha1          Use the sha1 field even if a text field is
                        available.
    --threads=<num>     If a collection of files is provided, the number of
                        processor threads to use. [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path. [default: <stdout>]
    --compress=<type>   Compress files written to the --output directory
                        in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Print debug logs.
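"Page-partitioned" means that all of a page's revision documents appear contiguously in the input, which lets a reader process one page at a time without holding the whole file in memory. The Python sketch below shows that grouping, plus a checksum choice loosely mirroring --use-sha1. It rests on assumptions not stated above: one JSON object per line, fields named id, page.id, text, and sha1, and revisions in chronological order within each page. The helpers read_revdocs, checksum, and pages are hypothetical, not part of the mwreverts API.

```python
import hashlib
import io
import json
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def read_revdocs(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON revision document per non-empty line (assumed format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def checksum(doc: dict, use_sha1: bool = False) -> str:
    """Pick a content checksum, loosely mirroring the --use-sha1 option.

    Note: hashing the text yields a hex digest, which may not match the
    encoding of a stored sha1 field, so mixing both schemes within one
    page may prevent identical revisions from matching.
    """
    text = doc.get("text")
    if use_sha1 or text is None:
        return doc["sha1"]  # assumed field name
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def pages(docs: Iterable[dict], use_sha1: bool = False
          ) -> Iterator[Tuple[int, List[Tuple[int, str]]]]:
    """Group a page-partitioned stream into (page_id, [(rev_id, checksum), ...])."""
    # groupby only merges *adjacent* items, which is why the input must be
    # page-partitioned: each page's revisions have to be contiguous.
    for page_id, revs in groupby(docs, key=lambda d: d["page"]["id"]):
        yield page_id, [(d["id"], checksum(d, use_sha1)) for d in revs]


if __name__ == "__main__":
    sample = io.StringIO(
        '{"id": 1, "page": {"id": 10}, "text": "foo", "sha1": "s1"}\n'
        '{"id": 2, "page": {"id": 10}, "text": "bar", "sha1": "s2"}\n'
        '{"id": 3, "page": {"id": 20}, "sha1": "s3"}\n'
    )
    for page_id, revisions in pages(read_revdocs(sample)):
        print(page_id, revisions)
```

Each (page_id, revisions) pair could then be handed to a checksum-based revert detector. If the same page appeared in two separate runs of the input, groupby would emit it twice, which is the failure mode the page-partitioning requirement rules out.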