This module implements a set of command-line utilities for generating diffs from MediaWiki revision data. When the mwdiffs Python package is installed, an mwdiffs utility should be available from the command line. Run mwdiffs -h for more information. Each subcommand prints its own help:
$ mwdiffs dump2diffs -h
Computes diffs from an XML dump.

Usage:
    dump2diffs (-h|--help)
    dump2diffs [<input-file>...] --config=<path> [--namespaces=<ids>]
               [--timeout=<secs>] [--keep-text] [--threads=<num>]
               [--output=<path>] [--compress=<type>] [--verbose] [--debug]

Options:
    -h|--help           Print this documentation
    <input-file>        The path to a MediaWiki XML Dump file
                        [default: <stdin>]
    --config=<path>     The path to a deltas DiffEngine configuration
    --namespaces=<ids>  A comma separated list of namespace IDs to be
                        considered [default: <all>]
    --timeout=<secs>    The maximum number of seconds that a diff will be
                        able to run before being stopped [default: 10]
    --keep-text         If set, the 'text' field will not be dropped after
                        diffs are computed.
    --threads=<num>     If a collection of files are provided, how many
                        processor threads? [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path. [default: <stdout>]
    --compress=<type>   If set, output written to the output-dir will be
                        compressed in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Prints debug logs to stderr.
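
As an illustration of how the flags above combine, here is a minimal Python sketch that assembles a dump2diffs command line. The helper name `build_dump2diffs_argv`, the input paths, and the configuration file name are all hypothetical; only the flags themselves come from the usage text, and nothing here is part of mwdiffs.

```python
import shlex


def build_dump2diffs_argv(input_files, config, namespaces=None, timeout=None,
                          keep_text=False, threads=None, output=None,
                          compress=None, verbose=False, debug=False):
    """Assemble an argv list for `mwdiffs dump2diffs` (hypothetical helper)."""
    # --config is the only option the usage pattern marks as required.
    argv = ["mwdiffs", "dump2diffs", *input_files, f"--config={config}"]
    if namespaces is not None:
        # Namespace IDs are passed as one comma-separated value.
        argv.append("--namespaces=" + ",".join(str(ns) for ns in namespaces))
    if timeout is not None:
        argv.append(f"--timeout={timeout}")
    if keep_text:
        argv.append("--keep-text")
    if threads is not None:
        argv.append(f"--threads={threads}")
    if output is not None:
        argv.append(f"--output={output}")
    if compress is not None:
        argv.append(f"--compress={compress}")
    if verbose:
        argv.append("--verbose")
    if debug:
        argv.append("--debug")
    return argv


argv = build_dump2diffs_argv(
    ["dumps/example-1.xml.bz2", "dumps/example-2.xml.bz2"],  # hypothetical paths
    config="diff_config.yaml",  # hypothetical file name; the format is an assumption
    namespaces=[0, 1],
    timeout=20,
    threads=2,
    output="diffs/",
    verbose=True,
)

# Print the command as it would be typed in a shell.
print(shlex.join(argv))
```

To actually run the command (this requires mwdiffs to be installed), the list can be handed to `subprocess.run(argv, check=True)`. Options left at `None` are omitted so that the defaults listed above apply.
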

$ mwdiffs revdocs2diffs -h
Computes diffs from a page-partitioned sequence of JSON revision documents.

Usage:
    revdocs2diffs (-h|--help)
    revdocs2diffs [<input-file>...] --config=<path> [--namespaces=<ids>]
                  [--timeout=<secs>] [--keep-text] [--threads=<num>]
                  [--output=<path>] [--compress=<type>] [--verbose]
                  [--debug]

Options:
    -h|--help           Print this documentation
    <input-file>        The path to a file containing a page-partitioned
                        sequence of JSON revision documents
                        [default: <stdin>]
    --config=<path>     The path to a deltas DiffEngine configuration
    --namespaces=<ids>  A comma separated list of namespace IDs to be
                        considered [default: <all>]
    --timeout=<secs>    The maximum number of seconds that a diff will be
                        able to run before being stopped [default: 10]
    --keep-text         If set, the 'text' field will be populated in the
                        output JSON.
    --threads=<num>     If a collection of files are provided, how many
                        processor threads? [default: <cpu_count>]
    --output=<path>     Write output to a directory with one output file
                        per input path. [default: <stdout>]
    --compress=<type>   If set, output written to the output-dir will be
                        compressed in this format. [default: bz2]
    --verbose           Print progress information to stderr.
    --debug             Prints debug logs to stderr.
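
revdocs2diffs expects its input to be page-partitioned. The sketch below shows one way to sanity-check a file before running it, under assumptions that the help text does not confirm: that "page-partitioned" means all revisions of a page appear as one contiguous run, that the file holds one JSON document per line, and that each document stores its page ID under `page.id`. The function name `is_page_partitioned` is hypothetical and not part of mwdiffs.

```python
import json


def is_page_partitioned(lines):
    """Return True if each page's revisions form one contiguous run.

    Assumes one JSON document per line with the page ID at doc["page"]["id"].
    """
    seen = set()      # page IDs whose run has already started
    current = None    # page ID of the run being read
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        page_id = json.loads(line)["page"]["id"]
        if page_id != current:
            if page_id in seen:
                # The page reappears after another page's run began.
                return False
            seen.add(page_id)
            current = page_id
    return True


# Tiny made-up documents; real revision documents carry many more fields.
good = [
    '{"id": 1, "page": {"id": 10}}',
    '{"id": 2, "page": {"id": 10}}',
    '{"id": 3, "page": {"id": 11}}',
]
bad = [
    '{"id": 1, "page": {"id": 10}}',
    '{"id": 3, "page": {"id": 11}}',
    '{"id": 2, "page": {"id": 10}}',
]

print(is_page_partitioned(good))  # True
print(is_page_partitioned(bad))   # False
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `is_page_partitioned(open(path))`. This check only covers grouping by page; whether revisions must also be ordered within a page is not stated in the help text.
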