This module implements a set of utilities for extracting features and for training and testing a revscoring.scorer_models.MLScorerModel from the command line. When the revscoring Python package is installed, a revscoring utility should be available on the command line. Run revscoring -h for more information:
revscoring extract_features -h
Adds features to a set of labeled revisions.
Reads a TSV file of <rev_id> <label> pairs and replaces the
<rev_id> field with the extracted feature values.
Input: <rev_id>[TAB]<label>
Output: <feature_1>[TAB]<feature_2>[TAB]...[TAB]<label>
Usage:
    extract_features -h | --help
    extract_features <features> --host=<url> [--rev-labels=<path>]
                     [--value-labels=<path>]
                     [--include-revid]
                     [--extractors=<num>]
                     [--verbose] [--debug]

Options:
    -h --help               Print this documentation
    <features>              Classpath to a list/tuple of features
    --host=<url>            The url pointing to a MediaWiki API to use
                            for extracting features
    --rev-labels=<path>     Path to a file containing rev_id-label pairs
                            [default: <stdin>]
    --value-labels=<path>   Path to a file to write feature-labels to
                            [default: <stdout>]
    --include-revid         If set, include the revision ID as the first
                            column in the output TSV
    --extractors=<num>      The number of extractors to run in parallel
                            [default: <cpu count>]
    --verbose               Print dots and stuff
    --debug                 Print debug logging
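
The input and output formats described above can be illustrated with a minimal sketch. This is not revscoring's implementation: the helper names parse_input_line and to_output_row are hypothetical, the feature values are made up, and treating <rev_id> as an integer is an assumption. It only shows how one <rev_id>[TAB]<label> line maps onto one output line, with and without --include-revid.

```python
def parse_input_line(line):
    """Split one '<rev_id>[TAB]<label>' input line into its two fields.

    Assumption: the revision ID is an integer.
    """
    rev_id, label = line.rstrip("\n").split("\t")
    return int(rev_id), label


def to_output_row(rev_id, label, feature_values, include_revid=False):
    """Build one output line: feature values first, label last.

    With include_revid=True the revision ID becomes the first column,
    mirroring the --include-revid flag.
    """
    fields = [str(value) for value in feature_values]
    if include_revid:
        fields.insert(0, str(rev_id))
    fields.append(str(label))
    return "\t".join(fields)


# Hypothetical revision and feature values, for illustration only.
rev_id, label = parse_input_line("12345\tTrue\n")
plain_row = to_output_row(rev_id, label, [3, 0.25, False])
revid_row = to_output_row(rev_id, label, [3, 0.25, False], include_revid=True)
```

Here plain_row has the shape <feature_1>[TAB]<feature_2>[TAB]<feature_3>[TAB]<label>, and revid_row carries the revision ID as an extra leading column.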
revscoring score -h
Scores a set of revisions.
Usage:
    score (-h | --help)
    score <model-file> <rev_id>... --host=<url> [--verbose]

Options:
    -h --help       Print this documentation
    <model-file>    Path to a model file
    --host=<url>    The url pointing to a MediaWiki API to use for
                    extracting features
    --verbose       Print debugging info
    <rev_id>        A revision identifier
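
As a rough sketch of how the usage pattern above might be assembled from a script, the helper below builds an argument list for the score utility. The function name build_score_command, the model file name, and the host URL are all hypothetical; only the argument order and the --host and --verbose flags come from the usage text.

```python
def build_score_command(model_file, rev_ids, host, verbose=False):
    """Assemble an argument list matching the 'score' usage pattern.

    The usage requires at least one <rev_id>, so an empty list is rejected.
    """
    if not rev_ids:
        raise ValueError("at least one rev_id is required")
    command = ["revscoring", "score", model_file]
    command.extend(str(rev_id) for rev_id in rev_ids)
    command.append("--host=" + host)
    if verbose:
        command.append("--verbose")
    return command


# Hypothetical model file and host, for illustration only.
command = build_score_command(
    "enwiki.reverted.model", [12345, 67890], "https://en.wikipedia.org"
)
```

The resulting list could then be passed to subprocess.run on a machine where the revscoring utility is installed.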
revscoring train_test -h
Trains and tests a scorer model. This utility expects to get a file of
tab-separated feature values and labels from which to construct a model.
Usage:
    train_test -h | --help
    train_test <scorer_model> <features> [-p=<kv>]... [-s=<kv>]...
               [--version=<vers>]
               [--values-labels=<path>]
               [--model-file=<path>]
               [--label-type=<type>]
               [--test-prop=<prop>]
               [--balance-sample-weight]
               [--center]
               [--scale]
               [--debug]

Options:
    -h --help                Prints this documentation
    <scorer_model>           Classpath to the MLScorerModel to construct
                             and train
    <features>               Classpath to a list of features to use when
                             constructing the model
    -p --parameter=<kv>      A key-value argument pair to use when
                             constructing the scorer_model.
    -s --statistic=<kv>      A test statistic argument to use to evaluate
                             the scorer model against the test set.
    --version=<vers>         A version to associate with the model
    --values-labels=<path>   Path to a file containing feature values and
                             labels [default: <stdin>]
    --model-file=<path>      Path to write a model file to
                             [default: <stdout>]
    --label-type=<type>      Interprets the labels as the appropriate type
                             (int, float, str, bool) [default: str]
    --test-prop=<prop>       The proportion of data that should be withheld
                             for testing the model. [default: 0.20]
    --balance-sample-weight  Balance the weight of samples (increase
                             importance of under-represented classes)
    --center                 Features should be centered on a common axis
    --scale                  Features should be scaled to a common range
    --debug                  Print debug logging.
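
To make the --label-type and --test-prop options more concrete, here is a minimal sketch of reading tab-separated feature values and labels, casting the label, and withholding a proportion of rows for testing. It is not revscoring's implementation: the function names, the shuffle with a fixed seed, and the way "bool" labels are parsed are all assumptions, and feature values are left as strings for simplicity.

```python
import random

# Assumption: "bool" labels are spelled "True"/"False" (case-insensitive).
LABEL_CASTS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda text: text.strip().lower() == "true",
}


def read_value_labels(lines, label_type="str"):
    """Parse '<feature_1>[TAB]...[TAB]<label>' lines into (values, label) pairs."""
    cast = LABEL_CASTS[label_type]
    observations = []
    for line in lines:
        *values, label = line.rstrip("\n").split("\t")
        observations.append((values, cast(label)))
    return observations


def split_train_test(observations, test_prop=0.20, seed=0):
    """Shuffle the observations and withhold test_prop of them for testing."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_prop)
    return shuffled[test_size:], shuffled[:test_size]


# Ten made-up rows with two feature values and a boolean label each.
lines = ["%d\t%.1f\t%s\n" % (i, i / 2, i % 2 == 0) for i in range(10)]
observations = read_value_labels(lines, label_type="bool")
train_set, test_set = split_train_test(observations, test_prop=0.20)
```

With ten rows and the default proportion of 0.20, two rows are withheld for testing and eight remain for training.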