revscoring.utilities

This module implements a set of utilities for extracting features and training/testing revscoring.scorer_models.MLScorerModel instances from the command line. When the revscoring Python package is installed, a revscoring utility should be available on the command line. Run revscoring -h for more information:

extract_features

revscoring extract_features -h

Adds features to a set of labeled revisions.

Reads a TSV file of <rev_id>[TAB]<label> pairs and replaces the
<rev_id> field with the extracted feature values.

Input: <rev_id>[TAB]<label>

Output: <feature_1>[TAB]<feature_2>[TAB]...[TAB]<label>


Usage:
    extract_features -h | --help
    extract_features <features> --host=<url> [--rev-labels=<path>]
                                             [--value-labels=<path>]
                                             [--include-revid]
                                             [--extractors=<num>]
                                             [--verbose] [--debug]

Options:
    -h --help               Print this documentation
    <features>              Classpath to a list/tuple of features
    --host=<url>            The url pointing to a MediaWiki API to use
                            for extracting features
    --rev-labels=<path>     Path to a file containing rev_id-label pairs
                            [default: <stdin>]
    --value-labels=<path>   Path to a file to write feature-labels to
                            [default: <stdout>]
    --include-revid         If set, include the revision ID as the first
                            column in the output TSV
    --extractors=<num>      The number of extractors to run in parallel
                            [default: <cpu count>]
    --verbose               Print dots and stuff
    --debug                 Print debug logging
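The input/output transformation described for extract_features can be sketched in plain Python. This is a minimal illustration, not revscoring's implementation. `add_features` and `fake_extract` are hypothetical names, and `fake_extract` stands in for real feature extraction, which the utility performs against the MediaWiki API given by `--host`. The `include_revid` parameter mirrors `--include-revid`.

```python
import io
from typing import Callable, Iterable, List, TextIO


def add_features(rev_labels: Iterable[str], value_labels: TextIO,
                 extract: Callable[[int], List[object]],
                 include_revid: bool = False) -> None:
    """Turn <rev_id>TAB<label> lines into <feature_1>TAB...TAB<label> lines.

    `extract` is a hypothetical stand-in for revscoring's feature
    extraction, which really queries a MediaWiki API (--host).
    """
    for line in rev_labels:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        rev_id_str, label = line.split("\t")
        rev_id = int(rev_id_str)
        # With include_revid, the rev_id is kept as the first column.
        columns: List[object] = [rev_id] if include_revid else []
        columns.extend(extract(rev_id))  # feature values replace the rev_id
        columns.append(label)            # the label is passed through unchanged
        value_labels.write("\t".join(str(c) for c in columns) + "\n")


def fake_extract(rev_id: int) -> List[object]:
    """Hypothetical extractor returning two made-up feature values."""
    return [rev_id % 7, rev_id > 1000]


source = io.StringIO("1234\tTrue\n42\tFalse\n")
sink = io.StringIO()
add_features(source, sink, fake_extract)
print(sink.getvalue(), end="")
```

Reading lines from any iterable keeps the sketch compatible with both stdin and an opened file. These are the defaults the real `--rev-labels` and `--value-labels` options document.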

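Both extract_features and train_test take a `<features>` argument described as a classpath to a list/tuple of features. The sketch below shows one way a dotted classpath can be resolved using the standard library's `importlib`. It is an assumption, not revscoring's own loader. It assumes the last dotted component is an attribute of an importable module. `import_from_path` and `load_features` are hypothetical helper names.

```python
import importlib
from typing import Any, Sequence


def import_from_path(path: str) -> Any:
    """Resolve a dotted classpath like "package.module.attribute".

    Assumes the last dotted component is an attribute of an importable
    module; nested attributes (module.Class.attr) are not handled.
    """
    module_path, _, attribute = path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.attribute', got " + repr(path))
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


def load_features(path: str) -> Sequence[Any]:
    """Load a list/tuple of features, as <features> is documented to be."""
    features = import_from_path(path)
    if not isinstance(features, (list, tuple)):
        raise TypeError(path + " is not a list or tuple")
    return features


# Resolving a standard-library function shows the mechanism.
sqrt = import_from_path("math.sqrt")
print(sqrt(16.0))
```

In practice `<features>` would name a module-level list defined in your own package, for example a hypothetical `mypackage.feature_lists.damaging`.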
score

revscoring score -h

Scores a set of revisions.

Usage:
    score (-h | --help)
    score <model-file> <rev_id>... --host=<url> [--verbose]

Options:
    -h --help      Print this documentation
    <model-file>   Path to a model file
    --host=<url>   The url pointing to a MediaWiki API to use for
                   extracting features
    --verbose      Print debugging info
    <rev_id>       A revision identifier

train_test

revscoring train_test -h

Trains and tests a scorer model.  This utility expects to get a file of
tab-separated feature values and labels from which to construct a model.

Usage:
    train_test -h | --help
    train_test <scorer_model> <features> [-p=<kv>]... [-s=<kv>]...
               [--version=<vers>]
               [--values-labels=<path>]
               [--model-file=<path>]
               [--label-type=<type>]
               [--test-prop=<prop>]
               [--balance-sample-weight]
               [--center]
               [--scale]
               [--debug]

Options:
    -h --help               Prints this documentation
    <scorer_model>          Classpath to the MLScorerModel to construct
                            and train
    <features>              Classpath to a list of features to use when
                            constructing the model
    -p --parameter=<kv>     A key-value argument pair to use when
                            constructing the scorer_model.
    -s --statistic=<kv>     A test statistic argument to use to evaluate
                            the scorer model against the test set.
    --version=<vers>        A version to associate with the model
    --values-labels=<path>  Path to a file containing feature values and
                            labels [default: <stdin>]
    --model-file=<path>     Path to write a model file to
                            [default: <stdout>]
    --label-type=<type>     Interprets the labels as the appropriate type
                            (int, float, str, bool) [default: str]
    --test-prop=<prop>      The proportion of data that should be withheld
                            for testing the model. [default: 0.20]
    --balance-sample-weight  Balance the weight of samples (increase
                             importance of under-represented classes)
    --center                 Features should be centered on a common axis
    --scale                  Features should be scaled to a common range
    --debug                 Print debug logging.
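train_test reads tab-separated feature values with the label in the last column, and `--label-type` chooses how that label is interpreted. The sketch below assumes the label is always the final column, which matches the extract_features output format. `read_values_labels`, `parse_bool`, and `LABEL_TYPES` are hypothetical names, not revscoring's API. A dedicated boolean parser is needed because Python's `bool()` treats any non-empty string as true, so `bool("False")` is `True`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def parse_bool(text: str) -> bool:
    """Parse "True"/"False" (case-insensitive) and "1"/"0".

    Plain bool() would be wrong here: bool("False") is True, because any
    non-empty string is truthy.
    """
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError("not a boolean label: " + repr(text))


# Maps each documented --label-type value to a converter (an assumption).
LABEL_TYPES: Dict[str, Callable[[str], object]] = {
    "int": int, "float": float, "str": str, "bool": parse_bool,
}


def read_values_labels(lines: Iterable[str], label_type: str = "str"
                       ) -> Iterator[Tuple[List[str], object]]:
    """Yield (feature_values, label) pairs from TSV lines.

    The label is the last column, converted per label_type. Feature values
    are left as strings: decoding them into their real types depends on the
    feature definitions, which this sketch does not model.
    """
    convert = LABEL_TYPES[label_type]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        *values, label = line.split("\t")
        yield values, convert(label)


rows = list(read_values_labels(["2\tTrue\tTrue\n", "0\tFalse\tFalse\n"],
                               label_type="bool"))
print(rows)
```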

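Three of the train_test options describe behavior that a short sketch can make concrete: `-p`/`--parameter`, `--test-prop`, and `--balance-sample-weight`. revscoring's actual handling may differ. The following are all assumptions of this sketch:

- JSON-decoding `-p` values, with a fallback to plain strings.
- A shuffle-then-slice split for `--test-prop`.
- The scikit-learn-style "balanced" weighting, n_samples / (n_classes * count of the sample's class), for `--balance-sample-weight`.

The function names and the `seed` parameter are hypothetical; train_test documents no seed option.

```python
import json
import random
from collections import Counter
from typing import Any, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def parse_kv(kv: str) -> Tuple[str, Any]:
    """Parse a -p/--parameter value of the form key=value.

    Assumption: the value is JSON-decoded when possible (so 100 -> int,
    true -> bool) and otherwise kept as a plain string.
    """
    key, sep, raw = kv.partition("=")
    if not sep or not key:
        raise ValueError("expected key=value, got " + repr(kv))
    try:
        return key, json.loads(raw)
    except json.JSONDecodeError:
        return key, raw


def split_train_test(observations: Sequence[T], test_prop: float = 0.20,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle, then withhold round(n * test_prop) observations for testing."""
    if not 0.0 <= test_prop < 1.0:
        raise ValueError("test_prop must be in [0, 1)")
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # hypothetical fixed seed
    n_test = round(len(shuffled) * test_prop)
    return shuffled[n_test:], shuffled[:n_test]


def balanced_sample_weights(labels: Sequence[Any]) -> List[float]:
    """Weight each sample by n_samples / (n_classes * count(its class)).

    Rare classes get larger weights, and the weights sum to n_samples.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]


params = dict(parse_kv(kv) for kv in ["n_estimators=100", "criterion=entropy"])
train, test = split_train_test(list(range(10)), test_prop=0.20)
weights = balanced_sample_weights(["a", "a", "a", "b"])
print(params, len(train), len(test), weights)
```

Falling back to a raw string lets values such as `criterion=entropy` be passed without JSON quoting. Numbers and booleans still arrive as typed values.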