Functions

This module implements a set of functions for extracting labeling events and predicting article quality scores.

wikiclass.extract_features

wikiclass.extract_features(features, text, cache=None, context=None)

Extracts a set of feature values from a text.

Parameters:
features : list(revscoring.features.Feature)

A list of features to extract values for

text : str

A text from which to extract features

Returns:

A list of extracted feature values

wikiclass.extract_labelings

wikiclass.extract_labelings(dump, extractor=None, verbose=False)

Extracts labeling events from an mwxml.Dump.

Parameters:
dump : mwxml.Dump

The XML dump file to extract labelings from

extractor : wikiclass.Extractor

An extractor to apply to the XML dump. If no extractor is provided, an extractor will be looked up based on <dbname> in the XML dump’s <siteinfo> block.

verbose : bool

Print progress dots to stderr

Returns:

An iterator of dicts containing:

  • page_title – The normalized title of the article
  • project – A project (often a WikiProject) associated with the label
  • timestamp – The timestamp the labeling was observed
  • label – The quality label that was extracted
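As an illustration of how the returned dicts might be consumed, here is a minimal sketch that keeps only the most recent labeling per page. The helper name `latest_labels` and the sample records are hypothetical, and the timestamps are written as ISO 8601 strings purely for the demonstration; the timestamp type actually yielded by the library may differ.

```python
def latest_labels(labelings):
    """Keep only the most recent labeling for each page title."""
    latest = {}
    for labeling in labelings:
        title = labeling["page_title"]
        current = latest.get(title)
        # ISO 8601 strings in the same format sort chronologically (assumption
        # made for this sample data only).
        if current is None or labeling["timestamp"] > current["timestamp"]:
            latest[title] = labeling
    return latest


# Hypothetical records with the four documented keys.
sample = [
    {"page_title": "Ann Arbor", "project": "cities",
     "timestamp": "2010-01-01T00:00:00Z", "label": "start"},
    {"page_title": "Ann Arbor", "project": "cities",
     "timestamp": "2012-06-15T00:00:00Z", "label": "b"},
    {"page_title": "Waffle", "project": "food",
     "timestamp": "2011-03-02T00:00:00Z", "label": "c"},
]

latest = latest_labels(sample)
for title, labeling in sorted(latest.items()):
    print(title, labeling["label"])
```

Because extract_labelings returns an iterator, a helper like this consumes it in a single pass and holds only one record per page in memory.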

wikiclass.extract_text

wikiclass.extract_text(dump, labelings, verbose=False)

Extracts article text and metadata for labelings from an XML dump.

Parameters:
dump : mwxml.Dump

The XML dump file to extract text & metadata from

labelings : iterable(dict)

A collection of labeling events to add text to

verbose : bool

Print progress dots

Returns:

An iterator of labelings augmented with ‘page_id’, ‘rev_id’ and ‘text’. Note that labelings of articles that can’t be looked up will not be included.
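The return contract described here (three added keys, unresolvable labelings dropped) can be sketched with a stand-in generator. This is not the library's implementation: `augment_with_text`, the `lookup` callable, and the in-memory `pages` table are all hypothetical and exist only to show the shape of the output.

```python
def augment_with_text(labelings, lookup):
    """Yield copies of labelings with page_id, rev_id and text added.

    `lookup` is a hypothetical callable taking (page_title, timestamp) and
    returning (page_id, rev_id, text), or None when nothing is found.
    """
    for labeling in labelings:
        found = lookup(labeling["page_title"], labeling["timestamp"])
        if found is None:
            continue  # labelings that can't be looked up are not included
        page_id, rev_id, text = found
        yield {**labeling, "page_id": page_id, "rev_id": rev_id, "text": text}


# Hypothetical in-memory stand-in for an XML dump.
pages = {"Waffle": (101, 5001, "A waffle is a dish made from batter.")}


def lookup(page_title, timestamp):
    return pages.get(page_title)


labelings = [
    {"page_title": "Waffle", "project": "food",
     "timestamp": "2011-03-02T00:00:00Z", "label": "c"},
    {"page_title": "Missing page", "project": "food",
     "timestamp": "2011-03-02T00:00:00Z", "label": "stub"},
]

augmented = list(augment_with_text(labelings, lookup))
```

A caller should therefore not assume the output has as many items as the input.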

wikiclass.fetch_text

wikiclass.fetch_text(session, labelings, verbose=False)

Fetches article text and metadata for labelings from a MediaWiki API.

Parameters:
session : mwapi.Session

An API session to use for querying

labelings : iterable(dict)

A collection of labeling events to add text to

verbose : bool

Print progress dots

Returns:

An iterator of labelings augmented with ‘page_id’, ‘rev_id’ and ‘text’. Note that labelings of articles that aren’t found will not be included.
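One plausible next step with the augmented labelings is to turn them into (text, label) pairs and check the class balance before training. The sketch below works on plain dicts carrying the documented keys; the helper name `to_training_pairs` and the sample records are hypothetical.

```python
from collections import Counter


def to_training_pairs(augmented_labelings):
    """Return (text, label) pairs, skipping records with empty text."""
    return [
        (labeling["text"], labeling["label"])
        for labeling in augmented_labelings
        if labeling["text"]
    ]


# Hypothetical augmented labelings, shaped like the documented output.
augmented = [
    {"page_title": "Waffle", "project": "food", "label": "c",
     "timestamp": "2011-03-02T00:00:00Z",
     "page_id": 101, "rev_id": 5001, "text": "A waffle is a dish."},
    {"page_title": "Pancake", "project": "food", "label": "c",
     "timestamp": "2011-04-10T00:00:00Z",
     "page_id": 102, "rev_id": 5002, "text": "A pancake is a flat cake."},
    {"page_title": "Crepe", "project": "food", "label": "stub",
     "timestamp": "2011-05-20T00:00:00Z",
     "page_id": 103, "rev_id": 5003, "text": ""},
]

pairs = to_training_pairs(augmented)
label_counts = Counter(label for _, label in pairs)
print(label_counts)
```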

wikiclass.score

wikiclass.score(scorer_model, text, cache=None, context=None)

Scores a chunk of Wikitext markup.

Parameters:
scorer_model : revscoring.ScorerModel

A scorer model to apply

text : str

A chunk of Wikitext markup to score

cache : dict

Cache to use during feature extraction

context : dict

Context injected during feature extraction

Returns:

A dict of score information.
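To score several texts with one model, the function can be wrapped in a small generator. In this sketch the scoring function is passed in as an argument so that the example runs without the library installed. `score_texts` and `fake_score` are hypothetical names, and the dict returned by `fake_score` is a stand-in that does not reflect the real score format.

```python
def score_texts(score, scorer_model, texts):
    """Yield (index, score_doc) pairs by calling score(scorer_model, text)."""
    for index, text in enumerate(texts):
        yield index, score(scorer_model, text)


def fake_score(scorer_model, text, cache=None, context=None):
    """Stand-in with the documented signature; NOT the real scorer."""
    return {"length": len(text)}


texts = ["Short stub.", "A somewhat longer article body."]

# With the library installed, wikiclass.score and a loaded
# revscoring.ScorerModel would take the place of fake_score and None.
results = list(score_texts(fake_score, None, texts))
```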
