Extractors

This module provides a set of wikiclass.Extractor objects, each of which implements a strategy for identifying historical article quality labeling events. The extracted labelings are used as training data to build prediction models.

Supported wikis

wikiclass.extractors.enwiki

This extractor looks for instances of templates that contain “class=<some class>” on article talk pages (namespace = 1) and parses the template name to obtain a project.
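The matching rule described above can be illustrated with a minimal, regex-based sketch. It is not the wikiclass implementation: `project_labels`, the two patterns, the lower-casing of labels, and the rule that the project is the template name minus a leading "WikiProject" are all assumptions made for illustration, and nested templates are not handled.

```python
import re

# Hypothetical patterns, not part of wikiclass. They match flat (non-nested)
# templates such as {{WikiProject Biology|class=B|importance=low}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}")
CLASS_RE = re.compile(r"\|\s*class\s*=\s*([^|}]*)", re.IGNORECASE)


def project_labels(talk_text):
    """Yield (project, label) pairs found in talk-page wikitext."""
    for match in TEMPLATE_RE.finditer(talk_text):
        name, params = match.group(1), match.group(2)
        class_match = CLASS_RE.search(params)
        if class_match is None:
            continue  # template has no class parameter
        label = class_match.group(1).strip().lower()
        if not label:
            continue  # "class=" is present but empty
        # Assumption: the project is the template name without "WikiProject".
        project = re.sub(
            r"^wikiproject\s*", "", name.strip(), flags=re.IGNORECASE
        ).lower()
        yield project, label
```

Templates without a class parameter, or with an empty one, yield nothing in this sketch.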

wikiclass.extractors.itwiki

This extractor looks for instances of the “wikiprojet” template on article talk pages (namespace = 1) with a parameter called “avancement”. The project is always hard-coded to “wikiprojet”.

Base classes

class wikiclass.Extractor(name, doc, namespaces)

Implements a labeling event extraction strategy.

Parameters:
    name : str
        A name for the extraction strategy
    doc : str
        Documentation describing the extraction strategy
    namespaces : iterable(int)
        A set of namespaces that will be considered when performing an extraction

extract(page, verbose=False)

Processes an mw.xml_dump.Page and returns a generator of first-observations of a project/label pair.

Parameters:
    page : mw.xml_dump.Page
        Page to process
    verbose : bool
        Print dots to stderr
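The idea of a "first-observation" can be sketched without the real page object. In the example below, `first_observations`, the `(rev_id, text)` tuples, the shape of the yielded values, and the `toy_labels` helper are all hypothetical stand-ins, assuming revisions arrive in chronological order; the actual values yielded by extract() may differ.

```python
def first_observations(revisions, extract_labels):
    """Yield (project, label, rev_id) the first time each pair appears.

    `revisions` is an iterable of (rev_id, text) tuples in chronological
    order. `extract_labels` maps text to an iterable of (project, label)
    pairs. Both are stand-ins for the real page and extractor objects.
    """
    seen = set()
    for rev_id, text in revisions:
        for pair in extract_labels(text or ""):
            if pair not in seen:
                seen.add(pair)  # later repeats of this pair are ignored
                yield pair[0], pair[1], rev_id


def toy_labels(text):
    """Toy label parser: reads whitespace-separated "project:label" tokens."""
    for token in text.split():
        project, _, label = token.partition(":")
        if label:
            yield project, label


history = [
    (1, "biology:stub"),
    (2, "biology:stub medicine:start"),
    (3, "biology:b medicine:start"),
]
events = list(first_observations(history, toy_labels))
```

Here `events` holds one entry per distinct pair, tagged with the revision in which it first appeared, so a label that persists across revisions is reported only once.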

class wikiclass.TemplateExtractor(*args, from_template, **kwargs)

Implements a template-based extraction strategy based on a from_template function that takes a template and returns a (project, label) pair.

Parameters:
    from_template : func
        A function that takes a template and returns a (project, label) pair

extract(page, verbose=False)

Processes an mw.xml_dump.Page and returns a generator of first-observations of a project/label pair.

Parameters:
    page : mw.xml_dump.Page
        Page to process
    verbose : bool
        Print dots to stderr

extract_labels(text)

Extracts a set of labels for a version of text by parsing templates.

Parameters:
    text : str
        Wikitext markup to extract labels from

Returns:
    An iterator over (project, label) pairs
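The way a from_template function plugs into label extraction might look roughly like the following stand-in. `SketchTemplateExtractor`, `parse_templates`, and `from_wikiprojet` are hypothetical names, not wikiclass APIs. The sketch also assumes that a template is passed as a `(name, params)` pair and that returning `None` means "skip this template"; neither detail is confirmed by the documentation above.

```python
import re

# Hypothetical pattern for flat (non-nested) templates.
TEMPLATE_RE = re.compile(r"\{\{([^{}]+)\}\}")


def parse_templates(text):
    """Yield (name, params) for each flat template in `text`."""
    for match in TEMPLATE_RE.finditer(text):
        name, *raw_params = match.group(1).split("|")
        params = {}
        for raw in raw_params:
            key, sep, value = raw.partition("=")
            if sep:  # keep named parameters only
                params[key.strip().lower()] = value.strip()
        yield name.strip().lower(), params


class SketchTemplateExtractor:
    """Stand-in that illustrates the from_template plug-in point."""

    def __init__(self, *, from_template):
        # Keyword-only, mirroring the documented signature.
        self.from_template = from_template

    def extract_labels(self, text):
        for name, params in parse_templates(text):
            result = self.from_template(name, params)
            if result is not None:  # assumption: None means "no label here"
                yield result


def from_wikiprojet(name, params):
    """Return a (project, label) pair with a hard-coded project, or None."""
    if name == "wikiprojet" and params.get("avancement"):
        return "wikiprojet", params["avancement"].lower()
    return None


extractor = SketchTemplateExtractor(from_template=from_wikiprojet)
labels = list(
    extractor.extract_labels("{{Wikiprojet|avancement=B|importance=moyenne}} {{Autre|x=1}}")
)
```

Keeping the wiki-specific logic in a single function means a new wiki only needs its own from_template, while template parsing and iteration stay shared.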
