revscoring.features.wikitext

This features module provides access to features of the wikitext content of revisions.

revscoring.features.wikitext.revision = {wikitext.revision}

Represents the base revision of interest. Implements this basic structure:

Supporting classes

class revscoring.features.wikitext.Revision(*args, **kwargs)
Character features:
 
chars : int

The number of characters

whitespace_chars : int

The number of whitespace characters

markup_chars : int

The number of wikitext markup characters

cjk_chars : int

The number of Chinese/Japanese/Korean characters

entity_chars : int

The number of HTML entity characters

url_chars : int

The number of URL characters

word_chars : int

The number of word characters

uppercase_word_chars : int

The number of UPPERCASE word characters

punctuation_chars : int

The number of punctuation characters

break_chars : int

The number of break characters

longest_repeated_char : int

The length of the longest run of a single repeated character
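As a rough, standalone illustration of what the character counts above measure, the following sketch computes three of them with the Python standard library. It is not revscoring's implementation: the helper `char_features` is hypothetical, and treating whitespace as anything `str.isspace()` accepts is an assumption. The library's own character classes may differ.

```python
from itertools import groupby


def char_features(text):
    """Hypothetical helper: a few character-level counts for one revision text."""
    return {
        # Every character, markup included.
        "chars": len(text),
        # Assumption: "whitespace" means anything str.isspace() accepts.
        "whitespace_chars": sum(1 for c in text if c.isspace()),
        # Length of the longest run of one repeated character, e.g. "!!!!" -> 4.
        "longest_repeated_char": max(
            (len(list(run)) for _, run in groupby(text)), default=0
        ),
    }


print(char_features("Sooooo cool!!"))
# {'chars': 13, 'whitespace_chars': 1, 'longest_repeated_char': 5}
```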

Tokenized features:
 
tokens : int

The number of tokens

numbers : int

The number of number tokens

whitespaces : int

The number of whitespace tokens

markups : int

The number of markup tokens

cjks : int

The number of Chinese/Japanese/Korean tokens

entities : int

The number of HTML entity tokens

urls : int

The number of URL tokens

words : int

The number of word tokens

uppercase_words : int

The number of UPPERCASE word tokens

punctuations : int

The number of punctuation tokens

breaks : int

The number of break tokens

longest_token : int

The length of the longest token

longest_word : int

The length of the longest word-token
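The tokenized counts depend on how the text is split into typed tokens. The sketch below uses a hypothetical, much-simplified regular-expression tokenizer with only five token types (no markup, HTML entity, URL or CJK handling), so it only illustrates the shape of these features. The names `TOKEN_RE` and `token_features` are illustrative, and defining an UPPERCASE word as an all-caps word of at least two letters is an assumption.

```python
import re
from collections import Counter

# Hypothetical, simplified token types. A real wikitext tokenizer recognizes
# more types (markup, HTML entities, URLs, CJK, ...).
TOKEN_RE = re.compile(
    r"(?P<number>\d+)"
    r"|(?P<word>[^\W\d_]+)"
    r"|(?P<break>\n+)"
    r"|(?P<whitespace>[^\S\n]+)"
    r"|(?P<punctuation>[^\w\s])"
)


def token_features(text):
    """Hypothetical helper: count typed tokens in one revision text."""
    # m.lastgroup names the alternative that matched, i.e. the token type.
    tokens = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
    counts = Counter(kind for kind, _ in tokens)
    words = [tok for kind, tok in tokens if kind == "word"]
    return {
        "tokens": len(tokens),
        "numbers": counts["number"],
        "whitespaces": counts["whitespace"],
        "words": counts["word"],
        # Assumption: an UPPERCASE word is all caps and at least 2 letters long.
        "uppercase_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "punctuations": counts["punctuation"],
        "breaks": counts["break"],
        "longest_token": max((len(tok) for _, tok in tokens), default=0),
        "longest_word": max((len(w) for w in words), default=0),
    }


print(token_features("NASA launched 3 rockets.\nWow!"))
# tokens=11, words=4, uppercase_words=1 ("NASA"), breaks=1, longest_word=8
```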

Parsed features:
 
content_chars : int

The number of characters of viewable content (no markup or templates)

headings : int

The number of headings

external_links : int

The number of external links

wikilinks : int

The number of wikilinks (internal links to other pages in the wiki)

tags : int

The number of HTML tags

ref_tags : int

The number of <ref> tags

templates : int

The number of templates
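The parsed features are counted on the parsed structure of the wikitext rather than on raw characters. Purely to illustrate what is being counted, the sketch below approximates four of them with crude regular expressions that only handle simple, non-nested markup. revscoring relies on a real wikitext parser instead, and the pattern and function names here are hypothetical.

```python
import re

# Crude, hypothetical patterns; they only handle simple, non-nested wikitext.
HEADING_RE = re.compile(r"^(={1,6})[^=\n].*?\1[ \t]*$", re.MULTILINE)
WIKILINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")
# "{{" that is not part of a "{{{...}}}" template parameter.
TEMPLATE_RE = re.compile(r"(?<!\{)\{\{(?!\{)")
# Opening <ref> tags, including self-closing ones; not </ref> or <references/>.
REF_TAG_RE = re.compile(r"<ref[\s/>]", re.IGNORECASE)


def parsed_features(text):
    """Hypothetical helper: approximate parsed-structure counts."""
    return {
        "headings": len(HEADING_RE.findall(text)),
        "wikilinks": len(WIKILINK_RE.findall(text)),
        "templates": len(TEMPLATE_RE.findall(text)),
        "ref_tags": len(REF_TAG_RE.findall(text)),
    }


sample = (
    "== History ==\n"
    "The {{convert|5|km}} road links [[Paris]] and [[Lyon|the city]].<ref>Source</ref>\n"
    "=== Today ===\n"
    "See {{cite web|title={{PAGENAME}}}}.<ref name=\"a\" />\n"
)
print(parsed_features(sample))
# {'headings': 2, 'wikilinks': 2, 'templates': 3, 'ref_tags': 2}
```

A real parser also copes with nesting, comments and `<nowiki>` sections, which these patterns ignore.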

Constructs a revscoring.Feature that generates a count of external link URLs that match a regular expression.

heading_titles_matching(regex, name=None)

Constructs a revscoring.Feature that generates a count of heading titles that match a regular expression.

headings_by_level(level, name=None)

Constructs a revscoring.Feature that generates a count of all headings of a given level.

tag_names_matching(regex, name=None)

Constructs a revscoring.Feature that generates a count of tag names that match a regular expression.

template_names_matching(regex, name=None)

Constructs a revscoring.Feature that generates a count of template names that match a regular expression.

Constructs a revscoring.Feature that generates a count of wikilink titles that match a regular expression.
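Each of the `*_matching` constructors above counts the items of one kind whose name matches a regular expression, and `headings_by_level` counts the headings at one level. Stripped of revscoring's feature machinery, that reduces to a filter and a count over names a parser has already extracted. The sketch below shows only that idea: its helper names are hypothetical, and whether a match must start at the beginning of the name (`re.match`) or may occur anywhere (`re.search`, used here) is an assumption the descriptions above do not settle.

```python
import re


def count_matching(names, regex):
    """Hypothetical helper: count names that match `regex`.

    re.compile() accepts either a pattern string or an already-compiled
    pattern (returned unchanged), so callers can pass their own flags.
    """
    pattern = re.compile(regex)
    return sum(1 for name in names if pattern.search(name))


def count_headings_by_level(headings, level):
    """Hypothetical helper: count (level, title) pairs at the given level."""
    return sum(1 for lvl, _ in headings if lvl == level)


templates = ["Infobox person", "Citation needed", "cite web", "Cite book"]
headings = [(2, "Early life"), (2, "Career"), (3, "Awards"), (2, "References")]

print(count_matching(templates, re.compile(r"^cite", re.IGNORECASE)))  # 2
print(count_headings_by_level(headings, 2))  # 3
print(count_matching([title for _, title in headings], r"^References$"))  # 1
```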

class revscoring.features.wikitext.Diff(*args, **kwargs)
Character features:
 
chars_added : int

The number of characters added

chars_removed : int

The number of characters removed

numeric_chars_added : int

The number of numeric characters added

numeric_chars_removed : int

The number of numeric characters removed

whitespace_chars_added : int

The number of whitespace characters added

whitespace_chars_removed : int

The number of whitespace characters removed

markup_chars_added : int

The number of markup characters added

markup_chars_removed : int

The number of markup characters removed

cjk_chars_added : int

The number of CJK characters added

cjk_chars_removed : int

The number of CJK characters removed

entity_chars_added : int

The number of HTML entity characters added

entity_chars_removed : int

The number of HTML entity characters removed

url_chars_added : int

The number of URL characters added

url_chars_removed : int

The number of URL characters removed

word_chars_added : int

The number of word characters added

word_chars_removed : int

The number of word characters removed

uppercase_word_chars_added : int

The number of UPPERCASE word characters added

uppercase_word_chars_removed : int

The number of UPPERCASE word characters removed

punctuation_chars_added : int

The number of punctuation characters added

punctuation_chars_removed : int

The number of punctuation characters removed

break_chars_added : int

The number of break characters added

break_chars_removed : int

The number of break characters removed

longest_repeated_char_added : int

The length of the longest run of a single repeated character added
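The Diff character features count characters inside the segments that an edit adds or removes. The sketch below approximates that with `difflib.SequenceMatcher` from the Python standard library, run over a crude token list. This is a stand-in for revscoring's own diffing, and the tokenizer and helper names are hypothetical, so counts on real edits may differ.

```python
import difflib
import re

# Hypothetical tokenizer: word runs, whitespace runs, single punctuation marks.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def diff_segments(old, new):
    """Return (added, removed) text segments between two revision texts."""
    a, b = TOKEN_RE.findall(old), TOKEN_RE.findall(new)
    added, removed = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" both removes old tokens and adds new ones.
        if op in ("delete", "replace"):
            removed.append("".join(a[i1:i2]))
        if op in ("insert", "replace"):
            added.append("".join(b[j1:j2]))
    return added, removed


def count_chars(segments, predicate=lambda c: True):
    """Count characters in the segments that satisfy `predicate`."""
    return sum(1 for seg in segments for c in seg if predicate(c))


def char_diff_features(old, new):
    added, removed = diff_segments(old, new)
    return {
        "chars_added": count_chars(added),
        "chars_removed": count_chars(removed),
        "numeric_chars_added": count_chars(added, str.isdigit),
        "numeric_chars_removed": count_chars(removed, str.isdigit),
        "whitespace_chars_added": count_chars(added, str.isspace),
        "whitespace_chars_removed": count_chars(removed, str.isspace),
    }


print(char_diff_features("Population: 1200 people", "Population: 1350 people"))
# chars_added=4, chars_removed=4, numeric_chars_added=4, numeric_chars_removed=4
```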

Token frequency features:
 
token_delta_sum : int

The sum of delta changes in the token frequency table

token_delta_increase : int

The sum of delta increases in the token frequency table

token_delta_decrease : int

The sum of delta decreases in the token frequency table

token_prop_delta_sum : float

The sum of proportional delta changes in the token frequency table

token_prop_delta_increase : float

The sum of proportional delta increases in the token frequency table

token_prop_delta_decrease : float

The sum of proportional delta decreases in the token frequency table

number_delta_sum : int

The sum of delta changes in the number frequency table

number_delta_increase : int

The sum of delta increases in the number frequency table

number_delta_decrease : int

The sum of delta decreases in the number frequency table

number_prop_delta_sum : float

The sum of proportional delta changes in the number frequency table

number_prop_delta_increase : float

The sum of proportional delta increases in the number frequency table

number_prop_delta_decrease : float

The sum of proportional delta decreases in the number frequency table

whitespace_delta_sum : int

The sum of delta changes in the whitespace frequency table

whitespace_delta_increase : int

The sum of delta increases in the whitespace frequency table

whitespace_delta_decrease : int

The sum of delta decreases in the whitespace frequency table

whitespace_prop_delta_sum : float

The sum of proportional delta changes in the whitespace frequency table

whitespace_prop_delta_increase : float

The sum of proportional delta increases in the whitespace frequency table

whitespace_prop_delta_decrease : float

The sum of proportional delta decreases in the whitespace frequency table

markup_delta_sum : int

The sum of delta changes in the markup frequency table

markup_delta_increase : int

The sum of delta increases in the markup frequency table

markup_delta_decrease : int

The sum of delta decreases in the markup frequency table

markup_prop_delta_sum : float

The sum of proportional delta changes in the markup frequency table

markup_prop_delta_increase : float

The sum of proportional delta increases in the markup frequency table

markup_prop_delta_decrease : float

The sum of proportional delta decreases in the markup frequency table

cjk_delta_sum : int

The sum of delta changes in the cjk frequency table

cjk_delta_increase : int

The sum of delta increases in the cjk frequency table

cjk_delta_decrease : int

The sum of delta decreases in the cjk frequency table

cjk_prop_delta_sum : float

The sum of proportional delta changes in the cjk frequency table

cjk_prop_delta_increase : float

The sum of proportional delta increases in the cjk frequency table

cjk_prop_delta_decrease : float

The sum of proportional delta decreases in the cjk frequency table

entity_delta_sum : int

The sum of delta changes in the entity frequency table

entity_delta_increase : int

The sum of delta increases in the entity frequency table

entity_delta_decrease : int

The sum of delta decreases in the entity frequency table

entity_prop_delta_sum : float

The sum of proportional delta changes in the entity frequency table

entity_prop_delta_increase : float

The sum of proportional delta increases in the entity frequency table

entity_prop_delta_decrease : float

The sum of proportional delta decreases in the entity frequency table

url_delta_sum : int

The sum of delta changes in the url frequency table

url_delta_increase : int

The sum of delta increases in the url frequency table

url_delta_decrease : int

The sum of delta decreases in the url frequency table

url_prop_delta_sum : float

The sum of proportional delta changes in the url frequency table

url_prop_delta_increase : float

The sum of proportional delta increases in the url frequency table

url_prop_delta_decrease : float

The sum of proportional delta decreases in the url frequency table

word_delta_sum : int

The sum of delta changes in the word frequency table

word_delta_increase : int

The sum of delta increases in the word frequency table

word_delta_decrease : int

The sum of delta decreases in the word frequency table

word_prop_delta_sum : float

The sum of proportional delta changes in the word frequency table

word_prop_delta_increase : float

The sum of proportional delta increases in the word frequency table

word_prop_delta_decrease : float

The sum of proportional delta decreases in the word frequency table

uppercase_word_delta_sum : int

The sum of delta changes in the UPPERCASE word frequency table

uppercase_word_delta_increase : int

The sum of delta increases in the UPPERCASE word frequency table

uppercase_word_delta_decrease : int

The sum of delta decreases in the UPPERCASE word frequency table

uppercase_word_prop_delta_sum : float

The sum of proportional delta changes in the UPPERCASE word frequency table

uppercase_word_prop_delta_increase : float

The sum of proportional delta increases in the UPPERCASE word frequency table

uppercase_word_prop_delta_decrease : float

The sum of proportional delta decreases in the UPPERCASE word frequency table

punctuation_delta_sum : int

The sum of delta changes in the punctuation frequency table

punctuation_delta_increase : int

The sum of delta increases in the punctuation frequency table

punctuation_delta_decrease : int

The sum of delta decreases in the punctuation frequency table

punctuation_prop_delta_sum : float

The sum of proportional delta changes in the punctuation frequency table

punctuation_prop_delta_increase : float

The sum of proportional delta increases in the punctuation frequency table

punctuation_prop_delta_decrease : float

The sum of proportional delta decreases in the punctuation frequency table

break_delta_sum : int

The sum of delta changes in the break frequency table

break_delta_increase : int

The sum of delta increases in the break frequency table

break_delta_decrease : int

The sum of delta decreases in the break frequency table

break_prop_delta_sum : float

The sum of proportional delta changes in the break frequency table

break_prop_delta_increase : float

The sum of proportional delta increases in the break frequency table

break_prop_delta_decrease : float

The sum of proportional delta decreases in the break frequency table
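Each frequency table maps a token (or number, word, and so on) to how often it occurs in a revision, and the delta table records how those counts changed between the parent revision and the current one. The sketch below shows one reading of the `*_delta_sum`, `*_delta_increase` and `*_delta_decrease` aggregates and of a proportional delta. The helper names are hypothetical; dividing each change by the item's old frequency plus one is an assumption about how the proportional delta is normalized, and reporting decreases as a negative sum is this sketch's choice, not something stated above.

```python
from collections import Counter


def frequency_delta(old_tf, new_tf):
    """Per-item change in frequency; items whose count is unchanged are omitted."""
    delta = {}
    for item in old_tf.keys() | new_tf.keys():
        change = new_tf.get(item, 0) - old_tf.get(item, 0)
        if change != 0:
            delta[item] = change
    return delta


def proportional_delta(old_tf, delta):
    # Assumption: each change is divided by the item's old frequency plus one,
    # so a change to a rare or new item weighs more than one to a common item.
    return {item: change / (old_tf.get(item, 0) + 1) for item, change in delta.items()}


def summarize(delta):
    """The sum, increase and decrease aggregates over one delta table."""
    values = list(delta.values())
    return {
        "sum": sum(values),
        "increase": sum(v for v in values if v > 0),
        "decrease": sum(v for v in values if v < 0),  # negative in this sketch
    }


old_tf = Counter(["the", "cat", "the"])
new_tf = Counter(["the", "dog", "dog"])
delta = frequency_delta(old_tf, new_tf)  # the: -1, cat: -1, dog: +2
print(summarize(delta))  # {'sum': 0, 'increase': 2, 'decrease': -2}
print(summarize(proportional_delta(old_tf, delta)))
```

Under this normalization the added "dog" (new to the page) contributes 2.0, while removing one of two "the" tokens contributes only -1/3.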

Token edit features:
 
segments_added : int

The number of segments added

segments_removed : int

The number of segments removed

tokens_added : int

The number of tokens added

tokens_removed : int

The number of tokens removed

numbers_added : int

The number of number tokens added

numbers_removed : int

The number of number tokens removed

markups_added : int

The number of markup tokens added

markups_removed : int

The number of markup tokens removed

whitespaces_added : int

The number of whitespace tokens added

whitespaces_removed : int

The number of whitespace tokens removed

cjks_added : int

The number of CJK tokens added

cjks_removed : int

The number of CJK tokens removed

entities_added : int

The number of HTML entity tokens added

entities_removed : int

The number of HTML entity tokens removed

urls_added : int

The number of URL tokens added

urls_removed : int

The number of URL tokens removed

words_added : int

The number of word tokens added

words_removed : int

The number of word tokens removed

uppercase_words_added : int

The number of UPPERCASE word tokens added

uppercase_words_removed : int

The number of UPPERCASE word tokens removed

punctuations_added : int

The number of punctuation tokens added

punctuations_removed : int

The number of punctuation tokens removed

breaks_added : int

The number of break tokens added

breaks_removed : int

The number of break tokens removed

longest_token_added : int

The length of the longest token added

longest_uppercase_word_added : int

The length of the longest sequence of UPPERCASE characters added
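The token edit features count what a diff adds and removes, where a segment is one contiguous run of added or removed tokens. The sketch below approximates them with `difflib.SequenceMatcher` over a hypothetical regex tokenizer. This is a stand-in for revscoring's own diffing, so segment boundaries on real edits may differ, and the helper name `token_edit_features` is illustrative only.

```python
import difflib
import re

# Hypothetical tokenizer: numbers, letter runs, whitespace runs, single punctuation.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|\s+|[^\w\s]")


def token_edit_features(old, new):
    """Hypothetical helper: token-level edit counts between two texts."""
    a, b = TOKEN_RE.findall(old), TOKEN_RE.findall(new)
    added_segments, removed_segments = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # Each non-equal opcode yields one contiguous segment per side.
        if op in ("delete", "replace"):
            removed_segments.append(a[i1:i2])
        if op in ("insert", "replace"):
            added_segments.append(b[j1:j2])
    added = [tok for seg in added_segments for tok in seg]
    removed = [tok for seg in removed_segments for tok in seg]
    return {
        "segments_added": len(added_segments),
        "segments_removed": len(removed_segments),
        "tokens_added": len(added),
        "tokens_removed": len(removed),
        # Letter-only tokens count as words in this sketch.
        "words_added": sum(1 for tok in added if tok.isalpha()),
        "words_removed": sum(1 for tok in removed if tok.isalpha()),
        "longest_token_added": max((len(tok) for tok in added), default=0),
    }


print(token_edit_features("a b", "a b c d"))
# one added segment [" ", "c", " ", "d"]: tokens_added=4, words_added=2
```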
