This module implements a set of languages, each represented as a collection of language-specific features.
Languages implement a subset of feature collections (e.g. Dictionary, Stopwords, Stemmed and RegexMatches) based on what language assets are available. See revscoring.languages.features.
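The following minimal sketch illustrates how a language exposes only the feature collections its assets support. The `Language` class, its field names, and `available_collections` are hypothetical illustrations and are not revscoring's actual classes.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Language:
    """Hypothetical container: a feature collection is set only if its asset exists."""

    name: str
    badwords: Optional[List[re.Pattern]] = None  # RegexMatches (badwords)
    informals: Optional[List[re.Pattern]] = None  # RegexMatches (informals)
    dictionary_check: Optional[Callable[[str], bool]] = None  # Dictionary
    stopwords: Optional[FrozenSet[str]] = None  # Stopwords
    stem: Optional[Callable[[str], str]] = None  # Stemmed

    def available_collections(self) -> List[str]:
        """Names of the feature collections this language can provide."""
        collections = []
        if self.badwords is not None or self.informals is not None:
            collections.append("RegexMatches")
        if self.dictionary_check is not None:
            collections.append("Dictionary")
        if self.stopwords is not None:
            collections.append("Stopwords")
        if self.stem is not None:
            collections.append("Stemmed")
        return collections


# A language with only regex assets, like some of the entries below.
regex_only = Language(name="example", badwords=[re.compile(r"\bfoo\b")])
print(regex_only.available_collections())  # ['RegexMatches']
```

Keeping each asset optional means a language with no spell-checking dictionary or stemmer can still provide its regex-based features.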
Arabic (revscoring.languages.arabic)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “ar”. Provided by aspell-ar
RegexMatches features via a list of informal word detecting regexes.
Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=15258449
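Dictionary features rely on an enchant dictionary such as enchant.Dict “ar”. In pyenchant, `enchant.Dict(tag).check(word)` returns whether a word is known. The sketch below assumes a simple `\w+` tokenizer (not revscoring's) and injects the check function so it runs without pyenchant. `toy_check` is a hypothetical stand-in.

```python
import re
from typing import Callable, Iterable, List

# Simple word tokenizer; an assumption, not revscoring's tokenizer.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    return WORD_RE.findall(text)


def non_dictionary_words(words: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Return the words that the dictionary's check() rejects."""
    return [word for word in words if not check(word)]


def non_dictionary_ratio(text: str, check: Callable[[str], bool]) -> float:
    """Fraction of words in text that the dictionary does not recognize."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(non_dictionary_words(words, check)) / len(words)


# Hypothetical stand-in for enchant.Dict("en").check, so the sketch runs without pyenchant.
KNOWN = {"the", "cat", "sat", "on", "mat"}


def toy_check(word: str) -> bool:
    return word.lower() in KNOWN


print(non_dictionary_words(tokenize("The cat sat on teh mat"), toy_check))  # ['teh']
# With pyenchant installed: non_dictionary_ratio(text, enchant.Dict("ar").check)
```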
Dutch (revscoring.languages.dutch)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “nl”. Provided by myspell-nl
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “dutch”
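Stemmed word features compare words after reducing them to stems, so inflected forms such as “walks” and “walked” count as the same word. In NLTK, `SnowballStemmer("dutch").stem(word)` returns a word's stem. The sketch below computes a hypothetical “stems added between two revisions” feature. It takes the stemmer as a parameter, and `toy_stem` is a crude suffix stripper used only so the example runs without NLTK. It is not the Snowball algorithm.

```python
import re
from typing import Callable, Set

WORD_RE = re.compile(r"\w+")


def stems(text: str, stem: Callable[[str], str]) -> Set[str]:
    """Return the distinct stems of the lower-cased words in text."""
    return {stem(word.lower()) for word in WORD_RE.findall(text)}


def stems_added(old_text: str, new_text: str, stem: Callable[[str], str]) -> Set[str]:
    """Stems present in the new revision but not in the old one (hypothetical feature)."""
    return stems(new_text, stem) - stems(old_text, stem)


def toy_stem(word: str) -> str:
    """Toy suffix stripper standing in for SnowballStemmer(...).stem; NOT Snowball."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


print(sorted(stems_added("the cat walks", "the cats walked home", toy_stem)))  # ['home']
# With NLTK: from nltk.stem.snowball import SnowballStemmer
#            stems_added(old, new, SnowballStemmer("dutch").stem)
```

Because “cats” and “walked” reduce to stems already present in the old text, only “home” counts as new.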
English (revscoring.languages.english)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “en”. Provided by myspell-en-au, myspell-en-gb, myspell-en-us, and myspell-en-za.
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “english”
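RegexMatches features detect words from a list of regexes. The following sketch joins such a list into one case-insensitive alternation bounded by `\b`, so that a pattern matches whole words only (for example, “stupid” does not match inside “stupidity”). The patterns are hypothetical placeholders, and revscoring's actual badword lists and matching rules may differ.

```python
import re
from typing import List

# Hypothetical placeholder patterns; the real badword lists differ per language.
BADWORD_PATTERNS = [r"stupid", r"idiots?", r"dumb+"]


def compile_word_regexes(patterns: List[str]) -> re.Pattern:
    """Join patterns into one case-insensitive alternation anchored at word boundaries."""
    return re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)


BADWORDS = compile_word_regexes(BADWORD_PATTERNS)


def badword_matches(text: str) -> List[str]:
    """Return every badword occurrence found in text, in order."""
    return BADWORDS.findall(text)


print(badword_matches("What a STUPID edit by idiots, not a stupidity."))  # ['STUPID', 'idiots']
```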
Estonian (revscoring.languages.estonian)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “et”. Provided by myspell-et
RegexMatches features via a list of informal word detecting regexes.
Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13987775
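Stopwords features separate very common words from the content-bearing ones. The sketch below partitions lower-cased words into stopwords and non-stopwords. The five-word `STOPWORDS` set is a hypothetical sample, not one of the lists cited in this document.

```python
import re
from typing import FrozenSet, List, Tuple

WORD_RE = re.compile(r"\w+")

# Hypothetical sample; the real lists come from the sources cited in this document.
STOPWORDS: FrozenSet[str] = frozenset({"the", "a", "of", "and", "is"})


def split_stopwords(text: str, stopwords: FrozenSet[str]) -> Tuple[List[str], List[str]]:
    """Partition the lower-cased words of text into (stopwords, non-stopwords)."""
    stop: List[str] = []
    non_stop: List[str] = []
    for word in WORD_RE.findall(text.lower()):
        if word in stopwords:
            stop.append(word)
        else:
            non_stop.append(word)
    return stop, non_stop


stop, non_stop = split_stopwords("The history of Tallinn is long and varied", STOPWORDS)
print(len(stop), len(non_stop))  # 4 4
```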
French (revscoring.languages.french)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “fr”. Provided by myspell-fr
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “french”
German (revscoring.languages.german)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “de”. Provided by myspell-de-de, myspell-de-at, and myspell-de-ch.
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “german”
Hebrew (revscoring.languages.hebrew)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “he”. Provided by myspell-he
RegexMatches features via a list of informal word detecting regexes.
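Informal words often appear with elongations such as “looool” or “hahaha”, so informal-word regexes typically use repetition operators. The sketch below counts matches per pattern, which shows which informal patterns fired. The patterns are hypothetical English placeholders, and the per-pattern counting is an illustration, not necessarily how revscoring reports its features.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical informal-word patterns that tolerate elongation.
INFORMAL_PATTERNS: List[str] = [r"lo+l", r"(?:ha){2,}h?"]

INFORMAL_RES: Dict[str, re.Pattern] = {
    pattern: re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)
    for pattern in INFORMAL_PATTERNS
}


def informal_counts(text: str) -> Counter:
    """Count matches of each informal-word pattern in text, omitting patterns with none."""
    counts: Counter = Counter()
    for pattern, regex in INFORMAL_RES.items():
        n = len(regex.findall(text))
        if n:
            counts[pattern] = n
    return counts


print(dict(informal_counts("LOOOL hahaha that was fun lol")))  # {'lo+l': 2, '(?:ha){2,}h?': 1}
```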
Indonesian (revscoring.languages.indonesian)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “id”. Provided by aspell-id
RegexMatches features via a list of informal word detecting regexes.
Stopwords features provided by https://code.google.com/p/stop-words/source/browse/trunk/stop-words/stop-words-collection-2014.02.24/stop-words/stop-words_indonesian_1_id.txt
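The Indonesian stopwords above come from a plain-text list. The following sketch loads such a list into a `frozenset`, assuming one word per line in UTF-8 with blank lines ignored. That file layout is an assumption, and `load_stopwords` is a hypothetical helper. The three sample words are common Indonesian function words used only for illustration.

```python
import io
from typing import FrozenSet, TextIO


def load_stopwords(stream: TextIO) -> FrozenSet[str]:
    """Read one stopword per line, trimming whitespace and skipping blank lines."""
    return frozenset(line.strip().lower() for line in stream if line.strip())


# In practice the stream would be open(path_to_list, encoding="utf-8").
sample = io.StringIO("yang\ndan\n\ndi\n")
print(sorted(load_stopwords(sample)))  # ['dan', 'di', 'yang']
```

A `frozenset` gives constant-time membership tests and cannot be modified accidentally after loading.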
Italian (revscoring.languages.italian)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “it”. Provided by myspell-it
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “italian”
Persian (revscoring.languages.persian)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “fa”. Provided by myspell-fa
RegexMatches features via a list of informal word detecting regexes.
Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13044766
Portuguese (revscoring.languages.portuguese)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “pt”. Provided by myspell-pt
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “portuguese”
Spanish (revscoring.languages.spanish)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “es”. Provided by myspell-es
RegexMatches features via a list of informal word detecting regexes.
Stemmed word features via nltk.stem.snowball.SnowballStemmer “spanish”
Turkish (revscoring.languages.turkish)
RegexMatches features via a list of badword detecting regexes.
RegexMatches features via a list of informal word detecting regexes.
Ukrainian (revscoring.languages.ukrainian)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “uk”. Provided by myspell-uk
RegexMatches features via a list of informal word detecting regexes.
Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13877074
Vietnamese (revscoring.languages.vietnamese)
RegexMatches features via a list of badword detecting regexes.
Dictionary features via enchant.Dict “vi”. Provided by hunspell-vi.
RegexMatches features via a list of informal word detecting regexes.
Stopwords features copied from https://vi.wiktionary.org/wiki/Th%C3%A0nh_vi%C3%AAn:Laurent_Bouvier/Free_Vietnamese_Dictionary_Project_Vietnamese-Vietnamese#Allwiki_.28closed.29