revscoring.languages

This module implements a set of languages, each exposed as a collection of language-specific features.

feature collections

Languages implement a subset of feature collections (e.g. Dictionary, Stopwords, Stemmed and RegexMatches) based on what language assets are available. See revscoring.languages.features.
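As a rough illustration of what two of these collection types compute, the sketch below counts regex matches and splits words by stopword membership using only Python's standard library. The pattern list, the stopword set, and the function names are hypothetical stand-ins chosen for this example; they are not revscoring's API or its shipped language assets.

```python
import re

# Hypothetical assets for illustration only; the real per-language
# regex lists and stopword sets ship with revscoring.
INFORMAL_PATTERNS = [r"lol+", r"ha(ha)+", r"plz+"]
STOPWORDS = {"the", "a", "of", "and", "is"}

# Combine the list of patterns into one word-bounded, case-insensitive regex.
informal_re = re.compile(
    r"\b(?:" + "|".join(INFORMAL_PATTERNS) + r")\b", re.IGNORECASE
)


def regex_matches(text):
    """Return every substring of `text` matched by any pattern in the list."""
    return [m.group(0) for m in informal_re.finditer(text)]


def split_stopwords(text):
    """Return (stopwords, non_stopwords) found in the lowercased `text`."""
    words = re.findall(r"\w+", text.lower())
    stop = [w for w in words if w in STOPWORDS]
    non_stop = [w for w in words if w not in STOPWORDS]
    return stop, non_stop
```

A feature such as "number of informal words" would then be the length of the list returned by `regex_matches`. Dictionary and Stemmed collections follow the same idea but depend on external assets (enchant dictionaries and NLTK stemmers), which is why not every language below provides them.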

arabic

revscoring.languages.arabic.badwords = {arabic.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.arabic.dictionary = {arabic.dictionary}

Dictionary features via enchant.Dict “ar”. Provided by aspell-ar

revscoring.languages.arabic.informals = {arabic.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.arabic.stopwords = {arabic.stopwords}

Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=15258449

dutch

revscoring.languages.dutch.badwords = {dutch.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.dutch.dictionary = {dutch.dictionary}

Dictionary features via enchant.Dict “nl”. Provided by myspell-nl

revscoring.languages.dutch.informals = {dutch.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.dutch.stemmed = {dutch.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “dutch”

revscoring.languages.dutch.stopwords = {dutch.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “dutch”

english

revscoring.languages.english.badwords = {english.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.english.dictionary = {english.dictionary}

Dictionary features via enchant.Dict “en”. Provided by myspell-en-au, myspell-en-gb, myspell-en-us, and myspell-en-za.

revscoring.languages.english.informals = {english.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.english.stemmed = {english.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “english”

revscoring.languages.english.stopwords = {english.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “english”

estonian

revscoring.languages.estonian.badwords = {estonian.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.estonian.dictionary = {estonian.dictionary}

Dictionary features via enchant.Dict “et”. Provided by myspell-et

revscoring.languages.estonian.informals = {estonian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.estonian.stopwords = {estonian.stopwords}

Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13987775

french

revscoring.languages.french.badwords = {french.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.french.dictionary = {french.dictionary}

Dictionary features via enchant.Dict “fr”. Provided by myspell-fr

revscoring.languages.french.informals = {french.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.french.stemmed = {french.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “french”

revscoring.languages.french.stopwords = {french.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “french”

german

revscoring.languages.german.badwords = {german.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.german.dictionary = {german.dictionary}

Dictionary features via enchant.Dict “de”. Provided by myspell-de-de, myspell-de-at, and myspell-de-ch.

revscoring.languages.german.informals = {german.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.german.stemmed = {german.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “german”

revscoring.languages.german.stopwords = {german.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “german”

hebrew

revscoring.languages.hebrew.badwords = {hebrew.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.hebrew.dictionary = {hebrew.dictionary}

Dictionary features via enchant.Dict “he”. Provided by myspell-he

revscoring.languages.hebrew.informals = {hebrew.informals}

RegexMatches features via a list of informal word detecting regexes.

indonesian

revscoring.languages.indonesian.badwords = {indonesian.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.indonesian.dictionary = {indonesian.dictionary}

Dictionary features via enchant.Dict “id”. Provided by aspell-id

revscoring.languages.indonesian.informals = {indonesian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.indonesian.stopwords = {indonesian.stopwords}

Stopwords features provided by https://code.google.com/p/stop-words/source/browse/trunk/stop-words/stop-words-collection-2014.02.24/stop-words/stop-words_indonesian_1_id.txt

italian

revscoring.languages.italian.badwords = {italian.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.italian.dictionary = {italian.dictionary}

Dictionary features via enchant.Dict “it”. Provided by myspell-it

revscoring.languages.italian.informals = {italian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.italian.stemmed = {italian.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “italian”

revscoring.languages.italian.stopwords = {italian.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “italian”

persian

revscoring.languages.persian.badwords = {persian.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.persian.dictionary = {persian.dictionary}

Dictionary features via enchant.Dict “fa”. Provided by myspell-fa

revscoring.languages.persian.informals = {persian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.persian.stopwords = {persian.stopwords}

Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13044766

portuguese

revscoring.languages.portuguese.badwords = {portuguese.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.portuguese.dictionary = {portuguese.dictionary}

Dictionary features via enchant.Dict “pt”. Provided by myspell-pt

revscoring.languages.portuguese.informals = {portuguese.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.portuguese.stemmed = {portuguese.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “portuguese”

revscoring.languages.portuguese.stopwords = {portuguese.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “portuguese”

spanish

revscoring.languages.spanish.badwords = {spanish.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.spanish.dictionary = {spanish.dictionary}

Dictionary features via enchant.Dict “es”. Provided by myspell-es

revscoring.languages.spanish.informals = {spanish.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.spanish.stemmed = {spanish.stemmed}

Stemmed word features via nltk.stem.snowball.SnowballStemmer “spanish”

revscoring.languages.spanish.stopwords = {spanish.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “spanish”

turkish

revscoring.languages.turkish.badwords = {turkish.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.turkish.informals = {turkish.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.turkish.stopwords = {turkish.stopwords}

Stopwords features provided by nltk.corpus.stopwords() “turkish”

ukrainian

revscoring.languages.ukrainian.badwords = {ukrainian.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.ukrainian.dictionary = {ukrainian.dictionary}

Dictionary features via enchant.Dict “uk”. Provided by myspell-uk

revscoring.languages.ukrainian.informals = {ukrainian.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.ukrainian.stopwords = {ukrainian.stopwords}

Stopwords features copied from “common words” in https://meta.wikimedia.org/wiki/?oldid=13877074

vietnamese

revscoring.languages.vietnamese.badwords = {vietnamese.badwords}

RegexMatches features via a list of badword detecting regexes.

revscoring.languages.vietnamese.dictionary = {vietnamese.dictionary}

Dictionary features via enchant.Dict “vi”. Provided by hunspell-vi.

revscoring.languages.vietnamese.informals = {vietnamese.informals}

RegexMatches features via a list of informal word detecting regexes.

revscoring.languages.vietnamese.stopwords = {vietnamese.stopwords}

Stopwords features copied from https://vi.wiktionary.org/wiki/Th%C3%A0nh_vi%C3%AAn:Laurent_Bouvier/Free_Vietnamese_Dictionary_Project_Vietnamese-Vietnamese#Allwiki_.28closed.29
