twistml.preprocessing package

Submodules

twistml.preprocessing.preprocessing module


Author:	Matthias Manhertz
Copyright:	Matthias Manhertz 2015
Licence:	MIT

twistml.preprocessing.preprocessing.preprocess_tweets(tweets, remove_twitter=True, remove_stopwords=True, remove_nonenglish=True, perform_stemming=True)

Preprocess a list of tweets for processing in a machine learning task.

The tweets pass through a series of preprocessing steps that are generally considered beneficial for machine learning tasks: removal of Twitter-specific tokens (such as links and @-mentions), removal of stopwords, removal of non-English words, and stemming.

Each step can be disabled individually by setting the corresponding parameter to False. Not all combinations of enabled and disabled steps have been tested, however, and some combinations may produce undesired results.

Parameters:
tweets : list[dict[str, str]]: The tweets to be preprocessed.
remove_twitter : bool: Whether to remove Twitter-specific tokens (links, @-mentions). Default is True.
correct_spelling : bool: Whether to apply spelling correction to the tweets. Default is True. This parameter does not appear in the signature shown above.
remove_stopwords : bool: Whether to remove stopwords from the tweets. Default is True.
remove_nonenglish : bool: Whether to remove non-English words (names, uncorrected misspellings, ...). Default is True.
perform_stemming : bool: Whether to perform stemming on the tweets. Default is True.

Returns:
tweets : list[dict[str, str]]: The preprocessed tweets.
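To make the flag-controlled steps concrete, here is a minimal standard-library sketch of two of them: removal of Twitter-specific tokens and removal of stopwords. It is not twistml's implementation. The function name sketch_preprocess, the "text" dictionary key, the regular expression, and the tiny stopword set are all assumptions made for illustration; non-English filtering and stemming are left out because they need external word lists or a stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword set; a real pipeline would use a fuller list.
STOPWORDS = {"a", "an", "and", "is", "the", "this", "to"}

# Matches http(s) links and @-mentions (an assumed definition of "Twitter-specific").
TWITTER_TOKEN = re.compile(r"https?://\S+|@\w+")


def sketch_preprocess(tweets, remove_twitter=True, remove_stopwords=True):
    """Illustrative stand-in, NOT the twistml implementation."""
    cleaned = []
    for tweet in tweets:
        text = tweet["text"]  # assumed key; the documentation does not name the dict keys
        if remove_twitter:
            text = TWITTER_TOKEN.sub(" ", text)
        # Lowercase and keep only alphabetic words (apostrophes allowed).
        words = re.findall(r"[a-z']+", text.lower())
        if remove_stopwords:
            words = [w for w in words if w not in STOPWORDS]
        # Copy the tweet so the input list is left unmodified.
        cleaned.append(dict(tweet, text=" ".join(words)))
    return cleaned


tweets = [{"id": "1", "text": "@bob This is the link to read: https://example.com/x"}]
result = sketch_preprocess(tweets)
print(result[0]["text"])
```

Going by the signature documented above, the corresponding call with the package installed would presumably look like preprocess_tweets(tweets, perform_stemming=False), with each disabled step skipped in the same way the two flags are handled in the sketch.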

Module contents


Author:	Matthias Manhertz
Copyright:	Matthias Manhertz 2015
Licence:	MIT
