twistml.preprocessing package

Submodules

twistml.preprocessing.preprocessing module

Author:

Matthias Manhertz

Copyright:

Matthias Manhertz 2015

Licence:

MIT

twistml.preprocessing.preprocessing.preprocess_tweets(tweets, remove_twitter=True, remove_stopwords=True, remove_nonenglish=True, perform_stemming=True)

Preprocess a list of tweets for processing in a machine learning task.

A series of preprocessing steps that are generally considered beneficial for machine learning tasks is applied to the given tweets. The steps are: removal of Twitter-specific tokens (such as links and @-mentions), removal of stopwords, removal of non-English words, and stemming.

Each of these steps can be disabled individually by setting the corresponding parameter to False. Not all combinations of enabled and disabled steps have been tested, however, and some combinations may produce undesired results.

Parameters:

tweets : list[dict[str,str]]
The tweets to be preprocessed.
remove_twitter : bool
Whether Twitter-specific tokens (links, @-mentions) are removed (default is True).
correct_spelling : bool
Whether spelling correction is applied to the tweets (default is True). Note that this parameter is documented here but does not appear in the signature shown above.
remove_stopwords : bool
Whether stopwords are removed from the tweets (default is True).
remove_nonenglish : bool
Whether non-English words (names, uncorrected misspellings, ...) are removed (default is True).
perform_stemming : bool
Whether stemming is performed on the tweets (default is True).

Returns:

tweets : list[dict[str,str]]
The preprocessed tweets.
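
The steps described for preprocess_tweets can be sketched as follows. This is a minimal, standard-library-only illustration and not twistml's implementation: the name sketch_preprocess, the "text" key in each tweet dict, the toy stopword and vocabulary sets, and the suffix-stripping stemmer are all assumptions made for the example. A real pipeline would presumably rely on a proper stopword list, dictionary and stemmer.

```python
import re

# Assumption: each tweet dict stores its text under the key "text".
TEXT_KEY = "text"

# Toy resources for illustration only; far too small for real use.
STOPWORDS = {"a", "an", "and", "is", "the", "this", "to"}
ENGLISH_WORDS = {"loving", "new", "running", "shoes"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
WORD = re.compile(r"[a-z]+")


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sketch_preprocess(tweets, remove_twitter=True, remove_stopwords=True,
                      remove_nonenglish=True, perform_stemming=True):
    """Hypothetical re-creation of the documented steps, one flag per step."""
    processed = []
    for tweet in tweets:
        text = tweet[TEXT_KEY].lower()
        if remove_twitter:
            # Drop links and @-mentions before tokenizing.
            text = URL_OR_MENTION.sub(" ", text)
        tokens = WORD.findall(text)
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        if remove_nonenglish:
            # Must run before stemming, since stems are often not words.
            tokens = [t for t in tokens if t in ENGLISH_WORDS]
        if perform_stemming:
            tokens = [naive_stem(t) for t in tokens]
        result = dict(tweet)  # copy so the input list is left untouched
        result[TEXT_KEY] = " ".join(tokens)
        processed.append(result)
    return processed


tweets = [{"id": "1",
           "text": "@bob Loving the new running shoes lol! http://example.com/x"}]
print(sketch_preprocess(tweets))
print(sketch_preprocess(tweets, remove_twitter=False, remove_nonenglish=False))
```

The second call shows why some flag combinations may give undesired results: with both remove_twitter and remove_nonenglish disabled in this sketch, fragments of the link and the mention survive as ordinary tokens.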

Module contents

Author:

Matthias Manhertz

Copyright:

Matthias Manhertz 2015

Licence:

MIT