
Class WeightNgramPredictor


object --+    
         |    
 Predictor --+
             |
            WeightNgramPredictor

Compute prediction from n-gram model in database.


Nested Classes
    Inherited from Predictor
  __metaclass__
Metaclass for defining Abstract Base Classes (ABCs).
Instance Methods
 
__init__(self, config, contextMonitor, predictorName=None)
WeightNgramPredictor constructor.
 
init_database_connector(self)
Initialize the database connector.
Prediction
predict(self, maxPartialPredictionSize, stopList=[])
Predict the next word according to the current context.
Prediction
weight(self, prefixCompletionCandidates, tokens)
Compute probability of suggestions and return the most probable ones.
 
close_database(self)
Close the predictor's database.
 
learn(self, change)
Learn what needs to be learned by adding n-grams to the database.
 
make_ngram_map(self, change)
Create a map associating n-grams (lists of words) and their count.
dict
prefix_ngrams_with_input(self, change, ngramMap)
Use the input left buffer to expand the n-gram map.
 
push_ngrams_in_db(self, ngramMap)
Update the database with the n-grams contained in the n-gram map.
 
count(self, tokens, offset, n)
Make an n-gram then retrieve and return its 'count' entry in the db.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  __abstractmethods__ = frozenset([])
    Inherited from Predictor
  _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42360550>
  _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423...
  _abc_negative_cache_version = 39
  _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42360490>
Properties

Inherited from object: __class__

Method Details

__init__(self, config, contextMonitor, predictorName=None)
(Constructor)


WeightNgramPredictor constructor.

Parameters:
  • config (drvr.Configuration) - The config is used to retrieve the predictor settings from the config file.
  • contextMonitor (ContextMonitor) - The contextMonitor is needed because it allows the predictor to access the input buffer tokens.
  • predictorName (str) - The custom name of the configuration using this predictor.
Overrides: object.__init__

init_database_connector(self)


Initialize the database connector.

Uses the database file path, the maximum n-gram size and the learn mode to initialize and open the database.

predict(self, maxPartialPredictionSize, stopList=[])


Predict the next word according to the current context.

Use the input buffers (thanks to contextMonitor) and the n-gram database to predict the most probable suggestions. A suggestion is a word which can:

  • Complete the current partial word, i.e. predict the end of the word (the user has not finished typing the word, so we try to predict its ending).
  • Predict the next word (the user has typed a separator after a word, so we try to predict the next word before they start typing it).

In order to compute the suggestions, this method:

  • Retrieves the last n tokens from the left input buffer, where n is the maximum n-gram size (max(n)) stored in the database.
  • Loops over each n-gram size from max(n) down to 1:
    • Finds n-grams of the current size in the database which match the last input tokens.
    • Adds each retrieved n-gram to the suggestion list if it is not already in it and the maximum number of suggestions has not been reached yet.
Parameters:
  • maxPartialPredictionSize (int) - Maximum number of suggestions to compute. If this number is reached, the suggestions list is immediately returned. DatabaseConnector.ngram_table_tp() returns the records in descending order of their number of occurrences, so the most probable suggestions are added to the list first. This results in no loss of suggestion quality, regardless of the desired number of suggestions.
  • stopList (list) - The stopList is a list of undesirable words. Any suggestion which is in the stopList won't be added to the suggestions list.
Returns: Prediction
A list of all possible suggestions (limited to maxPartialPredictionSize).
Overrides: Predictor.predict
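The backoff loop above can be sketched with an in-memory n-gram table. This is a minimal sketch, assuming a `ngram_counts` dict (n-gram tuple -> count) in place of the real DatabaseConnector queries; the function name and signature are illustrative, not the class's actual API:

```python
def predict_sketch(tokens, ngram_counts, max_n, max_size, stop_list=()):
    """tokens[-1] is the current (possibly partial) word being completed."""
    prefix = tokens[-1]
    suggestions = []
    for n in range(max_n, 0, -1):                    # from max(n) down to 1
        if len(tokens) < n:
            continue
        context = tuple(tokens[len(tokens) - n:-1])  # the n-1 tokens before the prefix
        # Candidates sorted by descending count, mirroring ngram_table_tp().
        candidates = sorted(
            ((ng, c) for ng, c in ngram_counts.items()
             if len(ng) == n and ng[:-1] == context and ng[-1].startswith(prefix)),
            key=lambda item: -item[1])
        for ng, _count in candidates:
            word = ng[-1]
            if word not in suggestions and word not in stop_list:
                suggestions.append(word)
                if len(suggestions) >= max_size:
                    return suggestions               # immediately return when full
    return suggestions
```

Because longer contexts are scanned first, completions supported by more context land at the front of the list, and the stop list filters undesirable words before they are added.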

weight(self, prefixCompletionCandidates, tokens)


Compute probability of suggestions and return the most probable ones.

The probability of a suggestion is based on its frequency relative to the whole set of suggestions and to the number of single tokens in the database.

Parameters:
  • prefixCompletionCandidates (list) - List of the suggestions returned by self.predict().
  • tokens (list) - The last input tokens.
Returns: Prediction
List of every "good enough" suggestion.
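A hedged sketch of the weighting step: here a candidate's probability is its unigram count divided by the total number of single tokens, which is an assumption about how the "relative frequency" above is computed; the real method works from database counts.

```python
def weight_sketch(candidates, unigram_counts):
    # unigram_counts maps each single token to its count in the database.
    total = sum(unigram_counts.values())   # number of single tokens in the db
    scored = [(word, unigram_counts.get(word, 0) / total) for word in candidates]
    scored.sort(key=lambda item: -item[1])  # most probable first
    return scored
```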

learn(self, change)


Learn what needs to be learned by adding n-grams to the database.

Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
Overrides: Predictor.learn
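The three helper methods documented below suggest a three-step flow. This sketch of how learn() might chain them is an inference from their docstrings, not the class's confirmed implementation; the helpers are passed in as plain callables to keep the sketch self-contained:

```python
def learn_sketch(change, make_ngram_map, prefix_ngrams_with_input,
                 push_ngrams_in_db):
    ngram_map = make_ngram_map(change)                       # count change n-grams
    ngram_map = prefix_ngrams_with_input(change, ngram_map)  # add preceding tokens
    push_ngrams_in_db(ngram_map)                             # persist the counts
```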

make_ngram_map(self, change)


Create a map associating n-grams (lists of words) and their count.

Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
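A minimal sketch of the counting step, assuming simple whitespace tokenization (the real class presumably uses the predictor's configured tokenizer) and a hypothetical `max_n` argument standing in for self.maxN:

```python
def make_ngram_map_sketch(change, max_n):
    tokens = change.split()                   # assumed whitespace tokenization
    ngram_map = {}
    for n in range(1, max_n + 1):             # every n-gram size up to max_n
        for i in range(len(tokens) - n + 1):  # every starting position
            ngram = tuple(tokens[i:i + n])
            ngram_map[ngram] = ngram_map.get(ngram, 0) + 1
    return ngram_map
```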

prefix_ngrams_with_input(self, change, ngramMap)


Use the input left buffer to expand the n-gram map.

This method calls cntxt.ContextMonitor.previous_tokens() to get the tokens from the left input buffer that come just before the change, and adds them BEFORE the change n-grams generated by self.make_ngram_map().

For instance, if the current left input buffer is:

   "phone is on the white table "

And change is:

   "table"

Then, the n-gram map generated by self.make_ngram_map() will be:

   {("table",): 1}

The n-gram map contains a single n-gram of size 1, so this method adds the tokens preceding the change in the left input buffer to form n-grams of size 2 and more (until it reaches self.maxN):

   {("the", "white", "table"): 1, ("white", "table"): 1, ("table",): 1}
Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
  • ngramMap (dict) - Dictionary associating n-grams with their number of occurrences, generated by self.make_ngram_map().
Returns: dict
The expanded n-gram dictionary.
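The expansion in the example above can be sketched as follows. For brevity this sketch prefixes every n-gram in the map; the real method presumably only extends n-grams anchored at the start of the change, and it obtains `previous_tokens` from the ContextMonitor rather than as an argument:

```python
def prefix_ngrams_sketch(previous_tokens, ngram_map, max_n):
    expanded = dict(ngram_map)
    for ngram, count in ngram_map.items():
        # Prepend up to max_n - len(ngram) of the tokens just before the change.
        for k in range(1, max_n - len(ngram) + 1):
            prefixed = tuple(previous_tokens[-k:]) + ngram
            expanded[prefixed] = expanded.get(prefixed, 0) + count
    return expanded
```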

push_ngrams_in_db(self, ngramMap)


Update the database with the n-grams contained in the n-gram map.

Each n-gram of the n-gram map is pushed into the database with its number of occurrences (count). If the n-gram is already in the database, its count (number of occurrences) is updated. If the n-gram is not in the database, it is simply inserted.

Parameters:
  • ngramMap (dict) - Dictionary associating n-grams with their number of occurrences.
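The update-or-insert behaviour can be sketched with SQLite. The single `ngrams` table keyed by the space-joined n-gram is an assumption made for brevity; the real DatabaseConnector schema may well use one table per n-gram size:

```python
import sqlite3

def push_ngrams_sketch(conn, ngram_map):
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS ngrams"
                " (ngram TEXT PRIMARY KEY, count INTEGER)")
    for ngram, count in ngram_map.items():
        key = " ".join(ngram)
        row = cur.execute("SELECT count FROM ngrams WHERE ngram = ?",
                          (key,)).fetchone()
        if row:   # already in the database: update its count
            cur.execute("UPDATE ngrams SET count = ? WHERE ngram = ?",
                        (row[0] + count, key))
        else:     # not in the database: simply insert it
            cur.execute("INSERT INTO ngrams VALUES (?, ?)", (key, count))
    conn.commit()
```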

count(self, tokens, offset, n)


Make an n-gram then retrieve and return its 'count' entry in the db.

Parameters:
  • tokens (list) - The tokens used to make the n-gram.
  • offset (int) - Offset of the first token of the n-gram in the tokens list.
  • n (int) - Size of the n-gram.
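A sketch of the lookup: build the n-gram from `tokens` starting at `offset`, then read its count. The `ngram_counts` dict stands in for the database query, which is an assumption about the storage layer:

```python
def count_sketch(tokens, offset, n, ngram_counts):
    ngram = tuple(tokens[offset:offset + n])  # slice n tokens from offset
    return ngram_counts.get(ngram, 0)         # 0 when the n-gram is unknown
```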