
Class WeightNgramPredictor


object --+    
         |    
 Predictor --+
             |
            WeightNgramPredictor

Compute prediction from n-gram model in database.


Nested Classes
    Inherited from Predictor
  __metaclass__
Metaclass for defining Abstract Base Classes (ABCs).
Instance Methods
 
__init__(self, config, contextMonitor, predictorName=None)
WeightNgramPredictor constructor.
 
init_database_connector(self)
Initialize the database connector.
Prediction
predict(self, maxPartialPredictionSize, stopList=[])
Predict the next word according to the current context.
Prediction
weight(self, prefixCompletionCandidates, tokens)
Compute probability of suggestions and return the most probable ones.
 
close_database(self)
Close the predictor's database.
 
learn(self, change)
Learn what needs to be learned by adding n-grams to the database.
 
make_ngram_map(self, change)
Create a map associating n-grams (lists of words) and their count.
dict
prefix_ngrams_with_input(self, change, ngramMap)
Use the input left buffer to expand the n-gram map.
 
push_ngrams_in_db(self, ngramMap)
Update the database with the n-grams contained in the n-gram map.
 
count(self, tokens, offset, n)
Make an n-gram then retrieve and return its 'count' entry in the db.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  __abstractmethods__ = frozenset([])
    Inherited from Predictor
  _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42360550>
  _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423...
  _abc_negative_cache_version = 39
  _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42360490>
Properties

Inherited from object: __class__

Method Details

__init__(self, config, contextMonitor, predictorName=None)
(Constructor)


WeightNgramPredictor constructor.

Parameters:
  • config (drvr.Configuration) - The config is used to retrieve the predictor settings from the config file.
  • contextMonitor (ContextMonitor) - The contextMonitor is needed because it allows the predictor to access the input buffer tokens.
  • predictorName (str) - The custom name of the configuration using this predictor.
Overrides: object.__init__

init_database_connector(self)


Initialize the database connector.

Uses the database file path, the maximum n-gram size and the learn mode to initialize and open the database.

predict(self, maxPartialPredictionSize, stopList=[])


Predict the next word according to the current context.

Use the input buffers (thanks to contextMonitor) and the n-gram database to predict the most probable suggestions. A suggestion is a word which can:

  • Complete the current partial word, i.e. predict the end of the word (the user has not finished typing the word, so we try to predict its ending).
  • Predict the next word (the user has typed a separator after a word, so we try to predict the next word before they start typing it).

In order to compute the suggestions, this method:

  • Retrieves the last n tokens from the left input buffer, where n is the maximum n-gram size (max(n)) stored in the database.
  • Loops over each n-gram size from max(n) down to 1:
    • Finds n-grams of the current size in the database which match the last input tokens.
    • Adds each retrieved n-gram to the suggestion list if it is not already in it and the maximum number of suggestions has not been reached yet.
Parameters:
  • maxPartialPredictionSize (int) - Maximum number of suggestions to compute. If this number is reached, the suggestions list is immediately returned. DatabaseConnector.ngram_table_tp() returns the records in descending order of their number of occurrences, so the most probable suggestions are added to the list first. This results in no loss of suggestion quality, regardless of the desired number of suggestions.
  • stopList (list) - The stopList is a list of undesirable words. Any suggestion which is in the stopList won't be added to the suggestions list.
Returns: Prediction
A list of all possible suggestions (limited to maxPartialPredictionSize).
Overrides: Predictor.predict
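The backoff loop above can be sketched with an in-memory n-gram table. This is a minimal sketch, assuming a `ngram_counts` dict (n-gram tuple -> count) in place of the real DatabaseConnector queries; the function name and signature are illustrative, not the class's actual API:

```python
def predict_sketch(tokens, ngram_counts, max_n, max_size, stop_list=()):
    """tokens[-1] is the current (possibly partial) word being completed."""
    prefix = tokens[-1]
    suggestions = []
    for n in range(max_n, 0, -1):                    # from max(n) down to 1
        if len(tokens) < n:
            continue
        context = tuple(tokens[len(tokens) - n:-1])  # the n-1 tokens before the prefix
        # Candidates sorted by descending count, mirroring ngram_table_tp().
        candidates = sorted(
            ((ng, c) for ng, c in ngram_counts.items()
             if len(ng) == n and ng[:-1] == context and ng[-1].startswith(prefix)),
            key=lambda item: -item[1])
        for ng, _count in candidates:
            word = ng[-1]
            if word not in suggestions and word not in stop_list:
                suggestions.append(word)
                if len(suggestions) >= max_size:
                    return suggestions               # immediately return when full
    return suggestions
```

Because longer contexts are scanned first, completions supported by more context land at the front of the list, and the stop list filters undesirable words before they are added.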

weight(self, prefixCompletionCandidates, tokens)


Compute probability of suggestions and return the most probable ones.

The probability of a suggestion is based on its frequency relative to the whole set of suggestions and to the number of single tokens in the database.

Parameters:
  • prefixCompletionCandidates (list) - List of the suggestions returned by self.predict().
  • tokens (list) - The last input tokens.
Returns: Prediction
List of every "good enough" suggestion.
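A hedged sketch of the weighting step: here a candidate's probability is its unigram count divided by the total number of single tokens, which is an assumption about how the "relative frequency" above is computed; the real method works from database counts.

```python
def weight_sketch(candidates, unigram_counts):
    # unigram_counts maps each single token to its count in the database.
    total = sum(unigram_counts.values())   # number of single tokens in the db
    scored = [(word, unigram_counts.get(word, 0) / total) for word in candidates]
    scored.sort(key=lambda item: -item[1])  # most probable first
    return scored
```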

learn(self, change)


Learn what needs to be learned by adding n-grams to the database.

Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
Overrides: Predictor.learn
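The three helper methods documented below suggest a three-step flow. This sketch of how learn() might chain them is an inference from their docstrings, not the class's confirmed implementation; the helpers are passed in as plain callables to keep the sketch self-contained:

```python
def learn_sketch(change, make_ngram_map, prefix_ngrams_with_input,
                 push_ngrams_in_db):
    ngram_map = make_ngram_map(change)                       # count change n-grams
    ngram_map = prefix_ngrams_with_input(change, ngram_map)  # add preceding tokens
    push_ngrams_in_db(ngram_map)                             # persist the counts
```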

make_ngram_map(self, change)


Create a map associating n-grams (lists of words) and their count.

Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
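A minimal sketch of the counting step, assuming simple whitespace tokenization (the real class presumably uses the predictor's configured tokenizer) and a hypothetical `max_n` argument standing in for self.maxN:

```python
def make_ngram_map_sketch(change, max_n):
    tokens = change.split()                   # assumed whitespace tokenization
    ngram_map = {}
    for n in range(1, max_n + 1):             # every n-gram size up to max_n
        for i in range(len(tokens) - n + 1):  # every starting position
            ngram = tuple(tokens[i:i + n])
            ngram_map[ngram] = ngram_map.get(ngram, 0) + 1
    return ngram_map
```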

prefix_ngrams_with_input(self, change, ngramMap)


Use the input left buffer to expand the n-gram map.

This method calls cntxt.ContextMonitor.previous_tokens() to get the tokens from the left input buffer that come just before the change, and adds them BEFORE the change n-grams generated by self.make_ngram_map().

For instance, if the current left input buffer is:

   "phone is on the white table "

And change is:

   "table"

Then, the n-gram map generated by self.make_ngram_map() will be:

   {("table",): 1}

The n-gram map contains a single n-gram of size 1, so this method adds the tokens preceding the change in the left input buffer to form n-grams of size 2 and more (until it reaches self.maxN):

   {("the", "white", "table"): 1, ("white", "table"): 1, ("table",): 1}
Parameters:
  • change (str) - The part of the left input buffer which represents the last change.
  • ngramMap (dict) - Dictionary associating n-grams with their number of occurrences, generated by self.make_ngram_map().
Returns: dict
The expanded n-gram dictionary.
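The expansion in the example above can be sketched as follows. For brevity this sketch prefixes every n-gram in the map; the real method presumably only extends n-grams anchored at the start of the change, and it obtains `previous_tokens` from the ContextMonitor rather than as an argument:

```python
def prefix_ngrams_sketch(previous_tokens, ngram_map, max_n):
    expanded = dict(ngram_map)
    for ngram, count in ngram_map.items():
        # Prepend up to max_n - len(ngram) of the tokens just before the change.
        for k in range(1, max_n - len(ngram) + 1):
            prefixed = tuple(previous_tokens[-k:]) + ngram
            expanded[prefixed] = expanded.get(prefixed, 0) + count
    return expanded
```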

push_ngrams_in_db(self, ngramMap)


Update the database with the n-grams contained in the n-gram map.

Each n-gram of the n-gram map is pushed into the database with its number of occurrences (count). If the n-gram is already in the database, its count (number of occurrences) is updated. If the n-gram is not in the database, it is simply inserted.

Parameters:
  • ngramMap (dict) - Dictionary associating n-grams with their number of occurrences.
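The update-or-insert behaviour can be sketched with SQLite. The single `ngrams` table keyed by the space-joined n-gram is an assumption made for brevity; the real DatabaseConnector schema may well use one table per n-gram size:

```python
import sqlite3

def push_ngrams_sketch(conn, ngram_map):
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS ngrams"
                " (ngram TEXT PRIMARY KEY, count INTEGER)")
    for ngram, count in ngram_map.items():
        key = " ".join(ngram)
        row = cur.execute("SELECT count FROM ngrams WHERE ngram = ?",
                          (key,)).fetchone()
        if row:   # already in the database: update its count
            cur.execute("UPDATE ngrams SET count = ? WHERE ngram = ?",
                        (row[0] + count, key))
        else:     # not in the database: simply insert it
            cur.execute("INSERT INTO ngrams VALUES (?, ?)", (key, count))
    conn.commit()
```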

count(self, tokens, offset, n)


Make an n-gram then retrieve and return its 'count' entry in the db.

Parameters:
  • tokens (list) - The tokens used to make the n-gram.
  • offset (int) - Offset of the first token of the n-gram in the tokens list.
  • n (int) - Size of the n-gram.
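A sketch of the lookup: build the n-gram from `tokens` starting at `offset`, then read its count. The `ngram_counts` dict stands in for the database query, which is an assumption about the storage layer:

```python
def count_sketch(tokens, offset, n, ngram_counts):
    ngram = tuple(tokens[offset:offset + n])  # slice n tokens from offset
    return ngram_counts.get(ngram, 0)         # 0 when the n-gram is unknown
```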