
Class DictionaryPredictor

object --+    
         |    
 Predictor --+
             |
            DictionaryPredictor

Very simple word predictor using a dictionary.

The dictionary is a file containing one word per line. This predictor does not use n-grams, so it does not consider context and is therefore less effective than the n-gram-based predictors.

Nested Classes
    Inherited from Predictor
  __metaclass__
Metaclass for defining Abstract Base Classes (ABCs).
Instance Methods
 
__init__(self, config, contextMonitor, predictorName)
DictionaryPredictor creator.
 
init_database_connector(self)
Initialize the database connector.
 
get_dict_range(self, prefix)
Select the dictionary range where words start with the given prefix.
Prediction
predict(self, maxPartialPredictionSize, stopList)
Complete the current word or predict the next word using the dictionary.
 
learn(self, text)
This predictor has no ability to learn.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  __abstractmethods__ = frozenset([])
    Inherited from Predictor
  _abc_cache = <_weakrefset.WeakSet object at 0x7f2a42360550>
  _abc_negative_cache = <_weakrefset.WeakSet object at 0x7f2a423...
  _abc_negative_cache_version = 39
  _abc_registry = <_weakrefset.WeakSet object at 0x7f2a42360490>
Properties

Inherited from object: __class__

Method Details

__init__(self, config, contextMonitor, predictorName)
(Constructor)

DictionaryPredictor creator.

Parameters:
  • config (drvr.Configuration) - The config is used to retrieve the predictor settings from the config file.
  • contextMonitor (ContextMonitor) - The contextMonitor is needed because it allows the predictor to get the tokens of the input buffers.
  • predictorName (str) - The custom name of the configuration using this predictor.
Overrides: object.__init__

Note: The string.lower() and string.strip() methods have a great impact on performance: the profile module shows that they require almost 1 second of processing time when calculating suggestions for 10 contexts. This constructor therefore no longer uses the dictionary file directly; a database is created instead. Every word of the dictionary is lowercased and stripped, then added to the database. As a result, the predictor performs much better. Profiling a script that queries suggestions for 10 successive contexts shows the improvement:

  • Calling lower() and strip() on each word of the file on each predict() call:
       ncalls  tottime  percall  cumtime  percall filename:lineno
       690048    0.468    0.000    0.468    0.000 :0(lower)
    
  • Creating an improved list upon initialization and using it on each predict() call (previous optimization method):
       ncalls  tottime  percall  cumtime  percall filename:lineno
       100046    0.059    0.000    0.059    0.000 :0(lower)
    

    It is approximately 8 times faster (0.468 s vs. 0.059 s of total time). However, this profile mixes initialization with later computation: most of the time on the previous profiling line is spent initializing the list, so the gain on each individual predict() call is even greater.

  • Creating a database and querying it on each predict() call:
       ncalls  tottime  percall  cumtime  percall filename:lineno
       100046    0.059    0.000    0.059    0.000 :0(lower)

    It is not faster than the previous method, but the database only needs to be created once. Once it exists, the initialization time is close to zero and the query time on each predict() call is even shorter.
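
The idea behind the database approach can be illustrated with a minimal sketch. It assumes an SQLite database accessed through Python's standard sqlite3 module; the table name `words` and the helpers `build_word_database` and `words_with_prefix` are hypothetical and are not taken from tipy or minr.DictMiner. Each word is lowercased and stripped once, when the database is built, so no per-word normalization is needed at query time.

```python
import sqlite3


def build_word_database(lines, connection):
    """Normalize each dictionary line once and store it (hypothetical helper)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)"
    )
    # lower() and strip() run only here, not on every prediction.
    normalized = ((line.strip().lower(),) for line in lines if line.strip())
    connection.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)", normalized
    )
    connection.commit()


def words_with_prefix(connection, prefix, limit):
    """Return up to `limit` stored words that start with `prefix`."""
    # Caveat: '%' and '_' inside `prefix` would act as LIKE wildcards.
    rows = connection.execute(
        "SELECT word FROM words WHERE word LIKE ? ORDER BY word LIMIT ?",
        (prefix + "%", limit),
    )
    return [row[0] for row in rows]


# An in-memory database stands in for the real database file.
connection = sqlite3.connect(":memory:")
build_word_database(["  Hello\n", "hellish\n", "Red \n", "bird\n"], connection)
print(words_with_prefix(connection, "hell", 10))  # ['hellish', 'hello']
```

With a database file instead of `:memory:`, the build step would only have to run once, which matches the behaviour described in the note.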
    

Change Log:

  • 08/06/15: Method now creates an ordered, optimized list containing the dictionary words upon initialization in order to increase the speed of the predictor.
  • 13/06/15: Method now uses a database containing the dictionary words. See: minr.DictMiner

init_database_connector(self)

Initialize the database connector.

Uses the database file path, the maximum n-gram size and the learn mode to initialize and open the database.

get_dict_range(self, prefix)

Select the dictionary range where words start with the given prefix.

A suggested word must complete the given token, which means that all suggested words start with this token, here called the prefix. This method creates a list containing the suggested words for the given prefix, i.e. every word of the dictionary list that starts with the prefix. This is easy because the dictionary list is ordered. For instance:

If the prefix is:

   'hell'

And the dictionary list is:

   ['bird', 'blue', 'given', 'hair', 'hellish', 'hello', 'red', 'zip']

We first remove the words of the list one by one until we reach a word which actually starts with the prefix 'hell'. We then have:

   ['hellish', 'hello', 'red', 'zip']

Finally, we scan the remaining list. When we reach a word which does not start with the given prefix, we know that none of the following words start with it either, because the list is ordered. So we have:

   ['hellish', 'hello']
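
The two steps above can be sketched as follows. This is only an illustration of the described procedure, not the method's actual source; the function name `dict_range_sketch` is hypothetical, and the input list is assumed to be sorted.

```python
from itertools import dropwhile, takewhile


def dict_range_sketch(sorted_words, prefix):
    """Return the run of words starting with `prefix` in a sorted list."""
    # Step 1: skip words until the first one that starts with the prefix.
    tail = dropwhile(lambda word: not word.startswith(prefix), sorted_words)
    # Step 2: keep words while they still start with the prefix. Because
    # the list is sorted, no later word can match once one fails.
    return list(takewhile(lambda word: word.startswith(prefix), tail))


words = ['bird', 'blue', 'given', 'hair', 'hellish', 'hello', 'red', 'zip']
print(dict_range_sketch(words, 'hell'))  # ['hellish', 'hello']
```
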
Parameters:
  • prefix (str) - The prefix from which suggested words range is computed.

Deprecated: This method is no longer needed since the words are now stored in a database.

predict(self, maxPartialPredictionSize, stopList)

Complete the current word or predict the next word using the dictionary.

Uses the input buffers (through contextMonitor) and the word dictionary to predict the most probable suggestions. A suggestion is a word which can:

  • Predict the end of the word, i.e. complete the current partial word (the user has not finished typing the word, so we try to predict its end).
  • Predict the next word (the user has typed a separator after a word, so we try to predict the next word before they start typing it).

In order to compute the suggestions, this method:

  • Retrieves the last token from the left input buffer.
  • Loops over each word in the dictionary:
    • If the word starts with the last token retrieved, it is added to the suggestion list, provided the maximum number of suggestions has not been reached yet. It is not necessary to check whether the word is already in the suggestion list, because a word should appear only once in a dictionary. In any case, the merger will merge duplicate suggestions.
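
The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the hypothetical function `predict_sketch` takes the last token and the dictionary words as plain arguments and returns a plain list, whereas the real method reads the token through contextMonitor, queries the database and returns a Prediction object.

```python
def predict_sketch(last_token, dictionary_words, max_suggestions, stop_list):
    """Collect dictionary words that complete `last_token` (illustrative)."""
    suggestions = []
    for word in dictionary_words:
        # Return as soon as the maximum number of suggestions is reached.
        if len(suggestions) >= max_suggestions:
            break
        # Keep words that start with the token and are not in the stop list.
        if word.startswith(last_token) and word not in stop_list:
            suggestions.append(word)
    return suggestions


words = ['hair', 'hellish', 'hello', 'help', 'red']
print(predict_sketch('hel', words, 2, ['hellish']))  # ['hello', 'help']
```
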
Parameters:
  • maxPartialPredictionSize (int) - Maximum number of suggestions to compute. If this number is reached, the suggestions list is immediately returned. DatabaseConnector.ngram_table_tp() returns the records in descending order of their number of occurrences, so the most probable suggestions are added to the list first. This results in no loss of suggestion quality, regardless of the desired number of suggestions.
  • stopList (list) - The stoplist is a list of undesirable words. Any suggestion which is in the stopList won't be added to the suggestions list.
Returns: Prediction
A list of all possible suggestions (limited to maxPartialPredictionSize).
Overrides: Predictor.predict

learn(self, text)

This predictor has no ability to learn.

Overrides: Predictor.learn