eleve.memory

Provides a full-Python reference implementation of the eleve storage and trie.

class eleve.memory.MemoryTrie(terminals=[])

Bases: object

In-memory trie, written for simplicity, with no specific optimizations.

__init__(terminals=[])

Constructor.

Parameters: terminals – Tokens listed in the terminals array are counted as distinct in the entropy computation. In the default setup, these are the start-of-sentence and end-of-sentence symbols.

max_depth()

Returns the maximum depth of the trie.

>>> trie = MemoryTrie()
>>> trie.max_depth()
0
>>> trie.add_ngram(["A", "B", "C"])
>>> trie.max_depth()
3

clear()

Clear the trie.

iter_leafs()

update_stats()

Update the internal statistics, such as the entropies and the means and standard deviations of the entropy variations.

Called automatically when the trie is queried after it has been modified.
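
The lazy recomputation described above can be sketched with a "dirty" flag that every modification sets and that queries check before answering. This is a minimal illustration under assumptions: the class LazyStatsTrie, its _dirty flag, and query_total() are hypothetical stand-ins, and the real statistics (entropies, means, standard deviations) are replaced by a simple total.

```python
class LazyStatsTrie:
    """Hypothetical sketch of lazy statistics recomputation (not eleve's code)."""

    def __init__(self):
        self.counts = {}        # n-gram tuple -> count
        self._dirty = False     # True when counts changed since the last update_stats()
        self.update_calls = 0   # how many times the statistics were actually recomputed
        self.total = 0

    def add_ngram(self, ngram, freq=1):
        key = tuple(ngram)
        self.counts[key] = self.counts.get(key, 0) + freq
        self._dirty = True      # any modification invalidates the cached statistics

    def update_stats(self):
        # Stand-in for the expensive computation (entropies, means, stdevs).
        self.total = sum(self.counts.values())
        self.update_calls += 1
        self._dirty = False

    def query_total(self):
        if self._dirty:         # recompute only if the trie changed since last time
            self.update_stats()
        return self.total


trie = LazyStatsTrie()
trie.add_ngram(["A", "B"])
trie.add_ngram(["A", "C"], freq=2)
print(trie.query_total(), trie.update_calls)  # 3 1
print(trie.query_total(), trie.update_calls)  # 3 1 (cached, no recomputation)
```

This design lets many add_ngram() calls be batched before a single, possibly expensive, update_stats() pass.
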

add_ngram(ngram, freq=1)

Add an n-gram to the trie.

Parameters:
  • ngram – A list of tokens.
  • freq – The number of times to add the n-gram; a negative value subtracts it instead.

query_count(ngram)

Query the number of occurrences of the n-gram in the training data.

Parameters: ngram – A list of tokens.
Returns: An integer.
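
A minimal sketch of the counting behaviour of add_ngram() and query_count(), assuming a nested-dictionary trie in which every node on an n-gram's path is incremented by freq, so a prefix's count includes all of its extensions. The Node and CountingTrie classes are hypothetical and are not eleve's internals.

```python
class Node:
    """A trie node: an occurrence count and a dict of children keyed by token."""

    def __init__(self):
        self.count = 0
        self.children = {}


class CountingTrie:
    """Minimal sketch of n-gram counting in a trie (hypothetical, not eleve's code)."""

    def __init__(self):
        self.root = Node()

    def add_ngram(self, ngram, freq=1):
        # Walk (and create) the path for the n-gram, adding freq to every node on it.
        node = self.root
        node.count += freq
        for token in ngram:
            node = node.children.setdefault(token, Node())
            node.count += freq

    def query_count(self, ngram):
        # Follow the path; an n-gram that was never added has count 0.
        node = self.root
        for token in ngram:
            node = node.children.get(token)
            if node is None:
                return 0
        return node.count


trie = CountingTrie()
trie.add_ngram(["A", "B", "C"])
trie.add_ngram(["A", "B", "D"], freq=2)
trie.add_ngram(["A", "B", "D"], freq=-1)  # a negative freq subtracts
print(trie.query_count(["A", "B"]))       # 2
print(trie.query_count(["A", "B", "D"]))  # 1
print(trie.query_count(["Z"]))            # 0
```

Note that this sketch never prunes nodes whose count drops back to zero after a negative freq.
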

query_entropy(ngram)

Query the branching entropy of the n-gram.

Parameters: ngram – A list of tokens.
Returns: A float, which can be NaN if it is not defined.
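
Branching entropy is commonly defined as the Shannon entropy of the distribution of tokens that follow an n-gram, i.e. over the children of its trie node. The sketch below assumes base-2 logarithms and reads the constructor's terminals rule as "each occurrence of a terminal token is a separate outcome", so a terminal seen c times contributes c outcomes of probability 1/total. The function name and both assumptions are illustrative only.

```python
import math


def branching_entropy(child_counts, terminals=()):
    """Branching entropy (bits) of a node, given {child token: count}.

    Hypothetical sketch: each occurrence of a terminal child is treated as a
    distinct outcome. Returns NaN when the node has no children.
    """
    total = sum(child_counts.values())
    if total <= 0:
        return float("nan")
    entropy = 0.0
    for token, count in child_counts.items():
        if count <= 0:
            continue  # skip children whose count has dropped to zero
        if token in terminals:
            # count distinct outcomes, each with probability 1 / total
            entropy -= count * (1 / total) * math.log2(1 / total)
        else:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


END = "\ue02d"  # end-of-sentence marker, same value as MemoryStorage.sentence_end
print(branching_entropy({"a": 2, "b": 2}))                   # 1.0
print(branching_entropy({"a": 2, END: 2}))                   # 1.0
print(branching_entropy({"a": 2, END: 2}, terminals={END}))  # 1.5
print(branching_entropy({}))                                 # nan
```

Under this reading, counting terminals as distinct raises the entropy of n-grams that are frequently followed by a sentence boundary.
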

query_ev(ngram)

Query the branching entropy variation of the n-gram.

Parameters: ngram – A list of tokens.
Returns: A float, which can be NaN if it is not defined.
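
Entropy variation is commonly defined as the change in branching entropy when one token is appended: EV(w1..wn) = H(w1..wn) - H(w1..wn-1). The self-contained sketch below computes it from a toy corpus; the helper names are hypothetical, the empty context plays the role of the trie root, and terminals are ignored for brevity. That eleve uses exactly this definition is an assumption.

```python
import math
from collections import Counter, defaultdict


def build_followers(sentences, max_context):
    """Map each context (tuple of up to max_context tokens) to a Counter of next tokens."""
    followers = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens)):
            for n in range(max_context + 1):  # context lengths 0..max_context
                if i + n >= len(tokens):
                    break
                followers[tuple(tokens[i:i + n])][tokens[i + n]] += 1
    return followers


def entropy(followers, ngram):
    """Shannon entropy (bits) of the tokens following ngram; NaN if never followed."""
    counts = followers.get(tuple(ngram))  # .get() does not insert into the defaultdict
    if not counts:
        return float("nan")
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_variation(followers, ngram):
    """EV(w1..wn) = H(w1..wn) - H(w1..wn-1); NaN propagates if either term is undefined."""
    ngram = tuple(ngram)
    return entropy(followers, ngram) - entropy(followers, ngram[:-1])


corpus_followers = build_followers([["a", "b"], ["a", "c"]], max_context=1)
print(entropy_variation(corpus_followers, ["a"]))  # -0.5
print(entropy_variation(corpus_followers, ["b"]))  # nan ("b" is never followed)
```
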

query_autonomy(ngram, z_score=True)

Query the autonomy (normalized entropy variation) of the n-gram.

Parameters:
  • ngram – A list of tokens.
  • z_score – If True, compute the z-score, (value - mean) / stdev. If False, only subtract the mean.
Returns: A float, which can be NaN if it is not defined.
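
A sketch of the two normalization modes selected by z_score, assuming that the mean and standard deviation are computed per n-gram length over all defined (non-NaN) EV values, using the population standard deviation. The function names, the per-length grouping, and returning NaN when the standard deviation is zero are all assumptions of this sketch.

```python
import math
import statistics


def normalization_stats(ev_by_ngram):
    """Return {n: (mean, population stdev)} of the defined EV values per n-gram length."""
    by_length = {}
    for ngram, ev in ev_by_ngram.items():
        if not math.isnan(ev):  # undefined EVs are skipped
            by_length.setdefault(len(ngram), []).append(ev)
    return {n: (statistics.mean(v), statistics.pstdev(v)) for n, v in by_length.items()}


def autonomy(ev_by_ngram, stats, ngram, z_score=True):
    """(ev - mean) / stdev if z_score, else ev - mean; NaN when undefined."""
    ngram = tuple(ngram)
    ev = ev_by_ngram.get(ngram, math.nan)
    if math.isnan(ev) or len(ngram) not in stats:
        return math.nan
    mean, stdev = stats[len(ngram)]
    if not z_score:
        return ev - mean
    return (ev - mean) / stdev if stdev > 0 else math.nan


evs = {("a",): 0.0, ("b",): 4.0, ("c",): math.nan, ("a", "b"): 0.5}
stats = normalization_stats(evs)
print(autonomy(evs, stats, ["b"]))                 # 1.0
print(autonomy(evs, stats, ["b"], z_score=False))  # 2.0
print(autonomy(evs, stats, ["a", "b"]))            # nan (stdev is 0)
```

Subtracting only the mean keeps the result in the EV's own units, while the z-score additionally makes values comparable across n-gram lengths with different spreads.
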

class eleve.memory.MemoryStorage(default_ngram_length=5)

Bases: object

Full-Python in-memory storage.

sentence_start = '\ue02b'
sentence_end = '\ue02d'
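
The two markers are code points from the Unicode Private Use Area (U+E000 to U+F8FF), presumably chosen because ordinary text rarely contains them, so they are unlikely to collide with real tokens. A quick standard-library check:

```python
import unicodedata

SENTENCE_START = "\ue02b"  # value copied from MemoryStorage.sentence_start
SENTENCE_END = "\ue02d"    # value copied from MemoryStorage.sentence_end

for marker in (SENTENCE_START, SENTENCE_END):
    # Private Use Area code points have the general category "Co".
    print(f"U+{ord(marker):04X}", unicodedata.category(marker))
# U+E02B Co
# U+E02D Co
```
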

__init__(default_ngram_length=5)

Storage constructor.

Parameters: default_ngram_length – The default maximum length of the stored n-grams. It can be overridden in add_sentence().

default_ngram_length

add_sentence(sentence, freq=1, ngram_length=None)

Add a sentence to the model.

Parameters:
  • sentence – The sentence to add. Should be a list of tokens.
  • freq – The number of times to add this sentence. One by default. May be negative to “remove” a sentence.
  • ngram_length – The length of the n-grams to store. If None, the default value set in __init__() is used.
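
One plausible way for add_sentence() to turn a sentence into n-grams is to pad it with the start and end markers and take a window of at most ngram_length tokens at each position. The sketch below assumes exactly that; sentence_ngrams() is a hypothetical helper, and the exact windowing eleve uses is not specified in this reference. Each window would then be added with the given freq, e.g. via MemoryTrie.add_ngram(window, freq).

```python
SENTENCE_START = "\ue02b"  # same value as MemoryStorage.sentence_start
SENTENCE_END = "\ue02d"    # same value as MemoryStorage.sentence_end


def sentence_ngrams(sentence, ngram_length=None, default_ngram_length=5):
    """Hypothetical helper: yield the padded windows a sentence would contribute.

    The sentence is wrapped in start/end markers, and a window of at most
    ngram_length tokens starts at every position (windows shrink near the end).
    """
    if ngram_length is None:  # mirrors add_sentence(): fall back to the default
        ngram_length = default_ngram_length
    tokens = [SENTENCE_START] + list(sentence) + [SENTENCE_END]
    for i in range(len(tokens)):
        yield tokens[i:i + ngram_length]


windows = list(sentence_ngrams(["le", "chat"], ngram_length=3))
print(len(windows))               # 4
print([len(w) for w in windows])  # [3, 3, 2, 1]
```
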

clear()

Clear the training data in the model, effectively resetting it.

update_stats()

Update the entropies and normalization factors. This method is called automatically when the model is queried after it has been modified.

query_autonomy(ngram)

Query the autonomy of an n-gram.

Parameters: ngram – A list of tokens.
Returns: A float, which can be NaN if it is not defined.

query_ev(ngram)

Query the entropy variation of an n-gram.

Parameters: ngram – A list of tokens.
Returns: A float, which can be NaN if it is not defined.

query_count(ngram)

Query the count of an n-gram (the number of times it appeared in the training corpus).

Parameters: ngram – A list of tokens.
Returns: A float.

query_entropy(ngram)

Query the branching entropy of an n-gram.

Parameters: ngram – A list of tokens.
Returns: A float, which can be NaN if it is not defined.