eleve.memory
Provides a pure-Python reference implementation of the eleve
storage and trie.
-
class
eleve.memory.
MemoryTrie
(terminals=[]) Bases:
object
In-memory trie (kept simple, with no specific optimizations).
-
__init__
(terminals=[]) Constructor.
Parameters: terminals – Each occurrence of a token listed in the “terminals” array is counted as a distinct token in the entropy computation. By default, these are the start-of-sentence and end-of-sentence symbols.
-
max_depth
() Returns the maximum depth of the trie.
>>> trie = MemoryTrie()
>>> trie.max_depth()
0
>>> trie.add_ngram(["A", "B", "C"])
>>> trie.max_depth()
3
-
update_stats
() Update the internal statistics: the entropies, plus the means and standard deviations of the entropy variations.
Called automatically when the trie has been modified and is then queried.
-
add_ngram
(ngram, freq=1) Add an n-gram to the trie.
Parameters: - ngram – A list of tokens.
- freq – The number of times to add (or, if negative, subtract) that n-gram.
-
query_count
(ngram) Query for the number of occurrences of the n-gram seen in the training data.
Parameters: ngram – A list of tokens. Returns: An integer.
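To illustrate how add_ngram and query_count relate, here is a minimal sketch of a counting trie. The class name CountingTrie and its attributes are hypothetical and are not taken from the eleve source; the sketch only assumes that every node on the path of an n-gram carries a counter.

```python
class CountingTrie:
    """Hypothetical toy trie for illustration; not the eleve implementation."""

    def __init__(self):
        self.count = 0       # total frequency of n-grams passing through this node
        self.children = {}   # token -> CountingTrie

    def add_ngram(self, ngram, freq=1):
        # Update the root and every node along the path of the n-gram.
        node = self
        node.count += freq
        for token in ngram:
            node = node.children.setdefault(token, CountingTrie())
            node.count += freq

    def query_count(self, ngram):
        # Walk down the path; an unseen n-gram has a count of zero.
        node = self
        for token in ngram:
            node = node.children.get(token)
            if node is None:
                return 0
        return node.count


trie = CountingTrie()
trie.add_ngram(["the", "cat", "sat"])
trie.add_ngram(["the", "cat", "ran"], freq=2)
trie.add_ngram(["the", "cat", "sat"], freq=-1)  # a negative freq subtracts occurrences

print(trie.query_count(["the", "cat"]))         # 2
print(trie.query_count(["the", "cat", "sat"]))  # 0
print(trie.query_count(["dog"]))                # 0
```

Because prefixes are counted along the way, querying a shorter n-gram returns the total over all longer n-grams that start with it.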
-
query_entropy
(ngram) Query for the branching entropy.
Parameters: ngram – A list of tokens. Returns: A float, that can be NaN if it is not defined.
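As a rough illustration of what a branching entropy is, the sketch below computes the Shannon entropy (in bits) of the distribution of tokens that follow an n-gram. The function name branching_entropy and the successor_counts dictionary are hypothetical. The handling of terminals is an assumption based on the constructor description: each occurrence of a terminal token is treated as its own distinct successor.

```python
import math


def branching_entropy(successor_counts, terminals=()):
    """Entropy in bits of the successor distribution (illustrative sketch).

    successor_counts: dict mapping each following token to its count.
    terminals: tokens whose every occurrence counts as a distinct successor
               (assumed interpretation, not confirmed by the eleve source).
    """
    total = sum(successor_counts.values())
    if total <= 0:
        return float("nan")  # undefined when nothing follows the n-gram
    entropy = 0.0
    for token, count in successor_counts.items():
        if count <= 0:
            continue
        p = count / total
        if token in terminals:
            # `count` distinct outcomes, each with probability 1 / total.
            entropy += p * math.log2(total)
        else:
            entropy -= p * math.log2(p)
    return entropy


print(branching_entropy({"sat": 1, "ran": 1}))            # 1.0
print(branching_entropy({"END": 2}))                      # 0.0
print(branching_entropy({"END": 2}, terminals={"END"}))   # 1.0
```

The last two calls show the effect of terminals: without the special case, an n-gram that is always followed by the end marker would look perfectly predictable.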
-
query_ev
(ngram) Query for the branching entropy variation.
Parameters: ngram – A list of tokens. Returns: A float, that can be NaN if it is not defined.
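A minimal sketch of an entropy variation, assuming the common definition in which it is the branching entropy of the n-gram minus that of the same n-gram without its last token. This definition is not stated on this page, and the entropies lookup table and the function name entropy_variation are hypothetical.

```python
import math


def entropy_variation(entropies, ngram):
    """Difference between the entropy of `ngram` and that of its prefix.

    entropies: dict mapping tuple(ngram) -> branching entropy
               (hypothetical lookup table, for illustration only).
    """
    if not ngram:
        return float("nan")  # the empty n-gram has no shorter prefix
    nan = float("nan")
    current = entropies.get(tuple(ngram), nan)
    parent = entropies.get(tuple(ngram[:-1]), nan)
    return current - parent  # NaN propagates when either side is undefined


entropies = {("the",): 2.5, ("the", "cat"): 1.0}
print(entropy_variation(entropies, ["the", "cat"]))              # -1.5
print(math.isnan(entropy_variation(entropies, ["the", "dog"])))  # True
```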
-
query_autonomy
(ngram, z_score=True) Query the autonomy (normalized entropy variation) for the n-gram.
Parameters: - ngram – A list of tokens.
- z_score – If True, compute the z-score ((value - mean) / stdev). If False, just subtract the mean.
Returns: A float, that can be NaN if it is not defined.
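The two normalization modes selected by z_score can be sketched as follows. The function name normalize and the population argument are hypothetical; which set of entropy variations the mean and standard deviation are computed over is not stated here, and the use of the population standard deviation is likewise an assumption.

```python
import math
import statistics


def normalize(value, population, z_score=True):
    """Normalize `value` against a list of reference values (illustrative sketch)."""
    if not population:
        return float("nan")  # no statistics available
    mean = statistics.mean(population)
    if not z_score:
        return value - mean  # only center the value
    stdev = statistics.pstdev(population)  # assumption: population stdev
    if stdev == 0:
        return float("nan")  # z-score undefined without spread
    return (value - mean) / stdev


population = [0.0, 2.0]  # mean 1.0, population stdev 1.0
print(normalize(3.0, population))                 # 2.0
print(normalize(3.0, population, z_score=False))  # 2.0
print(normalize(3.0, [1.0, 1.0]))                 # nan
```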
-
-
class
eleve.memory.
MemoryStorage
(default_ngram_length=5) Bases:
object
Pure-Python in-memory storage.
-
sentence_start
= '\ue02b'
-
sentence_end
= '\ue02d'
-
__init__
(default_ngram_length=5) Storage constructor.
Parameters: default_ngram_length – The default maximum length of the n-grams being stored. May be overridden in add_sentence().
-
default_ngram_length
-
add_sentence
(sentence, freq=1, ngram_length=None) Add a sentence to the model.
Parameters: - sentence – The sentence to add. Should be a list of tokens.
- freq – The number of times to add this sentence. One by default. May be negative to “remove” a sentence.
- ngram_length – The length of the n-grams that are stored. If None, the default value set in __init__ is used.
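One plausible way a sentence could be turned into n-grams of bounded length is sketched below. The helper sentence_ngrams is hypothetical. The marker values are the sentence_start and sentence_end attributes listed above, but the windowing scheme (one window per start position, skipping the window that would hold only the end marker) is an assumption and may differ from what the storage actually does.

```python
SENTENCE_START = "\ue02b"  # value of MemoryStorage.sentence_start
SENTENCE_END = "\ue02d"    # value of MemoryStorage.sentence_end


def sentence_ngrams(sentence, ngram_length=5):
    """Split a token list into n-grams of at most `ngram_length` tokens.

    Illustrative sketch only; the real add_sentence() may window differently.
    """
    tokens = [SENTENCE_START] + list(sentence) + [SENTENCE_END]
    # One window per start position; windows near the end are shorter.
    # The final position (end marker alone) is skipped by assumption.
    return [tokens[i:i + ngram_length] for i in range(len(tokens) - 1)]


for ngram in sentence_ngrams(["a", "b"], ngram_length=3):
    print(ngram)
```

With a two-token sentence and ngram_length=3, this yields three n-grams: the start marker followed by both tokens, both tokens followed by the end marker, and the last token followed by the end marker.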
-
update_stats
() Update the entropies and normalization factors. This function is called automatically when you modify the model and then query it.
-
query_autonomy
(ngram) Query the autonomy for an n-gram.
Parameters: ngram – A list of tokens. Returns: A float, that can be NaN if it is not defined.
-
query_ev
(ngram) Query the entropy variation for an n-gram.
Parameters: ngram – A list of tokens. Returns: A float, that can be NaN if it is not defined.
-