`eleve.memory`¶

Provides a pure-Python reference implementation of the eleve storage and trie.

class eleve.memory.MemoryTrie(terminals=[])[source]¶

Bases: object

In-memory trie (kept deliberately simple, with no specific optimizations).

__init__(terminals=[])[source]¶

Constructor

Parameters:	terminals – Tokens listed in `terminals` are counted as distinct in the entropy computation. By default, these are the start- and end-of-sentence symbols.

max_depth()[source]¶

Returns the maximum depth of the Trie

>>> from eleve.memory import MemoryTrie
>>> trie = MemoryTrie()
>>> trie.max_depth()
0
>>> trie.add_ngram(["A", "B", "C"])
>>> trie.max_depth()
3

clear()[source]¶: Clear the trie.

iter_leafs()[source]¶

update_stats()[source]¶

Update the internal statistics: the entropies, and the means and standard deviations of the entropy variations.

Called automatically when the trie is queried after having been modified.
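The automatic update described above resembles a "dirty flag" pattern: modifications mark the cached statistics as stale, and the next query recomputes them once. The following is a minimal sketch of that pattern only; the class `LazyStats` and its attributes are hypothetical and are not part of eleve.

```python
class LazyStats:
    """Hypothetical container that recomputes a statistic only when stale."""

    def __init__(self):
        self._values = []
        self._mean = float("nan")
        self._dirty = False
        self.recomputations = 0  # counter kept only to illustrate the laziness

    def add(self, value):
        # Any modification invalidates the cached statistics.
        self._values.append(value)
        self._dirty = True

    def update_stats(self):
        # Recompute the cached statistic and mark it as fresh.
        if self._values:
            self._mean = sum(self._values) / len(self._values)
        self._dirty = False
        self.recomputations += 1

    def query_mean(self):
        # Queries trigger an update only if something changed since the last one.
        if self._dirty:
            self.update_stats()
        return self._mean


stats = LazyStats()
stats.add(1.0)
stats.add(3.0)
print(stats.query_mean())    # 2.0
print(stats.query_mean())    # 2.0, served from the cache
print(stats.recomputations)  # 1
```

With this design, a batch of insertions costs a single recomputation at the first subsequent query rather than one per insertion.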

add_ngram(ngram, freq=1)[source]¶

Add an n-gram to the trie.

Parameters:	ngram – A list of tokens. freq – The number of times to add (or, if negative, subtract) the n-gram.

query_count(ngram)[source]¶

Query the number of occurrences of the n-gram in the training data.

Parameters:	ngram – A list of tokens.
Returns:	An integer.
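To illustrate how adding n-grams and querying counts can fit together, here is a minimal counting trie built on nested dictionaries. It is a sketch under stated assumptions, not the eleve implementation: the class name `SketchTrie` is hypothetical, and it assumes that adding an n-gram increments the count of each of its prefixes.

```python
class SketchTrie:
    """Hypothetical minimal counting trie; not the eleve implementation."""

    def __init__(self):
        # Each node holds a count and a mapping from token to child node.
        self.root = {"count": 0, "children": {}}

    def add_ngram(self, ngram, freq=1):
        # Assumption: every prefix of the n-gram (root included) is incremented.
        node = self.root
        node["count"] += freq
        for token in ngram:
            node = node["children"].setdefault(
                token, {"count": 0, "children": {}}
            )
            node["count"] += freq

    def query_count(self, ngram):
        # Walk down the trie; an unseen n-gram has a count of 0.
        node = self.root
        for token in ngram:
            node = node["children"].get(token)
            if node is None:
                return 0
        return node["count"]


trie = SketchTrie()
trie.add_ngram(["A", "B", "C"])
trie.add_ngram(["A", "B", "D"], freq=2)
print(trie.query_count(["A", "B"]))       # 3
print(trie.query_count(["A", "B", "D"]))  # 2
trie.add_ngram(["A", "B", "D"], freq=-1)  # a negative freq subtracts
print(trie.query_count(["A", "B", "D"]))  # 1
```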

query_entropy(ngram)[source]¶

Query for the branching entropy.

Parameters:	ngram – A list of tokens.
Returns:	A float, which may be NaN if the value is undefined.
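As a rough illustration, branching entropy can be computed as the Shannon entropy of the distribution of tokens that follow an n-gram. The sketch below is not taken from the eleve source: the function `branching_entropy` and its arguments are hypothetical, and treating each occurrence of a terminal as its own outcome is only one plausible reading of "counted as distinct".

```python
import math


def branching_entropy(successor_counts, terminals=()):
    """Entropy (in bits) of the next-token distribution after an n-gram.

    successor_counts maps each following token to its observed count.
    """
    total = sum(c for c in successor_counts.values() if c > 0)
    if total == 0:
        return float("nan")  # undefined when no successor was observed
    entropy = 0.0
    for token, count in successor_counts.items():
        if count <= 0:
            continue
        p = count / total
        if token in terminals:
            # Assumption: each occurrence of a terminal is a distinct outcome
            # of probability 1/total, so `count` outcomes contribute
            # count * (1/total) * log2(total).
            entropy += p * math.log2(total)
        else:
            entropy -= p * math.log2(p)
    return entropy


print(branching_entropy({"B": 1, "C": 1}))                   # 1.0
print(branching_entropy({"B": 2, "$": 2}, terminals={"$"}))  # 1.5
print(branching_entropy({}))                                 # nan
```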

query_ev(ngram)[source]¶

Query for the branching entropy variation.

Parameters:	ngram – A list of tokens.
Returns:	A float, which may be NaN if the value is undefined.
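The sketch below assumes that the entropy variation of an n-gram is the difference between its branching entropy and that of the n-gram without its last token, as is common in the branching-entropy literature; this page does not spell out the definition. The function `entropy_variation` and the `entropy_of` lookup table are hypothetical names used for illustration.

```python
import math


def entropy_variation(entropy_of, ngram):
    """Branching entropy of `ngram` minus that of its prefix (assumed definition).

    entropy_of maps tuples of tokens to precomputed branching entropies.
    """
    nan = float("nan")
    current = entropy_of.get(tuple(ngram), nan)
    parent = entropy_of.get(tuple(ngram[:-1]), nan)
    # NaN propagates: the result is undefined if either entropy is undefined.
    return current - parent


entropies = {("A",): 2.0, ("A", "B"): 0.5}
print(entropy_variation(entropies, ["A", "B"]))       # -1.5
print(entropy_variation(entropies, ["A", "B", "C"]))  # nan
```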

query_autonomy(ngram, z_score=True)[source]¶

Query the autonomy (normalized entropy variation) for the n-gram.

Parameters:	ngram – A list of tokens. z_score – If True, compute the z-score ((value - mean) / stdev). If False, only subtract the mean.
Returns:	A float, which may be NaN if the value is undefined.
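The two normalization modes controlled by `z_score` can be sketched as follows. The function `normalize` and the `population` argument are hypothetical; which set of entropy variations the mean and standard deviation are taken over is not specified here, and the use of the population standard deviation is an assumption.

```python
import statistics


def normalize(value, population, z_score=True):
    """Center `value` on the population mean, optionally scaling by its stdev."""
    mean = statistics.mean(population)
    if not z_score:
        return value - mean
    stdev = statistics.pstdev(population)  # assumption: population stdev
    if stdev == 0:
        return float("nan")  # the z-score is undefined without spread
    return (value - mean) / stdev


variations = [0.0, 4.0]  # mean 2.0, population stdev 2.0
print(normalize(5.0, variations))                 # 1.5
print(normalize(5.0, variations, z_score=False))  # 3.0
```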

class eleve.memory.MemoryStorage(default_ngram_length=5)[source]¶

Bases: object

Full-Python in-memory storage.

sentence_start = '\ue02b'¶

sentence_end = '\ue02d'¶

__init__(default_ngram_length=5)[source]¶

Storage constructor.

Parameters:	default_ngram_length – The default maximum length of the n-grams being stored. May be overridden in `add_sentence()`.

default_ngram_length¶

add_sentence(sentence, freq=1, ngram_length=None)[source]¶

Add a sentence to the model.

Parameters:	sentence – The sentence to add. Should be a list of tokens. freq – The number of times to add this sentence. Defaults to one. May be negative to “remove” a sentence. ngram_length – The length of the n-grams that are stored. If None, the default value set in `__init__` is used.
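One plausible way to turn a sentence into n-grams of bounded length is a sliding window over the tokens, framed by the start and end symbols listed above. This is a sketch under assumptions, not the library's code: the helper `sentence_ngrams` is hypothetical, and the exact windowing used by `add_sentence()` may differ.

```python
SENTENCE_START = "\ue02b"  # same value as MemoryStorage.sentence_start
SENTENCE_END = "\ue02d"    # same value as MemoryStorage.sentence_end


def sentence_ngrams(sentence, ngram_length=5):
    """Return the n-grams (at most `ngram_length` tokens) of a framed sentence."""
    tokens = [SENTENCE_START] + list(sentence) + [SENTENCE_END]
    # One window per start position; windows near the end are shorter.
    # The window holding only the end symbol is skipped (an assumption).
    return [tokens[i:i + ngram_length] for i in range(len(tokens) - 1)]


for ngram in sentence_ngrams(["the", "cat"], ngram_length=3):
    print(ngram)
```

Each resulting list could then be passed to a trie's `add_ngram()` with the sentence's `freq`.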

clear()[source]¶: Clear the training data in the model, effectively resetting it.

update_stats()[source]¶: Update the entropies and normalization factors. This function is called automatically when you modify the model and then query it.

query_autonomy(ngram)[source]¶

Query the autonomy for an n-gram.

Parameters:	ngram – A list of tokens.
Returns:	A float, which may be NaN if the value is undefined.

query_ev(ngram)[source]¶

Query the entropy variation for an n-gram.

Parameters:	ngram – A list of tokens.
Returns:	A float, which may be NaN if the value is undefined.

query_count(ngram)[source]¶

Query the count for an n-gram (the number of times it appeared in the training corpus).

Parameters:	ngram – A list of tokens.
Returns:	A float.

query_entropy(ngram)[source]¶

Query the branching entropy for an n-gram.

Parameters:	ngram – A list of tokens.
Returns:	A float, which may be NaN if the value is undefined.
