State
class mwpersistence.DiffState(diff_engine=None, revert_radius=None, revert_detector=None)
Constructs a state object with a diff-based transition function.
Parameters:
- diff_engine : deltas.DiffEngine
A “diff engine” processor for sequentially diffing text.
- revert_radius : int
A positive integer indicating the maximum revision distance that a revert can span.
- revert_detector : mwreverts.Detector
A revert detector.
Example:
    >>> import mwpersistence
    >>> import deltas
    >>>
    >>> state = mwpersistence.DiffState(deltas.SegmentMatcher())
    >>>
    >>> print(state.update("Apples are red.", revision=1))
    ([Token(text='Apples', revisions=[1]), Token(text=' ', revisions=[1]), Token(text='are', revisions=[1]), Token(text=' ', revisions=[1]), Token(text='red', revisions=[1]), Token(text='.', revisions=[1])], [Token(text='Apples', revisions=[1]), Token(text=' ', revisions=[1]), Token(text='are', revisions=[1]), Token(text=' ', revisions=[1]), Token(text='red', revisions=[1]), Token(text='.', revisions=[1])], [])
    >>> print(state.update("Apples are blue.", revision=2))
    ([Token(text='Apples', revisions=[1, 2]), Token(text=' ', revisions=[1, 2]), Token(text='are', revisions=[1, 2]), Token(text=' ', revisions=[1, 2]), Token(text='blue', revisions=[2]), Token(text='.', revisions=[1, 2])], [Token(text='blue', revisions=[2])], [Token(text='red', revisions=[1])])
    >>> print(state.update("Apples are red.", revision=3))  # A revert!
    ([Token(text='Apples', revisions=[1, 2, 3]), Token(text=' ', revisions=[1, 2, 3]), Token(text='are', revisions=[1, 2, 3]), Token(text=' ', revisions=[1, 2, 3]), Token(text='red', revisions=[1, 3]), Token(text='.', revisions=[1, 2, 3])], [], [])
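The transition shown in the example can be approximated in plain Python. The sketch below is a toy model, not the mwpersistence implementation: `ToyToken`, `ToyDiffState`, the regex tokenizer, the default radius of 15, and the use of `difflib.SequenceMatcher` in place of a deltas diff engine are all assumptions made for illustration. It tries to show why a revert brings back the original token objects, with their revision history intact, rather than creating new ones.

```python
import difflib
import hashlib
import re


class ToyToken:
    """Hypothetical stand-in for a token: text plus the revisions it appeared in."""

    def __init__(self, text):
        self.text = text
        self.revisions = []

    def __repr__(self):
        return "ToyToken(text={0!r}, revisions={1!r})".format(
            self.text, self.revisions)


class ToyDiffState:
    """Simplified diff-based state with checksum-based revert detection."""

    def __init__(self, revert_radius=15):  # default chosen arbitrarily
        self.revert_radius = revert_radius
        self.history = []  # recent (checksum, tokens) pairs, oldest first
        self.last = []     # tokens of the most recently processed revision

    def update(self, text, revision=None):
        checksum = hashlib.sha1(text.encode("utf8")).hexdigest()
        # A revert is a revision whose checksum matches a recent one.
        reverted_to = next(
            (toks for c, toks in reversed(self.history) if c == checksum),
            None)
        if reverted_to is not None:
            # Reuse the old token objects; nothing counts as added or removed.
            current, added, removed = list(reverted_to), [], []
        else:
            words = re.findall(r"\w+|\s+|[^\w\s]", text)
            current, added, removed = [], [], []
            matcher = difflib.SequenceMatcher(
                None, [t.text for t in self.last], words, autojunk=False)
            for op, a1, a2, b1, b2 in matcher.get_opcodes():
                if op == "equal":
                    # Unchanged tokens persist as the same objects.
                    current.extend(self.last[a1:a2])
                else:
                    # "replace", "delete" and "insert" are handled together:
                    # either slice may be empty.
                    removed.extend(self.last[a1:a2])
                    new = [ToyToken(w) for w in words[b1:b2]]
                    added.extend(new)
                    current.extend(new)
        for token in current:
            token.revisions.append(revision)
        self.history.append((checksum, current))
        self.history = self.history[-self.revert_radius:]
        self.last = current
        return current, added, removed
```

Feeding this toy state the same three revisions as the example above yields a `red` token whose `revisions` list is `[1, 3]` after the revert, because the token created in revision 1 is reused.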
update(text, revision=None)
Modifies the internal state based on a change to the content and returns the current tokens along with the tokens added and removed.
Parameters:
- text : str
The text content of a revision.
- revision : mixed
Revision metadata.
Returns: A triple of lists:
- current_tokens : list ( Token )
A sequence of Tokens representing the revision that was just processed.
- tokens_added : list ( Token )
Tokens that were added while updating state.
- tokens_removed : list ( Token )
Tokens that were removed while updating state.
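One way to consume the returned triple is to unpack it and summarize each list. The sketch below uses a hypothetical `Token` namedtuple that mirrors only the `text` and `revisions` fields visible in the printed output of the class example; the real Token class may expose more. The `summarize` helper is likewise an illustrative name, not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the repr in the class example.
Token = namedtuple("Token", ["text", "revisions"])


def summarize(result):
    """Unpack an update() triple into a small, readable summary."""
    current_tokens, tokens_added, tokens_removed = result
    return {
        "added": [t.text for t in tokens_added],
        "removed": [t.text for t in tokens_removed],
        # Number of revisions each non-whitespace token has appeared in.
        # Repeated token texts would overwrite each other in this toy dict.
        "persisted": {t.text: len(t.revisions)
                      for t in current_tokens if t.text.strip()},
    }


# The triple printed for revision 2 in the class example.
result = (
    [Token("Apples", [1, 2]), Token(" ", [1, 2]), Token("are", [1, 2]),
     Token(" ", [1, 2]), Token("blue", [2]), Token(".", [1, 2])],
    [Token("blue", [2])],
    [Token("red", [1])],
)
summary = summarize(result)
```

Here `summary["added"]` is `["blue"]`, `summary["removed"]` is `["red"]`, and `summary["persisted"]` maps `"blue"` to 1 and every other token to 2.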
update_opdocs(checksum, opdocs, revision=None)
Modifies the internal state based on a change to the content, supplied as a checksum and a sequence of diff operations, and returns the current tokens along with the tokens added and removed.
Parameters:
- checksum : hashable
A checksum generated from the text of a revision.
- opdocs : iterable ( dict )
A sequence of operations that represent the diff of this new revision.
- revision : mixed
Revision metadata.
Returns: A triple of lists:
- current_tokens : list ( Token )
A sequence of Tokens representing the revision that was just processed.
- tokens_added : list ( Token )
Tokens that were added while updating state.
- tokens_removed : list ( Token )
Tokens that were removed while updating state.
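To make the idea of applying a precomputed diff concrete, here is a minimal sketch of how a sequence of operation dicts could be turned into the same triple. The dict shape used here (`name`, `a1`, `a2`, `tokens`) is an assumption made for illustration and may not match the operation documents the library actually expects; `apply_toy_opdocs` is a hypothetical helper, and plain strings stand in for Token objects.

```python
def apply_toy_opdocs(opdocs, last_tokens):
    """Apply diff operations (as dicts) to the previous token list.

    Assumed shapes, for illustration only:
      {"name": "equal",  "a1": int, "a2": int}
      {"name": "delete", "a1": int, "a2": int}
      {"name": "insert", "tokens": list}
    """
    current, added, removed = [], [], []
    for op in opdocs:
        if op["name"] == "equal":
            # Carry over the unchanged slice of the previous revision.
            current.extend(last_tokens[op["a1"]:op["a2"]])
        elif op["name"] == "delete":
            removed.extend(last_tokens[op["a1"]:op["a2"]])
        elif op["name"] == "insert":
            added.extend(op["tokens"])
            current.extend(op["tokens"])
        else:
            raise ValueError("Unknown operation: {0!r}".format(op["name"]))
    return current, added, removed


last = ["Apples", " ", "are", " ", "red", "."]
opdocs = [
    {"name": "equal", "a1": 0, "a2": 4},
    {"name": "delete", "a1": 4, "a2": 5},
    {"name": "insert", "tokens": ["blue"]},
    {"name": "equal", "a1": 5, "a2": 6},
]
current, added, removed = apply_toy_opdocs(opdocs, last)
```

With these inputs, `current` spells out "Apples are blue.", `added` is `["blue"]`, and `removed` is `["red"]`. Passing the diff in directly avoids recomputing it from text, which is presumably why a checksum is required separately for revert detection.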
- diff_engine :