State

class mwpersistence.DiffState(diff_engine=None, revert_radius=None, revert_detector=None)

Constructs a state object with a diff-based transition function.

Parameters:
diff_engine : deltas.DiffEngine

A “diff engine” processor for sequentially diffing text.

revert_radius : int

A positive integer indicating the maximum revision distance that a revert can span.

revert_detector : mwreverts.Detector

A revert detector.

Example:
>>> import mwpersistence
>>> import deltas
>>>
>>> state = mwpersistence.DiffState(deltas.SegmentMatcher())
>>>
>>> print(state.update("Apples are red.", revision=1))
([Token(text='Apples', revisions=[1]),
  Token(text=' ', revisions=[1]),
  Token(text='are', revisions=[1]),
  Token(text=' ', revisions=[1]),
  Token(text='red', revisions=[1]),
  Token(text='.', revisions=[1])],
 [Token(text='Apples', revisions=[1]),
  Token(text=' ', revisions=[1]),
  Token(text='are', revisions=[1]),
  Token(text=' ', revisions=[1]),
  Token(text='red', revisions=[1]),
  Token(text='.', revisions=[1])],
 [])
>>> print(state.update("Apples are blue.", revision=2))
([Token(text='Apples', revisions=[1, 2]),
  Token(text=' ', revisions=[1, 2]),
  Token(text='are', revisions=[1, 2]),
  Token(text=' ', revisions=[1, 2]),
  Token(text='blue', revisions=[2]),
  Token(text='.', revisions=[1, 2])],
 [Token(text='blue', revisions=[2])],
 [Token(text='red', revisions=[1])])
>>> print(state.update("Apples are red.", revision=3)) # A revert!
([Token(text='Apples', revisions=[1, 2, 3]),
  Token(text=' ', revisions=[1, 2, 3]),
  Token(text='are', revisions=[1, 2, 3]),
  Token(text=' ', revisions=[1, 2, 3]),
  Token(text='red', revisions=[1, 3]),
  Token(text='.', revisions=[1, 2, 3])],
 [],
 [])
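The third update in the example above is treated as a revert because its text is identical to revision 1. The following is a minimal sketch of how a checksum-based detector with a bounded radius could behave. It is not the mwreverts.Detector implementation: ToyRevertDetector, checksum_of, and the choice of SHA-1 are hypothetical names and assumptions made for illustration only.

```python
import hashlib
from collections import OrderedDict


class ToyRevertDetector:
    """Hypothetical stand-in for a revert detector (not mwreverts.Detector)."""

    def __init__(self, radius=15):
        if radius < 1:
            raise ValueError("radius must be a positive integer")
        self.radius = radius
        # Maps checksum -> saved state, oldest entry first.
        self.history = OrderedDict()

    def process(self, checksum, state):
        """Return the saved state this revision reverts to, or None."""
        found = checksum in self.history
        reverted_to = self.history.pop(checksum) if found else None
        # Re-insert so that this checksum becomes the newest entry.
        self.history[checksum] = reverted_to if found else state
        # Forget revisions that now fall outside the radius.
        while len(self.history) > self.radius:
            self.history.popitem(last=False)
        return reverted_to


def checksum_of(text):
    # Assumption: any stable hash of the revision text would do here.
    return hashlib.sha1(text.encode("utf8")).hexdigest()


texts = ["Apples are red.", "Apples are blue.", "Apples are red."]
detector = ToyRevertDetector(radius=2)
results = [detector.process(checksum_of(text), rev_id)
           for rev_id, text in enumerate(texts, start=1)]
print(results)  # [None, None, 1]: revision 3 restores the state saved for revision 1
```

With radius=1 the same sequence yields no revert, because revision 1 has already been forgotten by the time revision 3 arrives.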
update(text, revision=None)

Modifies the internal state based on a change to the content and returns the current tokens along with the tokens added and removed.

Parameters:
text : str

The text content of a revision

revision : mixed

Revision metadata

Returns:

A triple of lists:

current_tokens : list ( Token )

A sequence of Tokens representing the revision that was just processed.

tokens_added : list ( Token )

Tokens that were added while updating state.

tokens_removed : list ( Token )

Tokens that were removed while updating state.
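To make the shape of the returned triple concrete, here is a toy model of a diff-based transition built on the standard library's difflib. It is a sketch under stated assumptions, not the library's algorithm: ToyToken, tokenize, and toy_update are hypothetical names, and difflib.SequenceMatcher stands in for whatever diff engine is actually configured.

```python
import difflib
import re


class ToyToken:
    """Hypothetical token: its text plus the revisions it has appeared in."""

    def __init__(self, text, revisions=None):
        self.text = text
        self.revisions = list(revisions or [])

    def __repr__(self):
        return "ToyToken(text={!r}, revisions={!r})".format(
            self.text, self.revisions)


def tokenize(text):
    # Split into words, whitespace runs, and single punctuation marks.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def toy_update(last_tokens, text, revision):
    """Return (current_tokens, tokens_added, tokens_removed)."""
    old_texts = [token.text for token in last_tokens]
    new_texts = tokenize(text)
    matcher = difflib.SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    current, added, removed = [], [], []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged tokens persist: reuse the same objects.
            current.extend(last_tokens[a1:a2])
        else:
            # "replace", "delete", and "insert" all reduce to remove + add.
            removed.extend(last_tokens[a1:a2])
            fresh = [ToyToken(t) for t in new_texts[b1:b2]]
            added.extend(fresh)
            current.extend(fresh)
    # Every token present in this revision records the revision.
    for token in current:
        token.revisions.append(revision)
    return current, added, removed


tokens, added, removed = toy_update([], "Apples are red.", 1)
tokens, added, removed = toy_update(tokens, "Apples are blue.", 2)
print([t.text for t in added], [t.text for t in removed])  # ['blue'] ['red']
```

Persisting tokens are reused rather than copied, which is why their revisions lists keep growing across updates while a removed token keeps only the revisions it survived.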

update_opdocs(checksum, opdocs, revision=None)

Modifies the internal state based on a change to the content, supplied as a sequence of diff operations, and returns the current tokens along with the tokens added and removed.

Parameters:
checksum : hashable

A checksum generated from the text of a revision

opdocs : iterable ( dict )

A sequence of operations that represent the diff of this new revision

revision : mixed

Revision metadata

Returns:

A triple of lists:

current_tokens : list ( Token )

A sequence of Tokens representing the revision that was just processed.

tokens_added : list ( Token )

Tokens that were added while updating state.

tokens_removed : list ( Token )

Tokens that were removed while updating state.
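The sketch below illustrates the general idea of applying a sequence of operation documents to the previous token list instead of re-diffing the text. The schema of an operation document is not specified in this reference, so the keys used here ("name", "a1", "a2", "tokens") and the function toy_apply_opdocs are assumptions made purely for illustration. Plain strings stand in for Token objects.

```python
def toy_apply_opdocs(last_tokens, opdocs):
    """Apply operation dicts to a token list.

    Returns (current_tokens, tokens_added, tokens_removed).
    The operation format is hypothetical, not the library's schema.
    """
    current, added, removed = [], [], []
    for op in opdocs:
        if op["name"] == "equal":
            # Carry the unchanged span over from the previous revision.
            current.extend(last_tokens[op["a1"]:op["a2"]])
        elif op["name"] == "delete":
            # The span exists only in the previous revision.
            removed.extend(last_tokens[op["a1"]:op["a2"]])
        elif op["name"] == "insert":
            # The tokens exist only in the new revision.
            added.extend(op["tokens"])
            current.extend(op["tokens"])
        else:
            raise ValueError("unknown operation: {!r}".format(op["name"]))
    return current, added, removed


previous = ["Apples", " ", "are", " ", "red", "."]
opdocs = [
    {"name": "equal", "a1": 0, "a2": 4},
    {"name": "delete", "a1": 4, "a2": 5},
    {"name": "insert", "tokens": ["blue"]},
    {"name": "equal", "a1": 5, "a2": 6},
]
current, added, removed = toy_apply_opdocs(previous, opdocs)
print("".join(current))  # Apples are blue.
```

Because the new text is never seen in this mode, a checksum has to be passed in separately if identical revisions are to be recognized.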

Abstract base

class mwpersistence.State

Constructs a revision state object that tracks the persistence of tokens through a history of revisions. This class is commonly used to process the revisions of a page in chronological order.