mw.lib.reverts – detecting reverts

This module provides a set of utilities for detecting identity reverts in revisioned content.

To detect reverts in a stream of revisions to a single page, use detect(). If you need to detect reverts across a collection of pages, or would otherwise prefer to process revisions one at a time, Detector and its process() method allow you to do so.
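As a rough sketch of the multi-page case, one detector can be kept per page while revisions from many pages arrive interleaved. The function name detect_across_pages, the (page_id, checksum, revision) row layout, and the make_detector argument are assumptions made for this illustration and are not part of the module.

```python
from collections import defaultdict


def detect_across_pages(rows, make_detector):
    """Yield (page_id, revert) pairs from an interleaved revision stream.

    rows: iterable of (page_id, checksum, revision) tuples, ordered
        chronologically within each page (hypothetical layout).
    make_detector: zero-argument callable returning an object with a
        process(checksum, revision) method, for example reverts.Detector.
    """
    # One detector per page, created lazily the first time a page is seen.
    detectors = defaultdict(make_detector)
    for page_id, checksum, revision in rows:
        revert = detectors[page_id].process(checksum, revision)
        if revert is not None:
            yield page_id, revert
```

Under these assumptions, a call might look like detect_across_pages(rows, reverts.Detector), since Detector can be constructed without arguments.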

To check individual revisions one at a time and in any order, you can use the check() functions.

Note that these functions are less performant than detecting reverts in a stream of page revisions. They can still be practical when trying to identify reverted revisions in a user’s contribution history.

mw.lib.reverts.detect(checksum_revisions, radius=15)

Detects reverts that occur in a sequence of revisions. Note that the revision metadata you provide is returned unchanged inside the Revert when a revert is detected.

This function serves as a convenience wrapper around calls to Detector’s process() method.

Parameters:
checksum_revisions : iter( ( checksum : str, revision : mixed ) )

an iterable over tuples of checksum and revision meta data

radius : int

a positive integer indicating the maximum revision distance that a revert can span.

Return:

an iterator over Revert

Example:
>>> from mw.lib import reverts
>>>
>>> checksum_revisions = [
...     ("aaa", {'rev_id': 1}),
...     ("bbb", {'rev_id': 2}),
...     ("aaa", {'rev_id': 3}),
...     ("ccc", {'rev_id': 4})
... ]
>>>
>>> list(reverts.detect(checksum_revisions))
[Revert(reverting={'rev_id': 3}, reverteds=[{'rev_id': 2}], reverted_to={'rev_id': 1})]
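
The checksums above are placeholders. As a minimal sketch, assuming each revision is a dict with a 'text' key (a hypothetical layout, not something this module requires), the (checksum, revision) pairs could be built with hashlib. SHA-1 is used here only as one example of a string-based hash; checksum_pairs is an illustrative helper, not part of mw.lib.reverts.

```python
import hashlib


def checksum_pairs(revisions):
    """Yield (checksum, revision) tuples for an iterable of revision dicts.

    Assumes each dict carries its content under the key 'text'
    (a layout chosen for this sketch only).
    """
    for revision in revisions:
        digest = hashlib.sha1(revision['text'].encode('utf-8')).hexdigest()
        yield digest, revision


revisions = [
    {'rev_id': 1, 'text': "Hello"},
    {'rev_id': 2, 'text': "Hello, vandalism"},
    {'rev_id': 3, 'text': "Hello"},
]
pairs = list(checksum_pairs(revisions))
# Revisions 1 and 3 have identical text, so their checksums match.
```

The resulting pairs have the shape that detect() expects for checksum_revisions.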

class mw.lib.reverts.Revert

Represents a revert event. This class behaves like collections.namedtuple. Note that the datatypes of reverting, reverteds and reverted_to are not specified, since those types depend on the revision data provided during revert detection.

Members:
reverting

The reverting revision data : mixed

reverteds

The reverted revision data (ordered chronologically) : list( mixed )

reverted_to

The reverted-to revision data : mixed
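
Because the class is described as behaving like collections.namedtuple, its members should be readable both by name and by position. The snippet below illustrates this with a stand-in namedtuple that has the same field names; it is not the library's own class.

```python
from collections import namedtuple

# Stand-in with the documented field names; the real class is
# mw.lib.reverts.Revert.
Revert = namedtuple("Revert", ["reverting", "reverteds", "reverted_to"])

revert = Revert(
    reverting={'rev_id': 3},
    reverteds=[{'rev_id': 2}],
    reverted_to={'rev_id': 1},
)

# Access by field name ...
reverted_ids = [rev['rev_id'] for rev in revert.reverteds]

# ... or unpack positionally, as with any tuple.
reverting, reverteds, reverted_to = revert
```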

class mw.lib.reverts.Detector(radius=15)

Detects revert events in a stream of revisions (to the same page) based on matching checksums. To detect reverts, construct an instance of this class and call process() once per revision, in chronological order (direction == "newer").

See https://meta.wikimedia.org/wiki/R:Identity_revert

Parameters:
radius : int

a positive integer indicating the maximum revision distance that a revert can span.

Example:
>>> from mw.lib import reverts
>>> detector = reverts.Detector()
>>>
>>> detector.process("aaa", {'rev_id': 1})
>>> detector.process("bbb", {'rev_id': 2})
>>> detector.process("aaa", {'rev_id': 3})
Revert(reverting={'rev_id': 3}, reverteds=[{'rev_id': 2}], reverted_to={'rev_id': 1})
>>> detector.process("ccc", {'rev_id': 4})

process(checksum, revision=None)

Process a new revision and detect a revert if it occurred. Note that you can pass whatever you like as revision and it will be returned in the case that a revert occurs.

Parameters:
checksum : str

Any identity-matchable string-based hash of revision content

revision : mixed

Revision meta data. Note that any data will just be returned in the case of a revert.

Returns:

a Revert if one occurred, or None
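
To make the checksum-matching idea concrete, here is a minimal sketch of a radius-limited identity-revert detector. It is an illustration and not the library's implementation. SketchDetector and its internals are hypothetical, and radius is read here as the maximum number of revisions a single revert may undo, which is an assumption based on the parameter descriptions in this document.

```python
from collections import deque, namedtuple

Revert = namedtuple("Revert", ["reverting", "reverteds", "reverted_to"])


class SketchDetector:
    """Illustrative identity-revert detector (not the library's code)."""

    def __init__(self, radius=15):
        # Remember the last radius + 1 revisions: one possible
        # reverted-to revision plus up to `radius` reverted ones.
        self.history = deque(maxlen=radius + 1)

    def process(self, checksum, revision=None):
        revert = None
        items = list(self.history)
        # Look for the most recent earlier revision with the same checksum.
        for i in range(len(items) - 1, -1, -1):
            if items[i][0] == checksum:
                reverteds = [rev for _, rev in items[i + 1:]]
                # A match with nothing in between undoes nothing,
                # so it is not reported as a revert.
                if reverteds:
                    revert = Revert(revision, reverteds, items[i][1])
                break
        self.history.append((checksum, revision))
        return revert


detector = SketchDetector()
results = [
    detector.process(checksum, {'rev_id': rev_id})
    for rev_id, checksum in enumerate(["aaa", "bbb", "aaa", "ccc"], start=1)
]
# Only the third call returns a Revert; the others return None.
```

In this sketch, a revision whose checksum last appeared more than radius + 1 revisions ago is treated as new content, because the matching entry has already dropped out of the history.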

Convenience functions

mw.lib.reverts.api.check_rev(session, rev, **kwargs)

Checks whether a revision (API revision dict) was reverted (identity) and returns a named tuple of Revert(reverting, reverteds, reverted_to).

Parameters:
session : mw.api.Session

An API session to make use of

rev : dict

a revision dict containing ‘revid’ and ‘page.id’

radius : int

a positive integer indicating the maximum number of revisions that can be reverted

before : mw.Timestamp

if set, limits the search for reverting revisions to those which were saved before this timestamp

properties : set( str )

a set of properties to include in revisions (see mw.api.Revisions)

mw.lib.reverts.api.check(session, rev_id, page_id=None, radius=15, before=None, window=None, properties=None)

Checks whether a revision was reverted (identity) and returns a named tuple of Revert(reverting, reverteds, reverted_to).

Parameters:
session : mw.api.Session

An API session to make use of

rev_id : int

the ID of the revision to check

page_id : int

the ID of the page the revision occupies (slower if not provided)

radius : int

a positive integer indicating the maximum number of revisions that can be reverted

before : mw.Timestamp

if set, limits the search for reverting revisions to those which were saved before this timestamp

window : int

if set, limits the search for reverting revisions to those which were saved within window seconds after the reverted edit

properties : set( str )

a set of properties to include in revisions (see mw.api.Revisions)

mw.lib.reverts.database.check_row(db, rev_row, **kwargs)

Checks whether a revision (database row) was reverted (identity) and returns a named tuple of Revert(reverting, reverteds, reverted_to).

Parameters:
db : mw.database.DB

A database connection to make use of.

rev_row : dict

a revision row containing ‘rev_id’ and ‘rev_page’ or ‘page_id’

radius : int

a positive integer indicating the maximum number of revisions that can be reverted

check_archive : bool

should the archive table be checked for reverting revisions?

before : mw.Timestamp

if set, limits the search for reverting revisions to those which were saved before this timestamp

mw.lib.reverts.database.check(db, rev_id, page_id=None, radius=15, check_archive=False, before=None, window=None)

Checks whether a revision was reverted (identity) and returns a named tuple of Revert(reverting, reverteds, reverted_to).

Parameters:
db : mw.database.DB

A database connection to make use of.

rev_id : int

the ID of the revision to check

page_id : int

the ID of the page the revision occupies (slower if not provided)

radius : int

a positive integer indicating the maximum number of revisions that can be reverted

check_archive : bool

should the archive table be checked for reverting revisions?

before : mw.Timestamp

if set, limits the search for reverting revisions to those which were saved before this timestamp

window : int

if set, limits the search for reverting revisions to those which were saved within window seconds after the reverted edit

Constants

mw.lib.reverts.defaults.RADIUS = 15

TODO: Better documentation here. For the time being, see:

Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K., Terveen, L., & Riedl, J. (2007, November). Creating, destroying, and restoring value in Wikipedia. In Proceedings of the 2007 international ACM conference on Supporting group work (pp. 259-268). ACM.
