mw.lib.reverts – detecting reverts
This module provides a set of utilities for detecting identity reverts in
revisioned content.
To detect reverts in a stream of revisions to a single page, you can use
detect(). If you’ll be detecting reverts in a collection of pages or
would, for some other reason, prefer to process revisions one at a time,
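Detector and its process() method will allow you to do so.
To check individual revisions for reverts, one at a time and in arbitrary order, you can use the check()
functions documented under Convenience functions: mw.lib.reverts.api.check(), mw.lib.reverts.api.check_rev(), mw.lib.reverts.database.check() and mw.lib.reverts.database.check_row().
Note that these functions are less performant than detecting reverts in a
stream of page revisions. They can still be practical when trying to identify
reverted revisions in a user’s contribution history.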
-
mw.lib.reverts.detect(checksum_revisions, radius=15)
Detects reverts that occur in a sequence of revisions. Note that the
revision metadata is passed through unchanged and is simply returned as part of the Revert when a revert is detected.
This function serves as a convenience wrapper around calls to
Detector’s process() method.
Parameters: |
- checksum_revisions : iter( ( checksum : str, revision : mixed ) )
an iterable over tuples of checksum and revision meta data
- radius : int
a positive integer indicating the maximum revision distance that a revert can span.
|
Return: | an iterator over Revert
|
Example: | >>> from mw.lib import reverts
>>>
>>> checksum_revisions = [
... ("aaa", {'rev_id': 1}),
... ("bbb", {'rev_id': 2}),
... ("aaa", {'rev_id': 3}),
... ("ccc", {'rev_id': 4})
... ]
>>>
>>> list(reverts.detect(checksum_revisions))
[Revert(reverting={'rev_id': 3}, reverteds=[{'rev_id': 2}], reverted_to={'rev_id': 1})]
|
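The example above uses placeholder checksums such as "aaa". As a minimal sketch of where real checksums might come from, the block below hashes revision text with SHA-1 and builds the (checksum, revision) tuples that detect() expects. The helper name sha1_checksum, the 'text' key, and the sample revisions are assumptions made for illustration; any hash that is identical for identical content should serve equally well.

```python
import hashlib


def sha1_checksum(text):
    """Return a hex SHA-1 digest of the revision text (hypothetical helper)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical revisions; in practice these would come from a dump, the API
# or a database, in chronological order.
revisions = [
    {"rev_id": 1, "text": "Hello world"},
    {"rev_id": 2, "text": "Hello world!!! vandalism"},
    {"rev_id": 3, "text": "Hello world"},
]

# Pair each checksum with whatever metadata should be returned on a revert.
checksum_revisions = [
    (sha1_checksum(rev["text"]), {"rev_id": rev["rev_id"]})
    for rev in revisions
]
```

Revisions 1 and 3 have identical text, so they receive identical checksums, which is the condition an identity revert is detected on.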
-
class mw.lib.reverts.Revert
Represents a revert event. This class behaves like
collections.namedtuple. Note that the datatypes of reverting,
reverteds and reverted_to are not specified since those types will depend
on the revision data provided during revert detection.
Members: |
- reverting
The reverting revision data : mixed
- reverteds
The reverted revision data (ordered chronologically) : list( mixed )
- reverted_to
The reverted-to revision data : mixed
|
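Since Revert is described as behaving like collections.namedtuple, a stand-in built with namedtuple can illustrate how its members might be read. This is a sketch only: the Revert defined here is a local stand-in with the same field names, not the library class, and the revision dicts are made-up sample data.

```python
from collections import namedtuple

# Local stand-in with the documented field names (not the library's class).
Revert = namedtuple("Revert", ["reverting", "reverteds", "reverted_to"])

revert = Revert(
    reverting={"rev_id": 3},
    reverteds=[{"rev_id": 2}],
    reverted_to={"rev_id": 1},
)

# Members can be read by name...
reverted_ids = [rev["rev_id"] for rev in revert.reverteds]

# ...or unpacked positionally, like any tuple.
reverting, reverteds, reverted_to = revert
```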
-
class mw.lib.reverts.Detector(radius=15)
Detects revert events in a stream of revisions (to the same page) based on
matching checksums. To detect reverts, construct an instance of this class and call
process() once per revision, in chronological order (direction == "newer").
See https://meta.wikimedia.org/wiki/R:Identity_revert
Parameters: |
- radius : int
a positive integer indicating the maximum revision distance that a revert can span.
|
Example: | >>> from mw.lib import reverts
>>> detector = reverts.Detector()
>>>
>>> detector.process("aaa", {'rev_id': 1})
>>> detector.process("bbb", {'rev_id': 2})
>>> detector.process("aaa", {'rev_id': 3})
Revert(reverting={'rev_id': 3}, reverteds=[{'rev_id': 2}], reverted_to={'rev_id': 1})
>>> detector.process("ccc", {'rev_id': 4})
|
-
process(checksum, revision=None)
Process a new revision and detect a revert if it occurred. Note that
you can pass whatever you like as revision and it will be returned in
the case that a revert occurs.
Parameters: |
- checksum : str
Any identity-matchable string-based hash of revision content
- revision : mixed
Revision meta data. Note that any data will just be returned in the
case of a revert.
|
Returns: | a Revert if one occurred or None
|
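To make the role of the checksum and of radius concrete, here is a minimal sketch of identity-revert detection over a bounded history. It is an illustration under stated assumptions, not the library's implementation: SketchDetector and its history attribute are hypothetical names, and the choice to keep radius + 1 recent revisions (the reverted-to revision plus up to radius reverted ones) is one reading of "maximum revision distance".

```python
from collections import deque, namedtuple

# Stand-in for the documented Revert (same field names).
Revert = namedtuple("Revert", ["reverting", "reverteds", "reverted_to"])


class SketchDetector:
    """Illustrative only; not the library's Detector."""

    def __init__(self, radius=15):
        # Assumption: remember the reverted-to revision plus up to `radius`
        # revisions that could be reverted; older entries fall off the left.
        self.history = deque(maxlen=radius + 1)

    def process(self, checksum, revision=None):
        revert = None
        items = list(self.history)
        # Scan from newest to oldest for the most recent matching checksum.
        for i in range(len(items) - 1, -1, -1):
            if items[i][0] == checksum:
                reverteds = [rev for _, rev in items[i + 1:]]
                if reverteds:  # identical consecutive checksums revert nothing
                    revert = Revert(revision, reverteds, items[i][1])
                break
        self.history.append((checksum, revision))
        return revert


detector = SketchDetector(radius=2)
results = [
    detector.process("aaa", {"rev_id": 1}),
    detector.process("bbb", {"rev_id": 2}),
    detector.process("aaa", {"rev_id": 3}),
    detector.process("ccc", {"rev_id": 4}),
]
```

In this sketch only the third call yields a Revert. With a smaller radius, a matching checksum that has already dropped out of the history is not reported.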
Convenience functions
-
mw.lib.reverts.api.check_rev(session, rev, **kwargs)
Checks whether a revision (given as an API revision dict) was reverted (identity) and returns
a named tuple of Revert(reverting, reverteds, reverted_to).
Parameters: |
- session : mw.api.Session
An API session to make use of
- rev : dict
a revision dict containing ‘revid’ and ‘page.id’
- radius : int
a positive integer indicating the maximum number of revisions that can be reverted
- before : mw.Timestamp
if set, limits the search for reverting revisions to those which were saved before this timestamp
- properties : set( str )
a set of properties to include in revisions (see mw.api.Revisions)
|
-
mw.lib.reverts.api.check(session, rev_id, page_id=None, radius=15, before=None, window=None, properties=None)
Checks whether a revision was reverted (identity) and returns a named tuple
of Revert(reverting, reverteds, reverted_to).
Parameters: |
- session : mw.api.Session
An API session to make use of
- rev_id : int
the ID of the revision to check
- page_id : int
the ID of the page the revision occupies (slower if not provided)
- radius : int
a positive integer indicating the maximum number of revisions
that can be reverted
- before : mw.Timestamp
if set, limits the search for reverting revisions to those which
were saved before this timestamp
- window : int
if set, limits the search for reverting revisions to those which
were saved within window seconds after the reverted edit
- properties : set( str )
a set of properties to include in revisions (see mw.api.Revisions)
|
-
mw.lib.reverts.database.check_row(db, rev_row, **kwargs)
Checks whether a revision (database row) was reverted (identity) and returns
a named tuple of Revert(reverting, reverteds, reverted_to).
Parameters: |
- db : mw.database.DB
A database connection to make use of.
- rev_row : dict
a revision row containing ‘rev_id’ and ‘rev_page’ or ‘page_id’
- radius : int
a positive integer indicating the maximum number of revisions that can be reverted
- check_archive : bool
should the archive table be checked for reverting revisions?
- before : mw.Timestamp
if set, limits the search for reverting revisions to those which were saved before this timestamp
|
-
mw.lib.reverts.database.check(db, rev_id, page_id=None, radius=15, check_archive=False, before=None, window=None)
Checks whether a revision was reverted (identity) and returns a named tuple
of Revert(reverting, reverteds, reverted_to).
Parameters: |
- db : mw.database.DB
A database connection to make use of.
- rev_id : int
the ID of the revision to check
- page_id : int
the ID of the page the revision occupies (slower if not provided)
- radius : int
a positive integer indicating the maximum number of revisions that can be reverted
- check_archive : bool
should the archive table be checked for reverting revisions?
- before : mw.Timestamp
if set, limits the search for reverting revisions to those which were saved before this timestamp
- window : int
if set, limits the search for reverting revisions to those which
were saved within window seconds after the reverted edit
|
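The check functions above each answer one question: was this particular revision reverted, and if so by which revert? One way to picture the final step, assuming the revisions around the target have already been run through revert detection, is to keep the first Revert whose reverteds includes the target revision. The sketch below shows only that selection step. The names first_revert_of and id_key, the sample data, and the None result for a revision that was not reverted are assumptions for illustration; fetching revisions from the API or the database is not shown.

```python
from collections import namedtuple

# Stand-in for the documented Revert (same field names).
Revert = namedtuple("Revert", ["reverting", "reverteds", "reverted_to"])


def first_revert_of(rev_id, reverts, id_key="rev_id"):
    """Return the first revert that reverted `rev_id`, or None (hypothetical helper)."""
    for revert in reverts:
        if any(rev[id_key] == rev_id for rev in revert.reverteds):
            return revert
    return None


# Made-up detection results for one page, in chronological order.
detected = [
    Revert({"rev_id": 3}, [{"rev_id": 2}], {"rev_id": 1}),
    Revert({"rev_id": 7}, [{"rev_id": 5}, {"rev_id": 6}], {"rev_id": 4}),
]

hit = first_revert_of(6, detected)   # revision 6 was reverted by revision 7
miss = first_revert_of(4, detected)  # revision 4 was restored, not reverted
```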
Constants
-
mw.lib.reverts.defaults.RADIUS = 15
TODO: Better documentation here. For the time being, see:
Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K., Terveen, L., &
Riedl, J. (2007, November). Creating, destroying, and restoring value in
Wikipedia. In Proceedings of the 2007 international ACM conference on
Supporting group work (pp. 259-268). ACM.