MediaWiki Utilities is an open-source (MIT-licensed) library developed by Aaron Halfaker for extracting and processing data from MediaWiki installations, slave databases, and XML dumps.

Install with pip: pip install mediawiki-utilities

Note: Use of this library requires Python 3 or later.


A simple datatype for handling MediaWiki’s various time formats.
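
To illustrate the formats involved: MediaWiki's web API emits ISO 8601 timestamps such as 2013-09-01T12:00:00Z, while its database stores 14-digit strings such as 20130901120000. The sketch below uses only Python's standard library rather than this library's own datatype, and the helper name parse_mw_timestamp is hypothetical.

```python
from datetime import datetime, timezone

# Two timestamp layouts MediaWiki commonly produces (both are UTC).
API_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # e.g. 2013-09-01T12:00:00Z
DB_FORMAT = "%Y%m%d%H%M%S"          # e.g. 20130901120000


def parse_mw_timestamp(value):
    """Hypothetical helper: parse either layout into an aware UTC datetime."""
    for fmt in (API_FORMAT, DB_FORMAT):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next known layout
    raise ValueError("unrecognized MediaWiki timestamp: %r" % (value,))
```

Normalizing both layouts to one type means timestamps from the API and from the database can be compared directly.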

Core modules


A set of utilities for interacting with MediaWiki’s web API.

  • Session – Constructs an API session with a MediaWiki installation. Contains convenience methods for accessing prop=revisions, list=usercontribs, meta=siteinfo, list=deletedrevs and list=recentchanges.
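
For orientation, a prop=revisions request is an ordinary HTTP query against a wiki's api.php endpoint. The sketch below only builds such a URL with the standard library; it does not use this library's Session class, and the function name revisions_query_url and the chosen rvprop fields are assumptions made for illustration.

```python
from urllib.parse import urlencode


def revisions_query_url(api_url, title, limit=5):
    """Hypothetical helper: build a prop=revisions query URL for one page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",  # which revision fields to return
        "rvlimit": limit,                        # maximum number of revisions
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


print(revisions_query_url("https://en.wikipedia.org/w/api.php", "Main Page"))
```

A session object saves repeating this kind of parameter assembly for every request.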

A set of utilities for interacting with MediaWiki’s database.

  • DB – Constructs a MySQL database connector with convenience methods for accessing the revision, archive, page, user, and recentchanges tables.

A set of utilities for processing MediaWiki’s XML database dumps quickly and without dealing with streaming XML.

  • map() – Applies a function to a set of dump files (Iterator) using multiprocessing and aggregates the output.
  • Iterator – Constructs an iterator over a standard XML dump. Dumps contain site_info and pages. Pages contain metadata and revisions. Revisions contain metadata and text. This is probably why you are here.
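
The nesting described above (a dump contains pages, a page contains revisions, a revision contains text) can be sketched with the standard library's streaming parser. This is not the library's Iterator, only an illustration of what it spares you from writing. The sample dump is invented and omits the XML namespace that real dumps declare, and the name iter_pages is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Invented, namespace-free sample; real dumps declare an export namespace.
SAMPLE_DUMP = """<mediawiki>
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><text>first</text></revision>
    <revision><id>11</id><text>second</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, [(rev_id, text), ...]) for each page in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text"))
                for rev in elem.findall("revision")
            ]
            yield elem.findtext("title"), revisions
            elem.clear()  # release the page's subtree once it has been consumed


for title, revisions in iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
    print(title, revisions)
```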



A set of utilities for tracking the persistence of content between revisions.

  • State – Constructs an object that represents the current content persistence state of a page. Reports useful details about the persistence of content when updated.
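
One way to picture content persistence is to diff each new revision against the previous one and record, per token, which revisions it has survived. The sketch below does this with difflib and whitespace-separated tokens. It is a simplified stand-in, not the library's State implementation: the class name PersistenceSketch and its return values are assumptions, and it does not recognize content restored by a revert.

```python
import difflib


class PersistenceSketch:
    """Hypothetical tracker: records, per token, the revisions it appeared in."""

    def __init__(self):
        self.tokens = []  # list of (word, [revision ids containing it])

    def update(self, text, rev_id):
        new_words = text.split()
        old_words = [word for word, _ in self.tokens]
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        current, added, removed = [], [], []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged tokens persist through this revision.
                for word, revs in self.tokens[i1:i2]:
                    revs.append(rev_id)
                    current.append((word, revs))
            else:
                # Replaced, deleted, or inserted spans.
                removed.extend(self.tokens[i1:i2])
                fresh = [(word, [rev_id]) for word in new_words[j1:j2]]
                added.extend(fresh)
                current.extend(fresh)
        self.tokens = current
        return current, added, removed
```

After several updates, the length of each token's revision list gives a rough measure of how long that content has persisted.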

A set of utilities for performing revert detection.

  • detect() – Detects reverts in a sequence of revision events.
  • Detector – Constructs an identity revert detector that can be updated manually over the history of a page.
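
An identity revert occurs when a revision restores a page to exactly the content of an earlier revision, undoing everything in between. A minimal sketch of the idea, assuming revisions arrive as (rev_id, text) pairs in chronological order: the function name detect_identity_reverts, the SHA-1 checksum, and the radius default are illustrative choices, not the library's confirmed signature.

```python
import hashlib


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) tuples."""
    recent = []  # (checksum, rev_id) for the last `radius` revisions, oldest first
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Search backwards from the newest revision for identical content.
        for offset in range(len(recent) - 1, -1, -1):
            if recent[offset][0] == checksum:
                reverted = [rid for _, rid in recent[offset + 1:]]
                if reverted:  # identical consecutive revisions revert nothing
                    yield rev_id, reverted, recent[offset][1]
                break
        recent.append((checksum, rev_id))
        recent = recent[-radius:]  # forget revisions older than the radius


history = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
print(list(detect_identity_reverts(history)))
```

Bounding the search with a radius keeps memory use constant over long page histories, at the cost of missing reverts that reach further back.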

A set of utilities for grouping revisions and other events into sessions.

  • cluster() – Clusters a sequence of user actions into sessions.
  • Cache – Constructs a cache of recent user actions that can be updated manually in order to detect sessions.
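
Session clustering is commonly done with an inactivity cutoff: a user's action joins their current session if it follows the previous action closely enough, and otherwise starts a new one. The sketch below assumes events are (user, unix_timestamp) pairs sorted by time. The name cluster_sessions and the one-hour default are assumptions for illustration, not the library's documented interface.

```python
from collections import defaultdict


def cluster_sessions(events, cutoff=3600):
    """Group time-sorted (user, unix_timestamp) events into per-user sessions."""
    sessions = defaultdict(list)  # user -> list of sessions (lists of timestamps)
    for user, timestamp in events:
        user_sessions = sessions[user]
        if user_sessions and timestamp - user_sessions[-1][-1] <= cutoff:
            user_sessions[-1].append(timestamp)  # continue the open session
        else:
            user_sessions.append([timestamp])    # gap too long: start a new one
    return dict(sessions)


events = [("a", 0), ("b", 10), ("a", 1800), ("a", 9000)]
print(cluster_sessions(events))
```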

A set of utilities for normalizing and parsing page titles.

  • normalize() – Normalizes a page title.
  • Parser – Constructs a parser with a set of namespaces that can be used to parse and normalize page titles.
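
As a rough picture of what normalization and parsing involve: MediaWiki stores titles with underscores in place of spaces, and a leading "Prefix:" may name a namespace. The sketch below assumes a wiki that upper-cases the first character of titles (not all do). The function names and the small namespace table are hypothetical and are not the library's API.

```python
def normalize_title(title):
    """Hypothetical helper: collapse whitespace to underscores, upper-case the first character."""
    collapsed = "_".join(title.replace("_", " ").split())
    return collapsed[:1].upper() + collapsed[1:]


def parse_title(raw, namespaces):
    """Split 'Prefix:Name' into (namespace_id, normalized_name); 0 is the main namespace."""
    prefix, sep, rest = raw.partition(":")
    ns_id = namespaces.get(normalize_title(prefix)) if sep else None
    if ns_id is None:
        # No colon, or the prefix is not a known namespace: keep the title whole.
        return 0, normalize_title(raw)
    return ns_id, normalize_title(rest)


# Illustrative subset of a wiki's namespace table.
NAMESPACES = {"Talk": 1, "User": 2, "User_talk": 3}

print(parse_title("user talk:jane doe", NAMESPACES))
print(parse_title("Foo: the sequel", NAMESPACES))
```

Treating an unknown prefix as part of the title matters because colons are legal in ordinary page names.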


