MediaWiki Utilities

MediaWiki Utilities is an open-source (MIT-licensed) library developed by Aaron Halfaker for extracting and processing data from MediaWiki installations, slave databases, and XML dumps.

Install with pip: pip install mediawiki-utilities

Note: Use of this library requires Python 3 or later.

Types

mw.Timestamp
A simple datatype for handling MediaWiki’s various time formats.
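
To illustrate the formats involved, here is a minimal standard-library sketch that accepts a Unix epoch number, the compact database-style string, or the ISO-style string used by the API and XML dumps. The function name parse_mediawiki_time and the two constants are hypothetical; this is not how mw.Timestamp is implemented.

```python
from datetime import datetime, timezone

# The two string layouts MediaWiki commonly uses for timestamps.
SHORT_FORMAT = "%Y%m%d%H%M%S"        # database style, e.g. 20090213233130
LONG_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # API / XML dump style, e.g. 2009-02-13T23:31:30Z


def parse_mediawiki_time(value):
    """Convert a Unix epoch number or a MediaWiki time string to a UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    for layout in (SHORT_FORMAT, LONG_FORMAT):
        try:
            return datetime.strptime(value, layout).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next layout
    raise ValueError("Unrecognized timestamp: {!r}".format(value))
```

All three representations of the same instant should parse to equal datetime objects, which is the kind of convenience a dedicated timestamp type provides.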

Core modules

mw.api

A set of utilities for interacting with MediaWiki’s web API.

  • Session – Constructs an API session with a MediaWiki installation. Contains convenience methods for accessing prop=revisions, list=usercontribs, meta=siteinfo, list=deletedrevs and list=recentchanges.
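
For readers unfamiliar with the web API, the sketch below shows roughly what a prop=revisions request looks like on the wire. It only builds a URL with the standard library and makes no network call. The helper name revisions_query_url and the example endpoint are hypothetical, and Session's own methods are not shown here.

```python
from urllib.parse import urlencode


def revisions_query_url(api_url, title, limit=5):
    """Build a MediaWiki API URL that requests recent revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user",  # which revision fields to return
        "rvlimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


# Hypothetical endpoint, used for illustration only.
url = revisions_query_url("https://example.org/w/api.php", "Main Page")
```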

mw.database

A set of utilities for interacting with MediaWiki’s database.

  • DB – Constructs a mysql database connector with convenience methods for accessing revision, archive, page, user, and recentchanges.

mw.xml_dump

A set of utilities for processing MediaWiki’s XML database dumps quickly and without dealing with streaming XML.

  • map() – Applies a function to a set of dump files (Iterator) using multiprocessing and aggregates the output.
  • Iterator – Constructs an iterator over a standard XML dump. Dumps contain site_info and pages. Pages contain metadata and revisions. Revisions contain metadata and text. This is probably why you are here.
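
The dump structure described above (pages containing revisions, revisions containing text) can be sketched with the standard library's streaming parser. This is an assumption-laden illustration, not the Iterator implementation: the sample document is heavily simplified, omits the XML namespace that real dumps declare, and iter_revisions is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# A tiny, simplified stand-in for a real dump (no XML namespace).
SAMPLE_DUMP = b"""<mediawiki>
  <siteinfo><sitename>Example Wiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><text>first draft</text></revision>
    <revision><id>11</id><text>second draft</text></revision>
  </page>
</mediawiki>"""


def iter_revisions(stream):
    """Yield (page_title, revision_id, text) one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            for revision in elem.findall("revision"):
                yield title, int(revision.findtext("id")), revision.findtext("text")
            elem.clear()  # drop the finished page's children to limit memory use


revisions = list(iter_revisions(io.BytesIO(SAMPLE_DUMP)))
```

Handling events, namespaces, and element cleanup by hand like this is the streaming-XML work that the module is meant to spare you.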

Libraries

mw.lib.persistence

A set of utilities for tracking the persistence of content between revisions.

  • State – Constructs an object that represents the current content persistence state of a page. Reports useful details about the persistence of content when updated.
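
As a rough picture of what "persistence state" means, the sketch below tracks whitespace-separated words across revisions and counts how many revisions each word has survived. It uses difflib from the standard library; the class name PersistenceSketch, its update() method, and the word-level tokenization are all assumptions made for illustration and do not describe how State works internally.

```python
from difflib import SequenceMatcher


class PersistenceSketch:
    """Track how many consecutive revisions each word has appeared in."""

    def __init__(self):
        self.tokens = []  # list of [word, revisions_survived]

    def update(self, text):
        new_words = text.split()
        old_words = [word for word, _ in self.tokens]
        matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        updated = []
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == "equal":
                # Words carried over unchanged persist for one more revision.
                for token in self.tokens[a1:a2]:
                    token[1] += 1
                    updated.append(token)
            elif tag in ("insert", "replace"):
                # Newly added words start their count at one.
                updated.extend([word, 1] for word in new_words[b1:b2])
            # "delete" spans are dropped: those words did not persist.
        self.tokens = updated
        return updated


state = PersistenceSketch()
state.update("the quick fox")
result = state.update("the slow fox")
```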

mw.lib.reverts

A set of utilities for performing revert detection.

  • detect() – Detects reverts in a sequence of revision events.
  • Detector – Constructs an identity revert detector that can be updated manually over the history of a page.
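
An identity revert is an edit that restores a page to exactly the content of an earlier revision. The following is a minimal sketch of that idea, assuming revisions arrive in chronological order as (rev_id, text) pairs. The function name detect_identity_reverts and its radius parameter (the maximum number of revisions a single edit can revert in this sketch) are hypothetical and independent of the library's detect() signature.

```python
import hashlib
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) triples."""
    recent = deque(maxlen=radius + 1)  # (checksum, rev_id) of the latest revisions
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        history = list(recent)
        # Search backwards for the most recent earlier revision with identical content.
        for index in range(len(history) - 1, -1, -1):
            if history[index][0] == checksum:
                reverted = [rid for _, rid in history[index + 1:]]
                if reverted:  # skip null edits that undo nothing
                    yield rev_id, reverted, history[index][1]
                break
        recent.append((checksum, rev_id))


history = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
found = list(detect_identity_reverts(history))
```

In this example, revision 3 restores the content of revision 1, so revision 2 is reported as reverted.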

mw.lib.sessions

A set of utilities for grouping revisions and other events into sessions.

  • cluster() – Clusters a sequence of user actions into sessions.
  • Cache – Constructs a cache of recent user actions that can be updated manually in order to detect sessions.
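
One common way to define a session is a run of actions by the same user with no gap longer than some inactivity cutoff. The sketch below implements that rule under stated assumptions: events are (user, timestamp, event) triples sorted by timestamp in seconds, and the one-hour default cutoff is illustrative. The name cluster_sessions is hypothetical and this is not the library's cluster() implementation.

```python
def cluster_sessions(events, cutoff=3600):
    """Group (user, timestamp, event) triples into per-user sessions."""
    active = {}    # user -> (timestamp of last event, events in the open session)
    finished = []
    for user, timestamp, event in events:
        if user in active:
            last_seen, session = active[user]
            if timestamp - last_seen > cutoff:
                # The gap is too long: close the old session and start a new one.
                finished.append((user, session))
                session = []
        else:
            session = []
        session.append(event)
        active[user] = (timestamp, session)
    # Flush sessions that are still open at the end of the input.
    finished.extend((user, session) for user, (_, session) in active.items())
    return finished


events = [("alice", 0, "a1"), ("bob", 10, "b1"), ("alice", 100, "a2"), ("alice", 5000, "a3")]
sessions = cluster_sessions(events)
```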

mw.lib.title

A set of utilities for normalizing and parsing page titles.

  • normalize() – Normalizes a page title.
  • Parser – Constructs a parser with a set of namespaces that can be used to parse and normalize page titles.
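
To make the idea concrete, here is a minimal sketch of title normalization and namespace parsing. It assumes the conventions most wikis use by default (underscores instead of spaces, uppercase first letter, main namespace id 0). The helper names normalize_title and parse_title and the namespace mapping passed in are hypothetical; the library's own normalize() and Parser may behave differently in edge cases.

```python
def normalize_title(title):
    """Return a title with underscores for spaces and an uppercase first letter."""
    title = title.strip(" _").replace(" ", "_")
    return title[:1].upper() + title[1:]


def parse_title(title, namespaces):
    """Split 'Namespace:Page' into (namespace_id, normalized page name).

    `namespaces` maps normalized namespace names to numeric ids. Titles
    without a known prefix fall into the main namespace (id 0).
    """
    prefix, sep, rest = title.partition(":")
    ns_name = normalize_title(prefix)
    if sep and ns_name in namespaces:
        return namespaces[ns_name], normalize_title(rest)
    return 0, normalize_title(title)


known_namespaces = {"Talk": 1, "User_talk": 3}
parsed = parse_title("user talk:epochFail", known_namespaces)
```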

Contributors

None yet. See http://github.com/halfak/mediawiki-utilities. Pull requests are encouraged.
