Source code for mw.xml_dump
"""
This module is a collection of utilities for efficiently processing MediaWiki's
XML database dumps. There are two important concerns that this module intends
to address: *performance* and the *complexity* of streaming XML parsing.

Performance
    Performance is a serious concern when processing large database XML dumps.
    Regrettably, the Global Interpreter Lock prevents us from running threads
    on multiple CPUs. This library provides :func:`map`, a function that maps
    a dump-processing function over a set of dump files, using
    :class:`multiprocessing` to distribute the work over multiple CPUs.
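
    A minimal sketch of that pattern, assuming :func:`map` takes the list of
    dump file paths first and the processing function second, and that the
    processing function is called with each dump iterator and its path (the
    file names below are hypothetical)::

        from mw import xml_dump

        files = ["enwiki-dump1.xml", "enwiki-dump2.xml"]  # hypothetical paths

        def page_info(dump, path):
            # Yield one record per page in this dump file.
            for page in dump:
                yield page.id, page.namespace, page.title

        # Assumed argument order: (paths, process_dump); results from all
        # worker processes come back as a single merged iterator.
        for page_id, page_namespace, page_title in xml_dump.map(files, page_info):
            print(page_id, page_namespace, page_title)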

Complexity
    Streaming XML parsing is gross. XML dumps are (1) some site meta data, (2)
    a collection of pages that contain (3) collections of revisions. The
    module allows you to think about dump files in this way and ignore the
    fact that you're streaming XML. An :class:`Iterator` contains site meta
    data and an iterator of :class:`Page`'s. A :class:`Page` contains page
    meta data and an iterator of :class:`Revision`'s. A :class:`Revision`
    contains revision meta data including a :class:`Contributor` (if a
    contributor was specified in the XML).
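
    A minimal sketch of that nested iteration over a single dump file,
    assuming :class:`Iterator` can be constructed from an open file object
    via ``Iterator.from_file`` (the path below is hypothetical)::

        from mw.xml_dump import Iterator

        # Assumed constructor: Iterator.from_file(file_object); hypothetical path.
        dump = Iterator.from_file(open("enwiki-dump1.xml"))

        for page in dump:            # page meta data plus an iterator of revisions
            for revision in page:    # revision meta data, including an optional contributor
                print(page.title, revision.id)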
"""
from .map import map
from .iteration import *
from .functions import file, open_file