XML Dump Iteration

These classes form the basis of iterative processing of XML dumps. These datatypes are based on those found in http://pythonhosted.org/mwtypes.
class mwxml.Dump(site_info, items)

XML dump iterator. Holds the dump file's metadata and a Page iterator. Instances of this class can be iterated over directly. Usually, you’ll want to construct this class using from_file().

Parameters:
- site_info : SiteInfo
  The data from the <siteinfo> block
- items : iterable
  An iterable of Page and/or LogItem in the order they appear in the XML

Example:

    from mwxml import Dump, Page

    # Construct dump file iterator
    dump = Dump.from_file(open("example/dump.xml"))

    # Iterate through pages
    for page in dump.pages:

        # Iterate through a page's revisions
        for revision in page:
            print(revision.id)
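Iterating page by page is what keeps memory use manageable on large dumps. As a rough illustration of that streaming idea (not of mwxml's internals), the sketch below uses the standard library's xml.etree.ElementTree.iterparse to yield one <page> at a time and clear it afterwards. The helper name iter_page_revision_ids and the tiny sample document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List, Optional, Tuple


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix so the sketch works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def iter_page_revision_ids(f: IO[bytes]) -> Iterator[Tuple[Optional[str], List[str]]]:
    """Hypothetical helper: yield (page title, [revision ids]) one page at a time."""
    for _event, elem in ET.iterparse(f, events=("end",)):
        if _local(elem.tag) != "page":
            continue  # only act once a whole <page> element has been parsed
        title = None
        rev_ids = []
        for child in elem:
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                # The revision's own <id> is a direct child of <revision>.
                for field in child:
                    if _local(field.tag) == "id":
                        rev_ids.append(field.text)
                        break
        yield title, rev_ids
        elem.clear()  # drop the finished <page> subtree's children and text


sample = b"""<mediawiki>
  <page><title>Foo</title><id>1</id>
    <revision><id>10</id></revision>
    <revision><id>11</id></revision>
  </page>
  <page><title>Bar</title><id>2</id>
    <revision><id>20</id></revision>
  </page>
</mediawiki>"""

for title, rev_ids in iter_page_revision_ids(io.BytesIO(sample)):
    print(title, rev_ids)
# Foo ['10', '11']
# Bar ['20']
```

In practice mwxml.Dump.from_file() does this work for you; the sketch only shows why a one-pass iterator, rather than a fully loaded tree, is the natural interface.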
Attributes:
- site_info : mwxml.SiteInfo
  Metadata from the <siteinfo> tag
- pages : iterator
  An iterator of mwxml.Page elements
- items : iterator
  An iterator of mwxml.Page and/or mwxml.LogItem elements
- log_items : iterator
  An iterator of mwxml.LogItem elements
classmethod from_file(f)

Constructs a Dump from a file pointer.

Parameters:
- f : file
  A plain text file pointer containing XML to process

classmethod from_page_xml(page_xml)

Constructs a Dump from a <page> block.

Parameters:
- page_xml : str | file
  Either a plain string or a file containing <page> block XML to process
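Since from_page_xml() accepts either a string or a file, a caller-side pattern for handling both input types is to wrap strings in io.StringIO so later code only ever reads from a file object. The helper as_text_file below is hypothetical and is not part of mwxml; whether mwxml normalizes its input this way internally is an assumption.

```python
import io
from typing import IO, Union


def as_text_file(page_xml: Union[str, IO[str]]) -> IO[str]:
    """Hypothetical helper: return a readable text file for either input type."""
    if isinstance(page_xml, str):
        # A plain string of <page> XML: wrap it so it can be read like a file.
        return io.StringIO(page_xml)
    # Anything else is assumed to already be a readable text file object.
    return page_xml


page_block = "<page><title>Foo</title><id>1</id></page>"
from_string = as_text_file(page_block).read()
from_file = as_text_file(io.StringIO(page_block)).read()
print(from_string == from_file)  # True
```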
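class mwxml.SiteInfo(*args, **kwargs)

Represents the data from the <siteinfo> block in a MediaWiki XML dump.

Attributes:
- name : str | None
  The name of the site.
- dbname : str | None
  The database name of the site.
- base : str | None
  The URL of the site's main page, from the <base> tag.
- generator : str | None
  The name and version of the software that generated the dump, from the <generator> tag (e.g. "MediaWiki" followed by a version number).
- case : str | None
  The title capitalization rule, from the <case> tag ("first-letter" or "case-sensitive").
- namespaces : list(mwxml.Namespace) | None
  The namespaces declared in the <namespaces> block.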
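To show where these fields come from, here is a minimal standard-library sketch that reads a <siteinfo> block into a plain dict. The function name parse_siteinfo, the dict shape, and the made-up sample values are assumptions for illustration; mwxml.SiteInfo builds proper objects (including mwxml.Namespace instances) rather than a dict.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def parse_siteinfo(xml_text: str) -> Dict[str, object]:
    """Hypothetical helper: pull SiteInfo-like fields out of a <siteinfo> block."""
    root = ET.fromstring(xml_text)

    def text(tag: str) -> Optional[str]:
        # Missing tags map to None, mirroring the "str | None" field types.
        node = root.find(tag)
        return node.text if node is not None else None

    namespaces: Dict[int, str] = {}
    for ns in root.iterfind("namespaces/namespace"):
        # The main namespace (key="0") is an empty element, so its text is None.
        namespaces[int(ns.get("key"))] = ns.text or ""

    return {
        "name": text("sitename"),
        "dbname": text("dbname"),
        "base": text("base"),
        "generator": text("generator"),
        "case": text("case"),
        "namespaces": namespaces or None,
    }


sample = """<siteinfo>
  <sitename>Wikipedia</sitename>
  <dbname>enwiki</dbname>
  <base>https://en.wikipedia.org/wiki/Main_Page</base>
  <generator>MediaWiki 1.x</generator>
  <case>first-letter</case>
  <namespaces>
    <namespace key="0" case="first-letter" />
    <namespace key="1" case="first-letter">Talk</namespace>
  </namespaces>
</siteinfo>"""

info = parse_siteinfo(sample)
print(info["name"], info["namespaces"])  # Wikipedia {0: '', 1: 'Talk'}
```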
class mwxml.Page(*args, **kwargs)

Page metadata and a Revision iterator. Instances of this class can be iterated over directly. See mwtypes.Page for a description of fields.

Example:

    page = mwxml.Page( ... )
    for revision in page:
        print("{0} {1}".format(revision.id, page.id))
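Because a Page is iterable over its revisions, code that consumes pages can rely on just that behaviour. The sketch below counts revisions per page id; revisions_per_page is a hypothetical name, and the FakePage/FakeRevision dataclasses are stand-ins (assumptions) so the example runs without a dump file. With a real dump you would pass dump.pages instead. Since mwxml streams revisions from the file, it is probably safest to iterate each page only once.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FakeRevision:
    """Stand-in for mwxml.Revision; only the id field is used here."""
    id: int


@dataclass
class FakePage:
    """Stand-in for mwxml.Page: has an id and iterates over its revisions."""
    id: int
    revisions: List[FakeRevision] = field(default_factory=list)

    def __iter__(self):
        return iter(self.revisions)


def revisions_per_page(pages: Iterable) -> Dict[int, int]:
    """Count revisions for each page by iterating the page object directly."""
    counts: Dict[int, int] = {}
    for page in pages:
        # Iterating a page yields its revisions, as in the example above.
        counts[page.id] = sum(1 for _revision in page)
    return counts


pages = [
    FakePage(1, [FakeRevision(10), FakeRevision(11)]),
    FakePage(2, [FakeRevision(20)]),
]
print(revisions_per_page(pages))  # {1: 2, 2: 1}
```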
class mwxml.LogItem(*args, **kwargs)

LogItem metadata. See mwtypes.LogItem for a description of fields.

Example:

    dump = mwxml.Dump( ... )
    for log_item in dump.log_items:
        print("{0} {1}".format(log_item.id, log_item.type))
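Building on the log_items loop, a tally of log types can be collected with collections.Counter. The function name count_log_types is hypothetical, and SimpleNamespace objects stand in for mwxml.LogItem; only the type field, which the example above also reads, is used. With a real dump you would pass dump.log_items.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_log_types(log_items: Iterable) -> Counter:
    """Tally log items by their type field."""
    return Counter(log_item.type for log_item in log_items)


# Stand-ins for mwxml.LogItem objects (assumption: only .id and .type are needed).
items = [
    SimpleNamespace(id=1, type="delete"),
    SimpleNamespace(id=2, type="block"),
    SimpleNamespace(id=3, type="delete"),
]
print(count_log_types(items).most_common(1))  # [('delete', 2)]
```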
class Deleted(*args, **kwargs)

Represents information about the deleted/suppressed status of a log item and its associated data.

Attributes:
- Deleted.action : bool | None
  Is the action of this log item deleted/suppressed?
- Deleted.comment : bool | None
  Is the comment of this log item deleted/suppressed?
- Deleted.user : bool | None
  Is the user of this log item deleted/suppressed?
- Deleted.restricted : bool | None
  Is the log item restricted?

classmethod from_int(integer)

Constructs a Deleted using the tinyint value of the log_deleted column of the logging MariaDB table. The value is a bit field made of the following flags:
- DELETED_ACTION = 1
- DELETED_COMMENT = 2
- DELETED_USER = 4
- DELETED_RESTRICTED = 8
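Because each flag occupies its own bit, a log_deleted value can be decoded with bitwise AND. The sketch below does that with the constants listed above; decode_log_deleted is a hypothetical stand-in for from_int() that returns a plain dict instead of a Deleted object.

```python
DELETED_ACTION = 1
DELETED_COMMENT = 2
DELETED_USER = 4
DELETED_RESTRICTED = 8


def decode_log_deleted(value: int) -> dict:
    """Map each flag bit in a log_deleted value to a boolean."""
    return {
        "action": bool(value & DELETED_ACTION),
        "comment": bool(value & DELETED_COMMENT),
        "user": bool(value & DELETED_USER),
        "restricted": bool(value & DELETED_RESTRICTED),
    }


# 6 == DELETED_COMMENT | DELETED_USER
print(decode_log_deleted(6))
# {'action': False, 'comment': True, 'user': True, 'restricted': False}
```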
class mwxml.Revision(*args, **kwargs)

Revision metadata and text. See mwtypes.Revision for a description of fields.
class Deleted(*args, **kwargs)

Represents information about the deleted/suppressed status of a revision and its associated data.

Attributes:
- Deleted.text : bool | None
  Is the text of this revision deleted/suppressed?
- Deleted.comment : bool | None
  Is the comment of this revision deleted/suppressed?
- Deleted.user : bool | None
  Is the user of this revision deleted/suppressed?
- Deleted.restricted : bool | None
  Is the revision restricted?

classmethod from_int(integer)

Constructs a Deleted using the tinyint value of the rev_deleted column of the revision MariaDB table. The value is a bit field made of the following flags:
- DELETED_TEXT = 1
- DELETED_COMMENT = 2
- DELETED_USER = 4
- DELETED_RESTRICTED = 8
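The rev_deleted layout maps naturally onto Python's enum.IntFlag, which also makes it easy to go the other way and build a value from individual flags. RevDeleted and describe are hypothetical names that only mirror the constants listed above; they are not part of mwxml or mwtypes.

```python
from enum import IntFlag


class RevDeleted(IntFlag):
    """Mirror of the rev_deleted bit values listed above (hypothetical name)."""
    TEXT = 1
    COMMENT = 2
    USER = 4
    RESTRICTED = 8


def describe(value: int) -> dict:
    """Decode a rev_deleted tinyint into per-field booleans."""
    return {flag.name.lower(): bool(value & flag) for flag in RevDeleted}


# Encoding: combine flags with | and store the integer.
encoded = int(RevDeleted.TEXT | RevDeleted.USER)
print(encoded)  # 5

# Decoding: test each bit of the stored integer.
print(describe(encoded))
# {'text': True, 'comment': False, 'user': True, 'restricted': False}
```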
class mwxml.Namespace(*args, **kwargs)

See mwtypes.Namespace for a description of fields.