mw.lib.title – parsing and normalizing titles

mw.lib.title.normalize(title)

Normalizes a page title to the database format. For example, spaces are converted to underscores and the first character of the title is converted to upper case.

Parameters:
title : str

A page title

Returns:

The normalized title.

Example:
>>> from mw.lib import title
>>>
>>> title.normalize("foo bar")
'Foo_bar'
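
The two rules described above can be approximated in a few lines. This is a minimal sketch, not the library's implementation: normalize_sketch is a hypothetical name, and it covers only the rules mentioned here (spaces to underscores, first character upper-cased).

```python
def normalize_sketch(title):
    """Approximate the documented normalization (hypothetical helper)."""
    if not title:
        return title
    # Spaces become underscores, as in the database format.
    title = title.replace(" ", "_")
    # Only the first character is upper-cased; the rest is left untouched.
    return title[0].upper() + title[1:]


print(normalize_sketch("foo bar"))  # Foo_bar
```

Whatever else the real function handles (surrounding whitespace, for instance) is not covered by this sketch.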

Title parser

class mw.lib.title.Parser(namespaces=None)

Constructs a page name parser from a set of mw.Namespace objects. Such a parser converts a full page name (namespace prefix included, separated by a colon; e.g., "Talk:Foo") into a namespace ID and a page title normalized with mw.lib.title.normalize() (e.g., (1, "Foo")).

Parameters:

namespaces : set( mw.Namespace )

Example:
>>> from mw import Namespace
>>> from mw.lib import title
>>>
>>> parser = title.Parser(
...     [
...         Namespace(0, "", case="first-letter"),
...         Namespace(1, "Discussão", canonical="Talk", case="first-letter"),
...         Namespace(2, "Usuário(a)", canonical="User", aliases={"U"}, case="first-letter")
...     ]
... )
>>>
>>> parser.parse("Discussão:Foo") # Using the standard name
(1, 'Foo')
>>> parser.parse("Talk:Foo bar") # Using the canonical name
(1, 'Foo_bar')
>>> parser.parse("U:Foo bar") # Using an alias
(2, 'Foo_bar')
>>> parser.parse("Herpderp:Foo bar") # Pseudo namespace
(0, 'Herpderp:Foo_bar')

parse(page_name)

Parses a page name to extract the namespace.

Parameters:
page_name : str

A page name, including the namespace prefix and a colon unless the page is in the Main namespace

Returns:

A tuple of (namespace : int, title : str)
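
The parsing behavior shown in the class example can be sketched without the library. The code below is an illustration under stated assumptions, not the actual implementation: parse_sketch and the names mapping are hypothetical, prefixes are matched exactly, and an unknown prefix falls back to namespace 0 with the whole name kept as the title.

```python
def parse_sketch(page_name, names):
    """Split a page name into (namespace ID, normalized title).

    `names` is a hypothetical mapping from every known prefix
    (standard name, canonical name or alias) to its namespace ID.
    """
    prefix, sep, rest = page_name.partition(":")
    if sep and prefix in names:
        ns_id, title = names[prefix], rest
    else:
        # No colon, or an unknown prefix: treat the whole name as a Main title.
        ns_id, title = 0, page_name
    # Apply the same two normalization rules documented for normalize().
    title = title.replace(" ", "_")
    return ns_id, title[:1].upper() + title[1:]


names = {"Discussão": 1, "Talk": 1, "Usuário(a)": 2, "User": 2, "U": 2}
print(parse_sketch("Talk:Foo bar", names))      # (1, 'Foo_bar')
print(parse_sketch("Herpderp:Foo bar", names))  # (0, 'Herpderp:Foo_bar')
```

Splitting on the first colon only keeps titles that themselves contain colons intact once the prefix has been recognized.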

add_namespace(namespace)

Adds a namespace to the parser.

Parameters:
namespace : mw.Namespace

A namespace

get_namespace(id=None, name=None)

Gets a namespace from the parser. Raises a KeyError if the namespace cannot be found.

Parameters:
id : int

A namespace ID

name : str

A namespace name (standard names, canonical names and aliases will be searched)

Returns:

A mw.Namespace.
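
One way to picture a lookup by either ID or name is a pair of dictionaries, one keyed by ID and one keyed by every known name. The sketch below is a self-contained illustration only: NamespaceIndexSketch and its record format are hypothetical, and giving the ID precedence when both arguments are supplied is an assumption.

```python
class NamespaceIndexSketch:
    """Hypothetical index supporting lookup by ID or by any known name."""

    def __init__(self):
        self.by_id = {}
        self.by_name = {}

    def add(self, ns_id, name, canonical=None, aliases=()):
        record = {"id": ns_id, "name": name}
        self.by_id[ns_id] = record
        # Register the standard name, the canonical name and all aliases.
        for key in (name, canonical, *aliases):
            if key is not None:
                self.by_name[key] = record

    def get(self, id=None, name=None):
        # Parameter names mirror the documented signature.
        # Plain dict indexing raises KeyError when nothing matches.
        if id is not None:
            return self.by_id[id]
        return self.by_name[name]


index = NamespaceIndexSketch()
index.add(0, "")
index.add(1, "Discussão", canonical="Talk")
index.add(2, "Usuário(a)", canonical="User", aliases={"U"})

print(index.get(id=1)["name"])   # Discussão
print(index.get(name="U")["id"])  # 2
```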

classmethod from_site_info(si_doc)

Constructs a parser from the result of a mw.api.SiteInfo.query().

Parameters:
si_doc : dict

The result of a site_info request.

Returns:

An initialized mw.lib.title.Parser
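
For orientation, the sketch below shows how namespace names could be collected from a site-info document. It assumes that si_doc has the shape of the MediaWiki API's siteinfo JSON (a "namespaces" mapping whose entries carry the local name under "*", plus an optional "namespacealiases" list); whether from_site_info expects exactly this dictionary is an assumption. The function name names_from_site_info and the sample data, including the "Disc" alias, are illustrative only.

```python
def names_from_site_info(si_doc):
    """Map every namespace name found in a siteinfo-shaped dict to its ID."""
    names = {}
    for ns_doc in si_doc["namespaces"].values():
        names[ns_doc["*"]] = ns_doc["id"]  # local (standard) name
        if "canonical" in ns_doc:
            names[ns_doc["canonical"]] = ns_doc["id"]
    # Aliases live in a separate list and may be absent.
    for alias_doc in si_doc.get("namespacealiases", []):
        names[alias_doc["*"]] = alias_doc["id"]
    return names


# Illustrative sample data, not taken from a real wiki.
si_doc = {
    "namespaces": {
        "0": {"id": 0, "case": "first-letter", "*": ""},
        "1": {"id": 1, "case": "first-letter", "*": "Discussão", "canonical": "Talk"},
    },
    "namespacealiases": [{"id": 1, "*": "Disc"}],
}

print(names_from_site_info(si_doc))
```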

classmethod from_api(session)

Constructs a parser from a mw.api.Session.

Parameters:
session : mw.api.Session

An open API session

Returns:

An initialized mw.lib.title.Parser

classmethod from_dump(dump)

Constructs a parser from a mw.xml_dump.Iterator. Note that XML database dumps do not include namespace aliases or canonical names, so the resulting parser will only work in common cases.

Parameters:
dump : mw.xml_dump.Iterator

An XML dump iterator

Returns:

An initialized mw.lib.title.Parser
