ecoxipy.pyxom - Pythonic XML Object Model (PyXOM)

This module implements the Pythonic XML Object Model (PyXOM) for the representation of XML structures. To conveniently create PyXOM data structures use ecoxipy.pyxom.output, for indexing use ecoxipy.pyxom.indexing (if Document.element_by_id and Document.elements_by_name are not enough for you).

Examples

XML Creation

If you use the constructors be sure to supply the right data types, otherwise use the create() methods or use ecoxipy.MarkupBuilder, which take care of conversion.

>>> from ecoxipy import MarkupBuilder
>>> b = MarkupBuilder()
>>> document = Document.create(
...     b.article(
...         b.h1(
...             b & '<Example>',
...             data='to quote: <&>"\''
...         ),
...         b.p(
...             {'umlaut-attribute': u'äöüß'},
...             'Hello', Element.create('em', ' World',
...                 attributes={'count':1}), '!'
...         ),
...         None,
...         b.div(
...             Element.create('data-element', Text.create(u'äöüß <&>')),
...             b(
...                 '<p attr="value">raw content</p>Some Text',
...                 b.br,
...                 (i for i in range(3))
...             ),
...             (i for i in range(3, 6))
...         ),
...         Comment.create('<This is a comment!>'),
...         ProcessingInstruction.create('pi-target', '<PI content>'),
...         ProcessingInstruction.create('pi-without-content'),
...         b['foo:somexml'](
...             b['foo:somexml']({'foo:bar': 1, 't:test': 2}),
...             b['somexml']({'xmlns': ''}),
...             b['bar:somexml'],
...             {'xmlns:foo': 'foo://bar', 'xmlns:t': '',
...                 'foo:bar': 'Hello', 'id': 'foo'}
...         ),
...         {'xmlns': 'http://www.w3.org/1999/xhtml/'}
...     ), doctype_name='article', omit_xml_declaration=True
... )

Enforcing Well-Formedness

Using the create() methods or passing the parameter check_well_formedness as True to the appropriate constructors enforces that the element, attribute and document type names are valid XML names, and that processing instruction target and content as well as comment contents conform to their constraints:

>>> from ecoxipy import XMLWellFormednessException
>>> def catch_not_well_formed(cls, *args, **kargs):
...     try:
...         return cls.create(*args, **kargs)
...     except XMLWellFormednessException as e:
...         print(e)
>>> t = catch_not_well_formed(Document, [], doctype_name='1nvalid-xml-name')
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(Document, [], doctype_name='html', doctype_publicid='"')
The value "\"" is not a valid document type public ID.
>>> t = catch_not_well_formed(Document, [], doctype_name='html', doctype_systemid='"\'')
The value "\"'" is not a valid document type system ID.
>>> t = catch_not_well_formed(Element, '1nvalid-xml-name', [], {})
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(Element, 't', [], attributes={'1nvalid-xml-name': 'content'})
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(ProcessingInstruction, '1nvalid-xml-name')
The value "1nvalid-xml-name" is not a valid XML processing instruction target.
>>> t = catch_not_well_formed(ProcessingInstruction, 'target', 'invalid PI content ?>')
The value "invalid PI content ?>" is not a valid XML processing instruction content because it contains "?>".
>>> t = catch_not_well_formed(Comment, 'invalid XML comment --')
The value "invalid XML comment --" is not a valid XML comment because it contains "--".
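The constraints behind these messages can be approximated with a few lines of standard-library Python. This is only a sketch: the helper names are hypothetical, and the name check is deliberately restricted to ASCII, whereas the XML specification also admits many non-ASCII name characters. It is not the code PyXOM uses.

```python
import re

# Simplified, ASCII-only approximation of the XML "Name" production.
# The real production also admits many non-ASCII characters.
_NAME_RE = re.compile(r'[A-Za-z_:][A-Za-z0-9_:.\-]*\Z')


def is_xml_name(value):
    """Return True if value looks like an XML name (ASCII subset only)."""
    return _NAME_RE.match(value) is not None


def is_valid_comment(content):
    """Comment content must not contain "--" and must not end with "-"."""
    return '--' not in content and not content.endswith('-')


def is_valid_pi(target, content=None):
    """A PI target is a name other than "xml"; content must not contain "?>"."""
    if not is_xml_name(target) or target.lower() == 'xml':
        return False
    return content is None or '?>' not in content
```

Under these assumptions, '1nvalid-xml-name' is rejected because a name may not start with a digit, while 'pi-without-content' is accepted.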

Manipulation and Equality

All XMLNode instances have attributes which allow for modification. Document and Element instances also allow modification of their contents like sequences.

Duplication and Comparisons

Use XMLNode.duplicate() to create a deep copy of an XML node:

>>> document_copy = document.duplicate()
>>> document is document_copy
False

Equality and inequality recursively compare XML nodes:

>>> document == document_copy
True
>>> document != document_copy
False

Attributes

The attributes of an Element instance are available as Element.attributes. This is an Attributes instance which contains Attribute instances:

>>> document_copy[0][0].attributes['data']
ecoxipy.pyxom.Attribute('data', 'to quote: <&>"\'')
>>> old_data = document_copy[0][0].attributes['data'].value
>>> document_copy[0][0].attributes['data'].value = 'foo bar'
>>> document_copy[0][0].attributes['data'].value == u'foo bar'
True
>>> 'data' in document_copy[0][0].attributes
True
>>> document == document_copy
False
>>> document != document_copy
True
>>> document_copy[0][0].attributes['data'].value = old_data
>>> document == document_copy
True
>>> document != document_copy
False

Attributes instances allow for creation of Attribute instances:

>>> somexml = document_copy[0][-1]
>>> foo_attr = somexml[0].attributes.create_attribute('foo:foo', 'bar')
>>> foo_attr is somexml[0].attributes['foo:foo']
True
>>> foo_attr == somexml[0].attributes['foo:foo']
True
>>> foo_attr != somexml[0].attributes['foo:foo']
False
>>> 'foo:foo' in somexml[0].attributes
True
>>> foo_attr.namespace_uri == u'foo://bar'
True

Attributes may be removed:

>>> somexml[0].attributes.remove(foo_attr)
>>> 'foo:foo' in somexml[0].attributes
False
>>> foo_attr.parent == None
True
>>> foo_attr.namespace_uri == False
True

You can also add an attribute to an element’s attributes; if it already belongs to another element’s attributes, it is automatically moved:

>>> somexml[0].attributes.add(foo_attr)
>>> 'foo:foo' in somexml[0].attributes
True
>>> foo_attr.parent == somexml[0].attributes
True
>>> foo_attr.parent != somexml[0].attributes
False
>>> foo_attr.namespace_uri == u'foo://bar'
True
>>> del somexml[0].attributes['foo:foo']
>>> 'foo:foo' in somexml[0].attributes
False
>>> attr = document[0][-1].attributes['foo:bar']
>>> attr.name = 'test'
>>> attr.namespace_prefix is None
True
>>> print(attr.local_name)
test
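The add, move and remove behaviour shown above can be modelled with two small classes. This is a sketch under assumptions: Attr and AttrMap are hypothetical names, and only the parent bookkeeping and the documented KeyError/ValueError cases are modelled, not namespaces.

```python
class Attr:
    """Hypothetical attribute: a name, a value and an owning mapping."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.parent = None


class AttrMap:
    """Hypothetical mapping of attribute names to Attr objects."""

    def __init__(self):
        self._items = {}

    def __contains__(self, name):
        return name in self._items

    def add(self, attr):
        if not isinstance(attr, Attr):
            raise ValueError('not an Attr instance')
        if attr.name in self._items:
            raise KeyError(attr.name)
        if attr.parent is not None:
            # Moving: detach from the previous owner first.
            attr.parent.remove(attr)
        self._items[attr.name] = attr
        attr.parent = self

    def remove(self, attr):
        # Raises KeyError if no attribute with this name is present.
        if self._items[attr.name] is not attr:
            raise ValueError('a different attribute has this name')
        del self._items[attr.name]
        attr.parent = None


first, second = AttrMap(), AttrMap()
moved = Attr('foo:foo', 'bar')
first.add(moved)
second.add(moved)  # implicitly removes it from `first`
```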

Documents and Elements

>>> document_copy[0].insert(1, document_copy[0][0])
>>> document_copy[0][0] == document[0][1]
True
>>> document_copy[0][0] != document[0][1]
False
>>> document_copy[0][1] == document[0][0]
True
>>> document_copy[0][1] != document[0][0]
False
>>> p_element = document_copy[0][0]
>>> document_copy[0].remove(p_element)
>>> document_copy[0][0].name == u'h1' and p_element.parent is None
True
>>> p_element in document_copy[0]
False
>>> p_element.namespace_uri == False
True
>>> document_copy[0][0].append(p_element)
>>> document_copy[0][0][-1] is p_element
True
>>> p_element in document_copy[0][0]
True
>>> p_element.namespace_uri == u'http://www.w3.org/1999/xhtml/'
True
>>> p_element in document[0]
False
>>> document[0][1] in document_copy[0][0]
False
>>> document[0][1] is document_copy[0][0][-1]
False
>>> document[0][1] == document_copy[0][0][-1]
True
>>> document[0][1] != document_copy[0][0][-1]
False
>>> document[0][-1].name = 'foo'
>>> document[0][-1].namespace_prefix is None
True
>>> print(document[0][-1].local_name)
foo

Indexes and Manipulation

If a document is modified, the indexes should be deleted. This can be done by applying del to the index attribute or by calling delete_indexes().

>>> del document_copy[0][-1]
>>> document_copy.delete_indexes()
>>> 'foo' in document_copy.element_by_id
False
>>> 'foo:somexml' in document_copy.elements_by_name
False

XML Serialization

First we remove the embedded non-HTML XML: that element has multiple attributes, and the order in which they are rendered is nondeterministic, which makes the output hard to compare:

>>> del document[0][-1]

Getting the Unicode value of a document yields the XML document serialized as a Unicode string:

>>> document_string = u"""<!DOCTYPE article><article xmlns="http://www.w3.org/1999/xhtml/"><h1 data="to quote: &lt;&amp;&gt;&quot;'">&lt;Example&gt;</h1><p umlaut-attribute="äöüß">Hello<em count="1"> World</em>!</p><div><data-element>äöüß &lt;&amp;&gt;</data-element><p attr="value">raw content</p>Some Text<br/>012345</div><!--<This is a comment!>--><?pi-target <PI content>?><?pi-without-content?></article>"""
>>> import sys
>>> if sys.version_info[0] < 3:
...     unicode(document) == document_string
... else:
...     str(document) == document_string
True
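The escaping visible in document_string (for example &lt;Example&gt; in text and &quot; inside the data attribute) can be reproduced with the standard library helpers escape and quoteattr. This only illustrates the escaping rules; PyXOM's own serializer may be implemented differently.

```python
from xml.sax.saxutils import escape, quoteattr

# Text content: "&", "<" and ">" are replaced by entity references.
text = escape(u'<Example>')

# Attribute values: quoteattr also adds the surrounding quotes. If the value
# contains both quote characters, double quotes are escaped as &quot;.
attribute = quoteattr(u'to quote: <&>"\'')
```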

Getting the bytes() value of a Document creates a byte string of the serialized XML in the encoding specified on creation of the instance, which defaults to “UTF-8”:

>>> bytes(document) == document_string.encode('UTF-8')
True

XMLNode instances can also generate SAX events, see XMLNode.create_sax_events() (note that the default content handler is xml.sax.saxutils.XMLGenerator, which does not support comments):

>>> document_string = u"""<?xml version="1.0" encoding="UTF-8"?>\n<article xmlns="http://www.w3.org/1999/xhtml/"><h1 data="to quote: &lt;&amp;&gt;&quot;'">&lt;Example&gt;</h1><p umlaut-attribute="äöüß">Hello<em count="1"> World</em>!</p><div><data-element>äöüß &lt;&amp;&gt;</data-element><p attr="value">raw content</p>Some Text<br></br>012345</div><?pi-target <PI content>?><?pi-without-content ?></article>"""
>>> import sys
>>> from io import BytesIO
>>> string_out = BytesIO()
>>> content_handler = document.create_sax_events(out=string_out)
>>> string_out.getvalue() == document_string.encode('UTF-8')
True
>>> string_out.close()

You can also create indented XML when calling XMLNode.create_sax_events() by supplying the indent_incr argument:

>>> indented_document_string = u"""\
... <?xml version="1.0" encoding="UTF-8"?>
... <article xmlns="http://www.w3.org/1999/xhtml/">
...     <h1 data="to quote: &lt;&amp;&gt;&quot;'">
...         &lt;Example&gt;
...     </h1>
...     <p umlaut-attribute="äöüß">
...         Hello
...         <em count="1">
...              World
...         </em>
...         !
...     </p>
...     <div>
...         <data-element>
...             äöüß &lt;&amp;&gt;
...         </data-element>
...         <p attr="value">
...             raw content
...         </p>
...         Some Text
...         <br></br>
...         012345
...     </div>
...     <?pi-target <PI content>?>
...     <?pi-without-content ?>
... </article>
... """
>>> string_out = BytesIO()
>>> content_handler = document.create_sax_events(indent_incr='    ', out=string_out)
>>> string_out.getvalue() == indented_document_string.encode('UTF-8')
True
>>> string_out.close()

Classes

Document

class ecoxipy.pyxom.Document(doctype_name, doctype_publicid, doctype_systemid, children, omit_xml_declaration, encoding, check_well_formedness=False)

A ContainerNode representing an XML document.

Parameters:
  • doctype_name (Unicode string) – The document type root element name or None if the document should not have a document type declaration.
  • doctype_publicid (Unicode string) – The public ID of the document type declaration or None.
  • doctype_systemid (Unicode string) – The system ID of the document type declaration or None.
  • children – The document root XMLNode instances.
  • encoding (Unicode string) – The encoding of the document. If it is None UTF-8 is used.
  • omit_xml_declaration (bool()) – If True the XML declaration is omitted.
  • check_well_formedness (bool()) – If True the document element name will be checked to be a valid XML name.
Raises ecoxipy.XMLWellFormednessException:

If check_well_formedness is True and doctype_name is not a valid XML name, doctype_publicid is not a valid public ID or doctype_systemid is not a valid system ID.

static create(*children, **kargs)

Creates a document and converts parameters to appropriate types.

Parameters:
  • children – The document root nodes. All items that are not XMLNode instances create Text nodes after they have been converted to Unicode strings.
  • kargs – The same parameters as the constructor has (except children) are recognized. The items doctype_name, doctype_publicid, doctype_systemid, and encoding are converted to Unicode strings if they are not None. omit_xml_declaration is converted to boolean.
Returns:

The created document.

Return type:

Document

Raises ecoxipy.XMLWellFormednessException:

If doctype_name is not a valid XML name, doctype_publicid is not a valid public ID or doctype_systemid is not a valid system ID.

doctype

The DocumentType instance of the document.

On setting one of the following occurs:

  1. If the value is None, the document type’s attributes are set to None.
  2. If the value is a byte or Unicode string, the document type document element name is set to this value (a byte string will be converted to Unicode). The document type public and system IDs will be set to None.
  3. If the value is a mapping, the items identified by the strings 'name', 'publicid' or 'systemid' define the respective attributes of the document type, the others are assumed to be None.
  4. If the value is a sequence, the item at position zero defines the document type document element name, the item at position one defines the public ID and the item at position two defines the system ID. If the sequence is shorter than three, non-available items are assumed to be None.

The document type values are converted to appropriate values and their validity is checked if check_well_formedness is True.
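The four cases above can be sketched as a single normalization function that always yields a (name, publicid, systemid) triple. The function name normalize_doctype is hypothetical, and decoding byte strings as UTF-8 is an assumption, since the text does not name the encoding used.

```python
from collections.abc import Mapping, Sequence


def normalize_doctype(value):
    """Return a (name, publicid, systemid) tuple for a doctype-like value."""
    if value is None:
        return (None, None, None)
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8; the source does not say.
        value = value.decode('utf-8')
    if isinstance(value, str):
        # Must be tested before Sequence, because str is a Sequence too.
        return (value, None, None)
    if isinstance(value, Mapping):
        return tuple(value.get(key) for key in ('name', 'publicid', 'systemid'))
    if isinstance(value, Sequence):
        # Pad short sequences with None and ignore extra items.
        padded = list(value[:3]) + [None] * 3
        return tuple(padded[:3])
    raise TypeError('unsupported doctype value: %r' % (value,))
```

The string check has to come before the sequence check; otherwise 'foo' would be split into single characters.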

Example:

>>> doc = Document.create()
>>> doc.doctype
ecoxipy.pyxom.DocumentType(None, None, None)
>>> doc.doctype = {'name': 'test', 'systemid': 'foo bar'}
>>> doc.doctype
ecoxipy.pyxom.DocumentType('test', None, 'foo bar')
>>> doc.doctype = ('html', 'foo bar')
>>> doc.doctype
ecoxipy.pyxom.DocumentType('html', 'foo bar', None)
>>> doc.doctype = 'foo'
>>> doc.doctype
ecoxipy.pyxom.DocumentType('foo', None, None)
>>> doc.doctype = None
>>> doc.doctype
ecoxipy.pyxom.DocumentType(None, None, None)

omit_xml_declaration

If True the XML declaration is omitted.

encoding

The encoding of the document. On setting, if the value is None it is set to UTF-8; otherwise it is converted to a Unicode string.

create_sax_events(content_handler=None, out=None, out_encoding='UTF-8', indent_incr=None)

Creates SAX events.

Parameters:
  • content_handler (xml.sax.ContentHandler) – If this is None an xml.sax.saxutils.XMLGenerator is created and used as the content handler. If in this case out is not None, it is used for output.
  • out – The output to write to if no content_handler is given. It should have a write() method like files.
  • out_encoding – The output encoding or None for Unicode output.
  • indent_incr (str()) – If this is not None this activates pretty printing. In this case it should be a string and it is used for indenting.
Returns:

The content handler used.
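For orientation, this is what xml.sax.saxutils.XMLGenerator, the default content handler named above, writes when it is driven directly with SAX events. The sketch uses only the standard library and does not involve PyXOM; the element names are arbitrary.

```python
from io import BytesIO
from xml.sax.saxutils import XMLGenerator

out = BytesIO()
handler = XMLGenerator(out, encoding='UTF-8')
handler.startDocument()                       # writes the XML declaration
handler.startElement('p', {'attr': 'value'})
handler.characters('raw <content>')           # text is escaped on output
handler.startElement('br', {})
handler.endElement('br')                      # written as <br></br> by default
handler.processingInstruction('pi-without-content', '')
handler.endElement('p')
handler.endDocument()
result = out.getvalue()
```

With the default short_empty_elements=False, empty elements are written as an explicit start and end tag, and a processing instruction with empty data keeps a space before ?>. The ContentHandler interface has no callback for comments, which is why comments cannot be emitted through it.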

duplicate()

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

element_by_id

An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.ElementByUniqueAttributeValueIndexer for indexing.

Use it like a mapping to retrieve the element whose id attribute has a value equal to the requested key; a KeyError is raised if no such element exists.

Important: If the document’s children are relevantly modified (i.e. an id attribute was created, modified or deleted), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.
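Why a stale index must be deleted can be shown with a small lazily caching descriptor. This is a sketch of the general technique only: CachedIndex and Doc are hypothetical, and ecoxipy.pyxom.indexing.IndexDescriptor may work differently in detail.

```python
class CachedIndex:
    """Hypothetical descriptor: builds an index lazily, caches it per instance."""

    def __init__(self, build):
        self._build = build

    def __set_name__(self, owner, name):
        self._slot = '_cached_' + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self._slot not in instance.__dict__:
            instance.__dict__[self._slot] = self._build(instance)
        return instance.__dict__[self._slot]

    def __delete__(self, instance):
        # Dropping the cache forces a rebuild on the next access.
        instance.__dict__.pop(self._slot, None)


class Doc:
    """Hypothetical document holding (id, name) pairs instead of real nodes."""

    element_by_id = CachedIndex(lambda doc: dict(doc.elements))

    def __init__(self, elements):
        self.elements = list(elements)


doc = Doc([('foo', 'somexml')])
doc.element_by_id                  # first access builds the index
doc.elements.append(('bar', 'p'))  # a modification the index does not see
seen_before_delete = 'bar' in doc.element_by_id
del doc.element_by_id              # invalidate the cached index
seen_after_delete = 'bar' in doc.element_by_id
```

The cached mapping keeps answering from its old state until it is deleted; the next access then rebuilds it from the current content.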

elements_by_name

An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.ElementsByNameIndexer for indexing.

Use it like a mapping to retrieve an iterator over the elements whose name equals the requested key; a KeyError is raised if no such element exists.

Important: If the document’s children are relevantly modified (i.e. new elements were added or deleted, elements’ names were modified), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.

nodes_by_namespace

An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.NamespaceIndexer for indexing.

Important: If the document’s children are relevantly modified (i.e. new elements/attributes were added or deleted, elements’/attributes’ names were modified), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.

delete_indexes()

A shortcut to delete the indexes of element_by_id and elements_by_name.

class ecoxipy.pyxom.DocumentType(name, publicid, systemid, check_well_formedness)

Represents a document type declaration of a Document. It should not be instantiated directly.

Parameters:
  • name (Unicode string) – The document element name.
  • publicid (Unicode string) – The document type public ID or None.
  • systemid (Unicode string) – The document type system ID or None.
  • check_well_formedness (bool()) – If True the document element name will be checked to be a valid XML name.

name

The document element name or None. On setting, if the value is None, publicid and systemid are also set to None. Otherwise the value is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is raised if it is not a valid XML name and check_well_formedness is True.

publicid

The document type public ID or None. On setting, if the value is not None it is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is raised if it is not a valid doctype public ID and check_well_formedness is True.

systemid

The document type system ID or None. On setting, if the value is not None it is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is raised if it is not a valid doctype system ID and check_well_formedness is True.

Element

class ecoxipy.pyxom.Element(name, children, attributes, check_well_formedness=False)

Represents an XML element. It inherits from ContainerNode and NamespaceNameMixin.

Parameters:
  • name (Unicode string) – The name of the element to create.
  • children (iterable of items) – The children XMLNode instances of the element.
  • attributes – Defines the attributes of the element. Must be usable as the parameter of dict and should contain only Unicode strings as key and value definitions.
  • check_well_formedness (bool()) – If True the element name and attribute names will be checked to be valid XML names.
Raises ecoxipy.XMLWellFormednessException:

If check_well_formedness is True and the name is not a valid XML name.

static create(name, *children, **kargs)

Creates an element and converts parameters to appropriate types.

Parameters:
  • children – The element child nodes. All items that are not XMLNode instances create Text nodes after they have been converted to Unicode strings.
  • kargs – The item attributes defines the attributes and must have a method items() (like dict) which returns an iterable of 2-tuple() instances containing the attribute name as the first and the attribute value as the second item. Attribute names and values are converted to Unicode strings.
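The conversions described for create() can be sketched as two small helpers. XMLNode and Text here are hypothetical stand-ins, str plays the role of the Unicode string type (as on Python 3), and the helper names are illustrative rather than part of PyXOM.

```python
class XMLNode:
    """Hypothetical base class standing in for PyXOM's node types."""


class Text(XMLNode):
    """Hypothetical text node holding a string."""

    def __init__(self, content):
        self.content = content


def convert_children(children):
    """Keep nodes as they are; wrap everything else in a Text node."""
    return [child if isinstance(child, XMLNode) else Text(str(child))
            for child in children]


def convert_attributes(attributes):
    """Convert attribute names and values to strings via items()."""
    return {str(name): str(value) for name, value in attributes.items()}


existing = Text(' World')
converted = convert_children([existing, 1, 'Hello'])
```

This mirrors the earlier example in which attributes={'count': 1} ends up serialized as count="1".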
Returns:

The created element.

Return type:

Element

Raises ecoxipy.XMLWellFormednessException:

If the name is not a valid XML name.

namespace_prefixes

An iterator over all namespace prefixes defined in the element and its parents. Duplicate values may be retrieved.

get_namespace_prefix_element(prefix)

Calculates the element the namespace prefix is defined in; this is None if the prefix is not defined.

get_namespace_uri(prefix)

Calculates the namespace URI for the prefix; this is False if the prefix is not defined.
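A prefix lookup of this kind amounts to walking from an element up through its parents until a matching xmlns declaration is found. The sketch below uses a hypothetical Elem class with a plain dict of attributes; treating a prefix of None as the default namespace is an assumption of the sketch, not something the text states.

```python
class Elem:
    """Hypothetical element: a dict of attributes plus a parent link."""

    def __init__(self, attributes=None, parent=None):
        self.attributes = dict(attributes or {})
        self.parent = parent

    @staticmethod
    def _declaration(prefix):
        # The default namespace is declared with plain "xmlns".
        return 'xmlns' if prefix is None else 'xmlns:' + prefix

    def get_namespace_prefix_element(self, prefix):
        key = self._declaration(prefix)
        node = self
        while node is not None:
            if key in node.attributes:
                return node
            node = node.parent
        return None

    def get_namespace_uri(self, prefix):
        owner = self.get_namespace_prefix_element(prefix)
        if owner is None:
            return False  # lookup failed
        return owner.attributes[self._declaration(prefix)]


root = Elem({'xmlns': 'http://www.w3.org/1999/xhtml/'})
outer = Elem({'xmlns:foo': 'foo://bar'}, parent=root)
inner = Elem(parent=outer)
```

Because the nearest declaration wins, an inner element can redeclare a prefix without affecting its ancestors.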

name

The name of the element. On setting, the value is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is raised if it is not a valid XML name and check_well_formedness is True.

attributes

An Attributes instance containing the element’s attributes.

duplicate()

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

class ecoxipy.pyxom.Attribute(parent, name, value, check_well_formedness)

Represents an item of an Element’s Attributes. It inherits from NamespaceNameMixin and should not be instantiated directly; use Attributes.create_attribute() instead.

parent

The parent Attributes.

name

The attribute’s name. On setting, the value is converted to a Unicode string. If there is already another attribute with the same name on the parent Attributes instance, a KeyError is raised.

value

The attribute’s value.

class ecoxipy.pyxom.Attributes(parent, attributes, check_well_formedness)

This mapping, containing Attribute instances identified by their names, represents attributes of an Element. It should not be instantiated directly.

create_attribute(name, value)

Create a new Attribute as part of the instance.

Parameters:
  • name – the attribute’s name
  • value – the attribute’s value
Returns:

the created attribute

Return type:

Attribute

Raises KeyError:

If an attribute with name already exists in the instance.

add(attribute)

Add an attribute to the instance. If the attribute is contained in an Attributes instance it is first removed from that.

Parameters:

attribute (Attribute) – the attribute to add

Raises:
  • ValueError – If attribute is not an Attribute instance.
  • KeyError – If an attribute with the attribute’s name already exists in the instance.

remove(attribute)

Remove the given attribute.

Parameters:

attribute (Attribute) – the attribute to remove

Raises:
  • KeyError – If no attribute with the name of attribute is contained in the instance.
  • ValueError – If there is an attribute with the name of attribute contained, but it is not attribute.

parent

The parent Element.

to_dict()

Creates a dict from the instance’s Attribute instances. The keys are the attributes’ names, which map to the attributes’ values.

Other Nodes

class ecoxipy.pyxom.Text(content)

A ContentNode representing a node of text.

duplicate()

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

class ecoxipy.pyxom.Comment(content, check_well_formedness=False)

A ContentNode representing a comment node.

Raises ecoxipy.XMLWellFormednessException:

If check_well_formedness is True and content is not valid.

static create(content)

Creates a comment node.

Parameters:

content – The content of the comment. This will be converted to a Unicode string.

Returns:

The created comment node.

Return type:

Comment

Raises ecoxipy.XMLWellFormednessException:

If content is not valid.

duplicate()

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

content

The node content. On setting, the value is converted to a Unicode string.

class ecoxipy.pyxom.ProcessingInstruction(target, content, check_well_formedness=False)

A ContentNode representing a processing instruction.

Parameters:
  • target – The target.
  • content – The content or None.
  • check_well_formedness (bool()) – If True the target will be checked to be a valid XML name.
Raises ecoxipy.XMLWellFormednessException:

If check_well_formedness is True and either the target or the content is not valid.

static create(target, content=None)

Creates a processing instruction node and converts the parameters to appropriate types.

Parameters:
  • target – The target; it will be converted to a Unicode string.
  • content – The content; if it is not None it will be converted to a Unicode string.
Returns:

The created processing instruction.

Return type:

ProcessingInstruction

Raises ecoxipy.XMLWellFormednessException:

If either the target or the content is not valid.

target

The processing instruction target.

content

The node content. On setting, the value is converted to a Unicode string.

duplicate()

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

Base Classes

class ecoxipy.pyxom.XMLNode

Base class for XML node objects.

Retrieving the byte string from an instance yields a byte string encoded as UTF-8.

parent

The parent ContainerNode or None if the node has no parent.

previous

The previous XMLNode or None if the node has no preceding sibling.

next

The next XMLNode or None if the node has no following sibling.

ancestors

Returns an iterator over all ancestors.

preceding_siblings

Returns an iterator over all preceding siblings.

following_siblings

Returns an iterator over all following siblings.

preceding

Returns an iterator over all preceding nodes.

following

Returns an iterator over all following nodes.

create_str(out=None, encoding='UTF-8')

Creates a string containing the XML representation of the node.

Parameters:

create_sax_events(content_handler=None, out=None, out_encoding='UTF-8', indent_incr=None)

Creates SAX events.

Parameters:
  • content_handler (xml.sax.ContentHandler) – If this is None an xml.sax.saxutils.XMLGenerator is created and used as the content handler. If in this case out is not None, it is used for output.
  • out – The output to write to if no content_handler is given. It should have a write() method like files.
  • out_encoding – The output encoding or None for Unicode output.
  • indent_incr (str()) – If this is not None this activates pretty printing. In this case it should be a string and it is used for indenting.
Returns:

The content handler used.

duplicate(test=None)

Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.

class ecoxipy.pyxom.ContainerNode(children)

An XMLNode containing other nodes with sequence semantics.

Parameters:

children (list()) – The nodes contained in the node.

children(reverse=False)

Returns an iterator over the children.

Parameters:

reverse – If this is True the children are returned in reverse document order.

Returns:

An iterator over the children.

descendants(reverse=False, depth_first=True, max_depth=None)

Returns an iterator over all descendants.

Parameters:
  • reverse – If this is True the descendants are returned in reverse document order.
  • depth_first – If this is True the descendants are returned depth-first, if it is False breadth-first traversal is used.
  • max_depth (int()) – The maximum depth, if this is None all descendants will be returned.
Returns:

An iterator over the descendants.
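The difference between depth-first and breadth-first traversal, and the effect of max_depth, can be sketched with a generator over a hypothetical Node class. The reverse parameter is left out here because its exact ordering is not specified in this text.

```python
from collections import deque


class Node:
    """Hypothetical container node with a name and a list of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def descendants(node, depth_first=True, max_depth=None):
    """Yield the descendants of node (excluding node itself)."""
    pending = deque((child, 1) for child in node.children)
    while pending:
        current, depth = pending.popleft()
        yield current
        if max_depth is not None and depth >= max_depth:
            continue
        below = [(child, depth + 1) for child in current.children]
        if depth_first:
            # Put the children at the front so they are visited next.
            pending.extendleft(reversed(below))
        else:
            # Put the children at the back, after all pending nodes.
            pending.extend(below)


tree = Node('a', Node('b', Node('d'), Node('e')), Node('c', Node('f')))
depth_first_names = [n.name for n in descendants(tree)]
breadth_first_names = [n.name for n in descendants(tree, depth_first=False)]
limited_names = [n.name for n in descendants(tree, max_depth=1)]
```

Depth-first order corresponds to document order; breadth-first visits all nodes of one level before descending to the next.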

insert(index, child)

Insert child before index.

remove(child)

Remove child.

class ecoxipy.pyxom.ContentNode(content)

An XMLNode with content.

Parameters:

content (Unicode string) – Becomes the content attribute.

classmethod create(content)

Creates an instance of the ContentNode implementation and converts content to a Unicode string.

Parameters:

content – The content of the node. This will be converted to a Unicode string.

Returns:

The created ContentNode implementation instance.

content

The node content. On setting, the value is converted to a Unicode string.

class ecoxipy.pyxom.NamespaceNameMixin

Contains functionality implementing Namespaces in XML.

namespace_prefix

The namespace prefix (the part before :) of the node’s name.

local_name

The local name (the part after :) of the node’s name.

namespace_uri

The namespace URI the namespace_prefix refers to. It is None if there is no namespace prefix and it is False if the prefix lookup failed.
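The three possible outcomes for namespace_uri (a URI, None, or False) can be sketched as follows. The function names and the in_scope dictionary are hypothetical; the sketch follows the rule exactly as stated above and ignores default namespaces.

```python
def split_name(name):
    """Split "prefix:local" into (prefix, local); prefix is None if absent."""
    prefix, colon, local = name.partition(':')
    return (prefix, local) if colon else (None, name)


def resolve_namespace(name, in_scope):
    """Return None without a prefix, False for an unknown prefix, else the URI.

    in_scope is a hypothetical dict mapping prefixes to namespace URIs.
    """
    prefix, _ = split_name(name)
    if prefix is None:
        return None
    return in_scope.get(prefix, False)
```

Since None and False are both falsy, callers that need to tell "no prefix" from "lookup failed" should test with is None and is False rather than relying on truthiness.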