This module implements the Pythonic XML Object Model (PyXOM) for the representation of XML structures. To create PyXOM data structures conveniently, use ecoxipy.pyxom.output; for indexing, use ecoxipy.pyxom.indexing (if Document.element_by_id and Document.elements_by_name are not enough for you).
If you use the constructors, be sure to supply the right data types; otherwise use the create() methods or ecoxipy.MarkupBuilder, which take care of conversion.
>>> from ecoxipy import MarkupBuilder
>>> b = MarkupBuilder()
>>> document = Document.create(
... b.article(
... b.h1(
... b & '<Example>',
... data='to quote: <&>"\''
... ),
... b.p(
... {'umlaut-attribute': u'äöüß'},
... 'Hello', Element.create('em', ' World',
... attributes={'count':1}), '!'
... ),
... None,
... b.div(
... Element.create('data-element', Text.create(u'äöüß <&>')),
... b(
... '<p attr="value">raw content</p>Some Text',
... b.br,
... (i for i in range(3))
... ),
... (i for i in range(3, 6))
... ),
... Comment.create('<This is a comment!>'),
... ProcessingInstruction.create('pi-target', '<PI content>'),
... ProcessingInstruction.create('pi-without-content'),
... b['foo:somexml'](
... b['foo:somexml']({'foo:bar': 1, 't:test': 2}),
... b['somexml']({'xmlns': ''}),
... b['bar:somexml'],
... {'xmlns:foo': 'foo://bar', 'xmlns:t': '',
... 'foo:bar': 'Hello', 'id': 'foo'}
... ),
... {'xmlns': 'http://www.w3.org/1999/xhtml/'}
... ), doctype_name='article', omit_xml_declaration=True
... )
Using the create() methods or passing the parameter check_well_formedness as True to the appropriate constructors enforces that the element, attribute and document type names are valid XML names, and that processing instruction target and content as well as comment contents conform to their constraints:
>>> from ecoxipy import XMLWellFormednessException
>>> def catch_not_well_formed(cls, *args, **kargs):
...     try:
...         return cls.create(*args, **kargs)
...     except XMLWellFormednessException as e:
...         print(e)
>>> t = catch_not_well_formed(Document, [], doctype_name='1nvalid-xml-name')
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(Document, [], doctype_name='html', doctype_publicid='"')
The value "\"" is not a valid document type public ID.
>>> t = catch_not_well_formed(Document, [], doctype_name='html', doctype_systemid='"\'')
The value "\"'" is not a valid document type system ID.
>>> t = catch_not_well_formed(Element, '1nvalid-xml-name', [], {})
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(Element, 't', [], attributes={'1nvalid-xml-name': 'content'})
The value "1nvalid-xml-name" is not a valid XML name.
>>> t = catch_not_well_formed(ProcessingInstruction, '1nvalid-xml-name')
The value "1nvalid-xml-name" is not a valid XML processing instruction target.
>>> t = catch_not_well_formed(ProcessingInstruction, 'target', 'invalid PI content ?>')
The value "invalid PI content ?>" is not a valid XML processing instruction content because it contains "?>".
>>> t = catch_not_well_formed(Comment, 'invalid XML comment --')
The value "invalid XML comment --" is not a valid XML comment because it contains "--".
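The constraints reported above can be approximated with the standard library alone. The following is a minimal sketch, not ecoxipy's implementation: all helper names are hypothetical, and the name pattern covers only part of the XML 1.0 Name production (ASCII letters, `_`, `:` and a few of the non-ASCII ranges).

```python
import re

# Simplified XML name pattern (an assumption, not the full XML 1.0 grammar):
# ASCII start characters plus some of the permitted non-ASCII ranges.
_NAME_START = r'A-Za-z_:\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF'
_NAME_RE = re.compile(r'[%s][%s0-9.\-\u00B7]*\Z' % (_NAME_START, _NAME_START))


def is_xml_name(value):
    """Return True if value looks like an XML name (simplified check)."""
    return _NAME_RE.match(value) is not None


def is_valid_pi_target(value):
    """A PI target is a name other than "xml" in any letter case."""
    return is_xml_name(value) and value.lower() != 'xml'


def is_valid_pi_content(value):
    """PI content must not contain the closing delimiter "?>"."""
    return '?>' not in value


def is_valid_comment(value):
    """Comment content must not contain "--" or end with "-"."""
    return '--' not in value and not value.endswith('-')
```

A check of this kind would run before a node is created or renamed, so that invalid values are rejected early instead of producing malformed output on serialization.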
All XMLNode instances have attributes which allow for modification. Document and Element instances also allow modification of their contents like sequences.
Use XMLNode.duplicate() to create a deep copy of an XML node:
>>> document_copy = document.duplicate()
>>> document is document_copy
False
Equality and inequality recursively compare XML nodes:
>>> document == document_copy
True
>>> document != document_copy
False
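The relationship between deep copies and recursive equality can be sketched with a tiny tree class. `Node` below is a hypothetical stand-in used only for illustration; it is not an ecoxipy class and ignores attributes and content.

```python
class Node:
    """Hypothetical stand-in for a container node; not an ecoxipy class."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def duplicate(self):
        # Copy this node and, recursively, every child.
        return Node(self.name, [child.duplicate() for child in self.children])

    def __eq__(self, other):
        # Equal when the names match and all children are pairwise equal.
        return (isinstance(other, Node)
                and self.name == other.name
                and self.children == other.children)

    def __ne__(self, other):
        return not self == other


tree = Node('article', [Node('h1'), Node('p', [Node('em')])])
tree_copy = tree.duplicate()
```

Here `tree_copy == tree` holds although no node object is shared, and changing any descendant of the copy makes the two trees unequal.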
The attributes of an Element instance are available as Element.attributes. This is an Attributes instance which contains Attribute instances:
>>> document_copy[0][0].attributes['data']
ecoxipy.pyxom.Attribute('data', 'to quote: <&>"\'')
>>> old_data = document_copy[0][0].attributes['data'].value
>>> document_copy[0][0].attributes['data'].value = 'foo bar'
>>> document_copy[0][0].attributes['data'].value == u'foo bar'
True
>>> 'data' in document_copy[0][0].attributes
True
>>> document == document_copy
False
>>> document != document_copy
True
>>> document_copy[0][0].attributes['data'].value = old_data
>>> document == document_copy
True
>>> document != document_copy
False
Attributes instances allow for creation of Attribute instances:
>>> somexml = document_copy[0][-1]
>>> foo_attr = somexml[0].attributes.create_attribute('foo:foo', 'bar')
>>> foo_attr is somexml[0].attributes['foo:foo']
True
>>> foo_attr == somexml[0].attributes['foo:foo']
True
>>> foo_attr != somexml[0].attributes['foo:foo']
False
>>> 'foo:foo' in somexml[0].attributes
True
>>> foo_attr.namespace_uri == u'foo://bar'
True
Attributes may be removed:
>>> somexml[0].attributes.remove(foo_attr)
>>> 'foo:foo' in somexml[0].attributes
False
>>> foo_attr.parent == None
True
>>> foo_attr.namespace_uri == False
True
You can also add an attribute to an element’s attributes; it is automatically moved if it belongs to another element’s attributes:
>>> somexml[0].attributes.add(foo_attr)
>>> 'foo:foo' in somexml[0].attributes
True
>>> foo_attr.parent == somexml[0].attributes
True
>>> foo_attr.parent != somexml[0].attributes
False
>>> foo_attr.namespace_uri == u'foo://bar'
True
>>> del somexml[0].attributes['foo:foo']
>>> 'foo:foo' in somexml[0].attributes
False
>>> attr = document[0][-1].attributes['foo:bar']
>>> attr.name = 'test'
>>> attr.namespace_prefix is None
True
>>> print(attr.local_name)
test
>>> document_copy[0].insert(1, document_copy[0][0])
>>> document_copy[0][0] == document[0][1]
True
>>> document_copy[0][0] != document[0][1]
False
>>> document_copy[0][1] == document[0][0]
True
>>> document_copy[0][1] != document[0][0]
False
>>> p_element = document_copy[0][0]
>>> document_copy[0].remove(p_element)
>>> document_copy[0][0].name == u'h1' and p_element.parent is None
True
>>> p_element in document_copy[0]
False
>>> p_element.namespace_uri == False
True
>>> document_copy[0][0].append(p_element)
>>> document_copy[0][0][-1] is p_element
True
>>> p_element in document_copy[0][0]
True
>>> p_element.namespace_uri == u'http://www.w3.org/1999/xhtml/'
True
>>> p_element in document[0]
False
>>> document[0][1] in document_copy[0][0]
False
>>> document[0][1] is document_copy[0][0][-1]
False
>>> document[0][1] == document_copy[0][0][-1]
True
>>> document[0][1] != document_copy[0][0][-1]
False
>>> document[0][-1].name = 'foo'
>>> document[0][-1].namespace_prefix is None
True
>>> print(document[0][-1].local_name)
foo
If a document is modified, the indexes should be deleted. This can be done by using del on the index attribute or by calling delete_indexes():
>>> del document_copy[0][-1]
>>> document_copy.delete_indexes()
>>> 'foo' in document_copy.element_by_id
False
>>> 'foo:somexml' in document_copy.elements_by_name
False
First we remove the embedded non-HTML XML, because its element has multiple attributes and the order in which they are rendered is nondeterministic, which makes the output hard to compare:
>>> del document[0][-1]
Getting the Unicode value of a document yields the XML document serialized as a Unicode string:
>>> document_string = u"""<!DOCTYPE article><article xmlns="http://www.w3.org/1999/xhtml/"><h1 data="to quote: &lt;&amp;&gt;&quot;'">&lt;Example&gt;</h1><p umlaut-attribute="äöüß">Hello<em count="1"> World</em>!</p><div><data-element>äöüß &lt;&amp;&gt;</data-element><p attr="value">raw content</p>Some Text<br/>012345</div><!--<This is a comment!>--><?pi-target <PI content>?><?pi-without-content?></article>"""
>>> import sys
>>> if sys.version_info[0] < 3:
...     unicode(document) == document_string
... else:
...     str(document) == document_string
True
Getting the bytes() value of a Document creates a byte string of the serialized XML in the encoding specified on creation of the instance, which defaults to “UTF-8”:
>>> bytes(document) == document_string.encode('UTF-8')
True
XMLNode instances can also generate SAX events, see XMLNode.create_sax_events() (note that the default xml.sax.ContentHandler is xml.sax.saxutils.XMLGenerator, which does not support comments):
>>> document_string = u"""<?xml version="1.0" encoding="UTF-8"?>\n<article xmlns="http://www.w3.org/1999/xhtml/"><h1 data="to quote: &lt;&amp;&gt;&quot;'">&lt;Example&gt;</h1><p umlaut-attribute="äöüß">Hello<em count="1"> World</em>!</p><div><data-element>äöüß &lt;&amp;&gt;</data-element><p attr="value">raw content</p>Some Text<br></br>012345</div><?pi-target <PI content>?><?pi-without-content ?></article>"""
>>> import sys
>>> from io import BytesIO
>>> string_out = BytesIO()
>>> content_handler = document.create_sax_events(out=string_out)
>>> string_out.getvalue() == document_string.encode('UTF-8')
True
>>> string_out.close()
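The shape of the serialized output above (`<br></br>`, `<?pi-without-content ?>`, no comment) matches what the standard library's `xml.sax.saxutils.XMLGenerator` writes when it receives SAX events. The sketch below feeds it a few events by hand; whether ecoxipy uses exactly this handler by default is an assumption here, and the element names are merely borrowed from the example document.

```python
from io import StringIO
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator

out = StringIO()
generator = XMLGenerator(out, encoding='UTF-8')

# Emit the SAX events for a tiny document by hand.
generator.startDocument()
generator.startElement('article', {})
generator.startElement('h1', {'data': 'to quote: <&>"\''})
generator.characters('<Example>')        # text is escaped on output
generator.endElement('h1')
generator.startElement('br', {})         # empty element: start and end tag
generator.endElement('br')
generator.processingInstruction('pi-without-content', '')
generator.endElement('article')
generator.endDocument()

xml_text = out.getvalue()

# The ContentHandler interface has no comment event, which is why comments
# are missing from output produced this way.
supports_comments = hasattr(ContentHandler, 'comment')
```

Because the handler only sees start and end events, an empty element is written as a tag pair unless the generator is created with `short_empty_elements=True`.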
You can also create indented XML by supplying the indent_incr argument to XMLNode.create_sax_events():
>>> indented_document_string = u"""\
... <?xml version="1.0" encoding="UTF-8"?>
... <article xmlns="http://www.w3.org/1999/xhtml/">
...  <h1 data="to quote: &lt;&amp;&gt;&quot;'">
...   &lt;Example&gt;
...  </h1>
...  <p umlaut-attribute="äöüß">
...   Hello
...   <em count="1">
...    World
...   </em>
...   !
...  </p>
...  <div>
...   <data-element>
...    äöüß &lt;&amp;&gt;
...   </data-element>
...   <p attr="value">
...    raw content
...   </p>
...   Some Text
...   <br></br>
...   012345
...  </div>
...  <?pi-target <PI content>?>
...  <?pi-without-content ?>
... </article>
... """
>>> string_out = BytesIO()
>>> content_handler = document.create_sax_events(indent_incr=' ', out=string_out)
>>> string_out.getvalue() == indented_document_string.encode('UTF-8')
True
>>> string_out.close()
A ContainerNode representing an XML document.
Raises ecoxipy.XMLWellFormednessException: | If check_well_formedness is True and doctype_name is not a valid XML name, doctype_publicid is not a valid public ID or doctype_systemid is not a valid system ID. |
---|---|
Creates a document and converts parameters to appropriate types.
Returns: | The created document. |
---|---|
Raises ecoxipy.XMLWellFormednessException: | If doctype_name is not a valid XML name, doctype_publicid is not a valid public ID or doctype_systemid is not a valid system ID. |
The DocumentType instance of the document.
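On setting, the value is interpreted according to its type, as the example below shows: None resets the name, public ID and system ID to None; a string sets only the name; a mapping supplies the values under the keys name, publicid and systemid; a sequence supplies them positionally in that order. Missing values become None.
The document type values are converted to appropriate values and their validity is checked if check_well_formedness is True.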
Example:
>>> doc = Document.create()
>>> doc.doctype
ecoxipy.pyxom.DocumentType(None, None, None)
>>> doc.doctype = {'name': 'test', 'systemid': 'foo bar'}
>>> doc.doctype
ecoxipy.pyxom.DocumentType('test', None, 'foo bar')
>>> doc.doctype = ('html', 'foo bar')
>>> doc.doctype
ecoxipy.pyxom.DocumentType('html', 'foo bar', None)
>>> doc.doctype = 'foo'
>>> doc.doctype
ecoxipy.pyxom.DocumentType('foo', None, None)
>>> doc.doctype = None
>>> doc.doctype
ecoxipy.pyxom.DocumentType(None, None, None)
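The conversions shown in the example can be summarized as a small normalization step. This is a minimal sketch under stated assumptions: `normalize_doctype` is a hypothetical helper, not part of ecoxipy, and raising `TypeError` for other value types is a choice made for this illustration.

```python
from collections.abc import Mapping, Sequence


def normalize_doctype(value):
    """Return a (name, publicid, systemid) tuple for the given value.

    Hypothetical helper mirroring the behaviour shown in the example.
    """
    if value is None:
        return (None, None, None)
    if isinstance(value, (str, bytes)):
        # A single string only sets the name.
        return (value, None, None)
    if isinstance(value, Mapping):
        # Missing keys become None.
        return (value.get('name'), value.get('publicid'),
                value.get('systemid'))
    if isinstance(value, Sequence):
        # Positional values: name, publicid, systemid; pad with None.
        items = list(value[:3])
        items.extend([None] * (3 - len(items)))
        return tuple(items)
    # Assumption of this sketch: anything else is rejected.
    raise TypeError('unsupported doctype value: %r' % (value,))
```

The string check has to come before the sequence check, since strings are sequences too and would otherwise be split into characters.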
If True the XML declaration is omitted.
The encoding of the document. On setting, None is replaced by UTF-8; any other value is converted to a Unicode string.
Creates SAX events.
Returns: | The content handler used. |
---|---|
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.ElementByUniqueAttributeValueIndexer for indexing.
Use it like a mapping to retrieve the element whose id attribute equals the requested key; a KeyError is raised if no such element exists.
Important: If the document’s children are relevantly modified (i.e. an id attribute was created, modified or deleted), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.
An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.ElementsByNameIndexer for indexing.
Use it like a mapping to retrieve an iterator over the elements whose name equals the requested key; a KeyError is raised if no such element exists.
Important: If the document’s children are relevantly modified (i.e. new elements were added or deleted, elements’ names were modified), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.
An ecoxipy.pyxom.indexing.IndexDescriptor instance using an ecoxipy.pyxom.indexing.NamespaceIndexer for indexing.
Important: If the document’s children are relevantly modified (i.e. new elements/attributes were added or deleted, elements’/attributes’ names were modified), delete_indexes() should be called or this attribute should be deleted on the instance, which deletes the index.
A shortcut to delete the indexes of element_by_id and elements_by_name.
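Why a stale index has to be deleted explicitly can be illustrated with a lazily built, cached index. The sketch below is an assumption about the general technique, not the code of ecoxipy.pyxom.indexing; `CachedIndex` and `Doc` are hypothetical names.

```python
class CachedIndex:
    """Hypothetical descriptor that builds an index lazily and caches it."""

    def __init__(self, build):
        self._build = build

    def __set_name__(self, owner, name):
        self._slot = '_cached_' + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self._slot not in instance.__dict__:
            # Build the index on first access only.
            instance.__dict__[self._slot] = self._build(instance)
        return instance.__dict__[self._slot]

    def __delete__(self, instance):
        # Dropping the cache forces a rebuild on the next access.
        instance.__dict__.pop(self._slot, None)


class Doc:
    """Minimal stand-in for a document holding (id, name) pairs."""

    def __init__(self, elements):
        self.elements = list(elements)

    element_by_id = CachedIndex(
        lambda doc: {id_: name for id_, name in doc.elements})

    def delete_indexes(self):
        del self.element_by_id


doc = Doc([('foo', 'foo:somexml')])
before = 'foo' in doc.element_by_id   # index is built here
doc.elements.clear()                  # modify the document
stale = 'foo' in doc.element_by_id    # cached index is now outdated
doc.delete_indexes()
after = 'foo' in doc.element_by_id    # rebuilt from the current content
```

Caching keeps repeated lookups cheap, at the price that the caller must signal modifications; the index does not observe the tree itself.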
Represents a document type declaration of a Document. It should not be instantiated directly.
The document element name or None. On setting, if the value is None, publicid and systemid are also set to None. Otherwise the value is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is thrown if it is not a valid XML name and check_well_formedness is True.
The document type public ID or None. On setting, a value other than None is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is thrown if it is not a valid doctype public ID and check_well_formedness is True.
The document type system ID or None. On setting, a value other than None is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is thrown if it is not a valid doctype system ID and check_well_formedness is True.
Represents an XML element. It inherits from ContainerNode and NamespaceNameMixin.
Raises ecoxipy.XMLWellFormednessException: | If check_well_formedness is True and the name is not a valid XML name. |
---|---|
Creates an element and converts parameters to appropriate types.
Returns: | The created element. |
---|---|
Raises ecoxipy.XMLWellFormednessException: | If the name is not a valid XML name. |
An iterator over all namespace prefixes defined in the element and its parents. Duplicate values may be retrieved.
Calculates the element the namespace prefix is defined in; this is None if the prefix is not defined.
Calculates the namespace URI for the prefix; this is False if the prefix is not defined.
The name of the element. On setting, the value is converted to a Unicode string; an ecoxipy.XMLWellFormednessException is thrown if it is not a valid XML name and check_well_formedness is True.
An Attributes instance containing the element’s attributes.
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
Represents an item of an Element’s Attributes. It inherits from NamespaceNameMixin and should not be instantiated directly; use Attributes.create_attribute() instead.
The parent Attributes.
The attribute’s name. On setting, the value is converted to a Unicode string; if there is already another attribute with the same name on the parent Attributes instance, a KeyError is raised.
The attribute’s value.
This mapping, containing Attribute instances identified by their names, represents attributes of an Element. It should not be instantiated directly.
Create a new Attribute as part of the instance.
Returns: | The created attribute. |
---|---|
Raises KeyError: | If an attribute with name already exists in the instance. |
Add an attribute to the instance. If the attribute is contained in an Attributes instance it is first removed from that.
Parameters: | attribute (Attribute) – the attribute to add |
---|---|
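The create, remove and add operations described above amount to a name-keyed mapping that also maintains a parent link. The following is a minimal sketch with hypothetical classes that only borrow the names used here; the exceptions raised by `add` and `remove` are assumptions, since the text only states the `KeyError` of create_attribute.

```python
class Attribute:
    """Hypothetical attribute with a name, a value and a parent link."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.parent = None


class Attributes:
    """Hypothetical name-to-attribute mapping owned by one element."""

    def __init__(self):
        self._by_name = {}

    def __contains__(self, name):
        return name in self._by_name

    def __getitem__(self, name):
        return self._by_name[name]

    def create_attribute(self, name, value):
        if name in self._by_name:
            raise KeyError('attribute %r already exists' % name)
        attribute = Attribute(name, value)
        attribute.parent = self
        self._by_name[name] = attribute
        return attribute

    def remove(self, attribute):
        # Assumption: only the very same object may be removed.
        if self._by_name.get(attribute.name) is not attribute:
            raise KeyError(attribute.name)
        del self._by_name[attribute.name]
        attribute.parent = None

    def add(self, attribute):
        if attribute.name in self._by_name:
            raise KeyError('attribute %r already exists' % attribute.name)
        if attribute.parent is not None:
            # Move: detach from the previous owner first.
            attribute.parent.remove(attribute)
        attribute.parent = self
        self._by_name[attribute.name] = attribute


first, second = Attributes(), Attributes()
attr = first.create_attribute('foo:foo', 'bar')
second.add(attr)  # attr now belongs to second only
```

Checking for a name clash before detaching keeps the attribute with its old owner if the add fails.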
A ContentNode representing a node of text.
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
A ContentNode representing a comment node.
Raises ecoxipy.XMLWellFormednessException: | If check_well_formedness is True and content is not valid. |
---|---|
Creates a comment node.
Parameters: | content – The content of the comment. This will be converted to a Unicode string. |
---|---|
Returns: | The created comment node. |
Return type: | Comment |
Raises ecoxipy.XMLWellFormednessException: | If content is not valid. |
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
The node content. On setting, the value is converted to a Unicode string.
A ContentNode representing a processing instruction.
Raises ecoxipy.XMLWellFormednessException: | If check_well_formedness is True and either the target or the content is not valid. |
---|---|
Creates a processing instruction node and converts the parameters to appropriate types.
Returns: | The created processing instruction. |
---|---|
Raises ecoxipy.XMLWellFormednessException: | If either the target or the content is not valid. |
The processing instruction target.
The node content. On setting, the value is converted to a Unicode string.
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
Base class for XML node objects.
Retrieving the byte string from an instance yields a byte string encoded as UTF-8.
The parent ContainerNode or None if the node has no parent.
Returns an iterator over all ancestors.
Returns an iterator over all preceding siblings.
Returns an iterator over all following siblings.
Returns an iterator over all preceding nodes.
Returns an iterator over all following nodes.
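The ancestor and sibling iterators listed above can be pictured as walks over parent links. This is a minimal sketch with a hypothetical `Node` class; the order in which preceding siblings are yielded (nearest first) is an assumption, as the text does not specify it.

```python
class Node:
    """Hypothetical tree node with a parent link; not an ecoxipy class."""

    def __init__(self, name, *children):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def ancestors(self):
        # Walk the parent links up to the root.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def _index(self):
        # Position among the siblings, found by identity.
        for index, sibling in enumerate(self.parent.children):
            if sibling is self:
                return index
        raise ValueError('node is not a child of its parent')

    def preceding_siblings(self):
        # Assumption: nearest sibling first (reverse document order).
        if self.parent is not None:
            yield from reversed(self.parent.children[:self._index()])

    def following_siblings(self):
        if self.parent is not None:
            yield from self.parent.children[self._index() + 1:]


em = Node('em')
p = Node('p', em)
root = Node('article', Node('h1'), p, Node('div'))
```

All three methods are generators, so nothing is computed until the caller iterates, and a root node simply yields nothing.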
Creates a string containing the XML representation of the node.
Creates SAX events.
Returns: | The content handler used. |
---|---|
Return a deep copy of the XML node, and its descendants if it is a ContainerNode instance.
An XMLNode containing other nodes with sequence semantics.
Parameters: | children (list) – The nodes contained in the node. |
---|---|
Returns an iterator over the children.
Parameters: | reverse – If this is True the children are returned in reverse document order. |
---|---|
Returns: | An iterator over the children. |
Returns an iterator over all descendants.
Returns: | An iterator over the descendants. |
---|---|
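Iterating over children and descendants in document order or its reverse can be sketched with recursive generators. `Container` is a hypothetical class; giving the descendants iterator a `reverse` parameter analogous to the one for children is an assumption of this sketch.

```python
class Container:
    """Hypothetical container node; only what the traversal needs."""

    def __init__(self, name, *children):
        self.name = name
        self._children = list(children)

    def children(self, reverse=False):
        # Document order, or backwards if reverse is True.
        return reversed(self._children) if reverse else iter(self._children)

    def descendants(self, reverse=False):
        # Depth-first traversal. In document order a node comes before
        # its descendants; in reverse document order it comes after them.
        for child in self.children(reverse):
            if not reverse:
                yield child
            yield from child.descendants(reverse)
            if reverse:
                yield child


root = Container(
    'article',
    Container('h1'),
    Container('p', Container('em')),
    Container('div'),
)
```

With this tree, `root.descendants()` visits h1, p, em, div, and `root.descendants(reverse=True)` visits the same nodes in exactly the opposite order.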
Insert child before index.
Remove child.
An XMLNode with content.
Parameters: | content (Unicode string) – Becomes the content attribute. |
---|---|
Creates an instance of the ContentNode implementation and converts content to a Unicode string.
Parameters: | content – The content of the node. This will be converted to a Unicode string. |
---|---|
Returns: | The created ContentNode implementation instance. |
The node content. On setting, the value is converted to a Unicode string.
Contains functionality implementing Namespaces in XML.
The namespace prefix (the part before :) of the node’s name.
The local name (the part after :) of the node’s name.
The namespace URI the namespace_prefix refers to. It is None if there is no namespace prefix and it is False if the prefix lookup failed.
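Prefix resolution of this kind walks up the tree until a matching namespace declaration is found. The sketch below uses a hypothetical `Element` class and is not ecoxipy's implementation. It resolves unprefixed element names against the default namespace (`xmlns`), as the `p_element` doctest in the module introduction suggests, and returns False when nothing is found.

```python
class Element:
    """Hypothetical element with a parent link and an attribute dict."""

    def __init__(self, name, attributes=None, parent=None):
        self.name = name
        self.attributes = dict(attributes or {})
        self.parent = parent

    @property
    def namespace_prefix(self):
        # The part before ":", or None if the name has no prefix.
        prefix, separator, _ = self.name.partition(':')
        return prefix if separator else None

    @property
    def local_name(self):
        # The part after ":", or the whole name if there is no prefix.
        return self.name.rpartition(':')[2]

    def get_namespace_uri(self, prefix):
        # None selects the default namespace declared via "xmlns".
        key = 'xmlns' if prefix is None else 'xmlns:' + prefix
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        return False  # the lookup failed

    @property
    def namespace_uri(self):
        return self.get_namespace_uri(self.namespace_prefix)


root = Element('article', {'xmlns': 'http://www.w3.org/1999/xhtml/'})
somexml = Element('foo:somexml', {'xmlns:foo': 'foo://bar'}, parent=root)
child = Element('foo:somexml', parent=somexml)
unknown = Element('bar:somexml', parent=somexml)
detached = Element('p')
```

Returning False rather than None for a failed lookup keeps "no namespace applies" distinguishable from "the prefix is undeclared", which is the distinction the description above draws.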