############
XML tutorial
############
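
.. testsetup:: *

    import sys
    sys.path.append("..")
    body="""
    <team name="The champions" founded="2012">
        <mascot>Fido</mascot>
        <member email="p.martin@sample.net">Paul Martin</member>
        <member>Luce Whight</member>
        <member email="mfield78@sample.net">Martin Field</member>
        <member>Kim Peng</member>
    </team>
    """
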
Let's start by creating a short script:
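
.. testcode::

    import dum

    # a sample XML string to parse
    body="""
    <team name="The champions" founded="2012">
        <mascot>Fido</mascot>
        <member email="p.martin@sample.net">Paul Martin</member>
        <member>Luce Whight</member>
        <member email="mfield78@sample.net">Martin Field</member>
        <member>Kim Peng</member>
    </team>
    """

    # define our parser class and run it on our sample XML
    class Team:
        class dum:
            name = ""
            mascot = ""
            member = ""

    team = dum.xmls(Team, body)

    # inspect content
    print("name:", team.name)
    print("mascot:", team.mascot)
    print("member:", team.member)
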
If you run this, you get:

.. testoutput::

    name: The champions
    mascot: Fido
    member: Paul Martin

We have used :func:`dum.xmls` to parse a string (there is also :func:`dum.xml` to parse a file object),
and we were able to read values back from both attributes and the content of child nodes.

Projecting collections
----------------------

Fantastic! But wait... where are Luce, Martin and Kim?
By default, *dum* only keeps the first value; if you want them all, you have to tell it. Let's redefine our Team class:

.. testcode::

    class Team:
        class dum:
            member = [str]

    team = dum.xmls(Team, body)
    print(team.member)

Run the script:

.. testoutput::

    ['Paul Martin', 'Luce Whight', 'Martin Field', 'Kim Peng']

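
For comparison, the standard library's ``xml.etree.ElementTree`` makes the same distinction between the first value and all values: ``find`` returns the first match and ``findall`` returns every match. The sketch below is not *dum* code, and the sample document is an assumption modeled on the team example.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, modeled on the tutorial's team example.
SAMPLE = """
<team name="The champions" founded="2012">
    <mascot>Fido</mascot>
    <member>Paul Martin</member>
    <member>Luce Whight</member>
    <member>Martin Field</member>
    <member>Kim Peng</member>
</team>
"""

root = ET.fromstring(SAMPLE)

# find() stops at the first matching child, much like a scalar field.
first_member = root.find("member").text

# findall() collects every matching child, much like a [str] field.
all_members = [node.text for node in root.findall("member")]

print(first_member)
print(all_members)
```
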
Value types
-----------

Because the 'founded' attribute is a number, we don't want it returned as a string:

.. testcode::

    class Team:
        class dum:
            founded = int

    team = dum.xmls(Team, body)
    print("It was founded %d years after the beginning of the 21st century" % (team.founded - 2000))

.. testoutput::

    It was founded 12 years after the beginning of the 21st century

Alternatively, you can define a default value as a prototype. This is useful when the attribute may be omitted from the input file.

.. code-block:: python

    class Team:
        class dum:
            founded = 42

Don't stay alone
----------------

OK, but now we need the members' emails. For that, we tell *dum* that members are nodes:

.. testcode::

    class Member:
        class dum:
            name = str, "dum_content"
            email = "none"

    class Team:
        class dum:
            member = [Member]

    team = dum.xmls(Team, body)
    for member in team.member:
        print(member.name, ":", member.email)

.. testoutput::

    Paul Martin : p.martin@sample.net
    Luce Whight : none
    Martin Field : mfield78@sample.net
    Kim Peng : none

Natively, *dum* maps the textual content of XML elements to the *dum_content* attribute. Here we have told
*dum* that we want it to go to the *name* attribute instead.

Path globbing
-------------

More formally, each field of the dum class can be split into three segments:

*target = converter[, source]*

* *target* is the name of the Python object's attribute.
* *converter* is the function used to convert the input data into the Python attribute value. It may be replaced by a default value, which is then used as a prototype.
* *source* is the location of the data in the input document.

The source segment is a string which must conform to a subset of XPath.
The current implementation uses the ElementTree syntax for XML and supports a partial syntax for JSON.
The source segment is optional: by default, *dum* looks for a node or an attribute with the same name as the target.

The following sample uses an XPath expression to collect all the members' emails:

.. testcode::

    class Team:
        class dum:
            emails = [str], "member/@email"

    team = dum.xmls(Team, body)
    print(team.emails)

.. testoutput::

    ['p.martin@sample.net', 'mfield78@sample.net']

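
To see what such a path selects, here is a rough equivalent written directly against the standard library. The path language of ``xml.etree.ElementTree`` has no step that returns attribute values, so this sketch filters members with the ``[@email]`` predicate and then reads the attribute with ``get``. It illustrates the selection only, not how *dum* resolves the path internally; the sample document is an assumption modeled on the team example.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, modeled on the tutorial's team example.
SAMPLE = """
<team name="The champions" founded="2012">
    <member email="p.martin@sample.net">Paul Martin</member>
    <member>Luce Whight</member>
    <member email="mfield78@sample.net">Martin Field</member>
    <member>Kim Peng</member>
</team>
"""

root = ET.fromstring(SAMPLE)

# "member[@email]" keeps only the <member> children that carry an email attribute.
emails = [node.get("email") for node in root.findall("member[@email]")]

print(emails)
```
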
Customized data conversion
--------------------------

When a type's default constructor doesn't accept a string, you have to define your own converter. For example, let's say we want to convert the *founded* attribute into a datetime.date object.
You can define a function in the *dum* class:

.. testcode::

    import datetime

    class Team:
        class dum:
            def founded(foundedstr):
                return datetime.date(int(foundedstr), 1, 1)

    team = dum.xmls(Team, body)
    print(team.founded)

.. testoutput::

    2012-01-01

Use the :func:`dum.converter` decorator to provide a default and/or a source:

.. testcode::

    class Team:
        class dum:
            @dum.converter(default=datetime.date(1900,1,1))
            def founded(foundedstr):
                return datetime.date(int(foundedstr), 1, 1)

    team = dum.xmls(Team, body)
    print(team.founded)

.. testoutput::

    2012-01-01

There is also a :func:`dum.lister` decorator for collecting multiple values into one list.

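
The idea of a converter paired with a default can be pictured in plain Python. This is only an illustration of the concept under simple assumptions, not how *dum* is implemented: ``apply_converter`` is a hypothetical helper, and a missing attribute is represented here by ``None``.

```python
import datetime


def founded(founded_str):
    """Convert a year string such as "2012" into a date on 1 January."""
    return datetime.date(int(founded_str), 1, 1)


def apply_converter(raw, converter, default):
    # Hypothetical helper: fall back to the default when the value is missing.
    if raw is None:
        return default
    return converter(raw)


DEFAULT_DATE = datetime.date(1900, 1, 1)

present = apply_converter("2012", founded, DEFAULT_DATE)
missing = apply_converter(None, founded, DEFAULT_DATE)

print(present)  # 2012-01-01
print(missing)  # 1900-01-01
```
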
Grouping child nodes
--------------------

Because we're all against discrimination, Fido should be a member of the team. The :func:`dum.group` function can put several node types in the same list. Just tell it which nodes to group and how to convert them, using named arguments:

.. testcode::

    class Team:
        class dum:
            allmembers = dum.group(member=str, mascot=str)

    team = dum.xmls(Team, body)
    team.allmembers.sort()
    print(", ".join(team.allmembers))

.. testoutput::

    Fido, Kim Peng, Luce Whight, Martin Field, Paul Martin

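
Grouping amounts to walking the child nodes once and keeping those whose tag belongs to a chosen set. A minimal standard-library sketch of that idea follows; it is not *dum*'s implementation, and both the sample document and the ``GROUPED_TAGS`` name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, modeled on the tutorial's team example.
SAMPLE = """
<team name="The champions" founded="2012">
    <mascot>Fido</mascot>
    <member>Paul Martin</member>
    <member>Luce Whight</member>
    <member>Martin Field</member>
    <member>Kim Peng</member>
</team>
"""

# Hypothetical name: the set of child tags that end up in the same list.
GROUPED_TAGS = {"member", "mascot"}

root = ET.fromstring(SAMPLE)

# Iterating over an element yields its direct children in document order.
all_members = sorted(child.text for child in root if child.tag in GROUPED_TAGS)

print(", ".join(all_members))
```
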
Masquerade
----------

Masquerades are node classes which create another object: simply define the *dum_projection* method to return this object.

.. testcode::

    class Team:
        class dum:
            name = u""
            founded = 0

        def dum_projection(self):
            return (self.name, self.founded)

    team = dum.xmls(Team, body)
    print(team)

Here we create a tuple:

.. testoutput::

    ('The champions', 2012)

The method can also be used to do post-parsing initialization, but don't forget to return self.

.. testcode::

    class Team:
        class dum:
            name = ""
            founded = 0

        def dum_projection(self):
            self.title = "%s team, since %s !" % (self.name, self.founded)
            return self

    team = dum.xmls(Team, body)
    print(team.title)

.. testoutput::

    The champions team, since 2012 !

Namespaces
----------
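
.. testsetup:: *

    body="""
    <team xmlns="http://example.com/nsp" name="The champions">
        <mascot>Fido</mascot>
        <member>Paul Martin</member>
        <member>Luce Whight</member>
        <member>Martin Field</member>
        <member>Kim Peng</member>
    </team>
    """
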
XML namespaces are often used to avoid element name conflicts.
This section shows how to process a document with a single namespace
using the *__default_namespace__* directive.

.. testcode::

    import dum

    # a sample XML string to parse
    body="""
    <team xmlns="http://example.com/nsp" name="The champions">
        <mascot>Fido</mascot>
        <member>Paul Martin</member>
        <member>Luce Whight</member>
        <member>Martin Field</member>
        <member>Kim Peng</member>
    </team>
    """

    # define __default_namespace__ in our parser class
    class Team:
        class dum:
            __default_namespace__ = "http://example.com/nsp"
            name = ""
            mascot = ""
            member = [""]

    team = dum.xmls(Team, body)

    # inspect content
    print("name:", team.name)
    print("mascot:", team.mascot)
    print("member:", team.member)

You then get:

.. testoutput::

    name: The champions
    mascot: Fido
    member: ['Paul Martin', 'Luce Whight', 'Martin Field', 'Kim Peng']

If your document uses several namespaces, you can still use *__default_namespace__* for one of
them, but you will have to be explicit with the others.

.. testcode::

    # use __namespaces__ in our parser class
    class Team:
        class dum:
            __namespaces__ = {"nsp": "http://example.com/nsp"}
            name = "", "name"  # attribute without namespace
            mascot = "", "nsp:mascot"
            member = [""], "nsp:member"

    team = dum.xmls(Team, body)

    # inspect content
    print("name:", team.name)
    print("mascot:", team.mascot)
    print("member:", team.member)

And again:

.. testoutput::

    name: The champions
    mascot: Fido
    member: ['Paul Martin', 'Luce Whight', 'Martin Field', 'Kim Peng']

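
The prefix-to-URI mapping works much like the ``namespaces`` argument accepted by ``find`` and ``findall`` in the standard library's ``xml.etree.ElementTree``. The sketch below shows that standard-library behaviour only, not *dum* itself; the sample document, which declares the namespace as its default, is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed sample document: every element lives in the default namespace.
SAMPLE = """
<team xmlns="http://example.com/nsp" name="The champions">
    <mascot>Fido</mascot>
    <member>Paul Martin</member>
    <member>Luce Whight</member>
</team>
"""

# The prefix is local to this mapping; only the URI has to match the document.
NAMESPACES = {"nsp": "http://example.com/nsp"}

root = ET.fromstring(SAMPLE)

# Unprefixed attributes do not belong to any namespace, so no prefix is needed.
name = root.get("name")

# Elements are namespaced, so the path must use the mapped prefix.
mascot = root.find("nsp:mascot", NAMESPACES).text
members = [node.text for node in root.findall("nsp:member", NAMESPACES)]

print(name, mascot, members)
```
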