Usage
DoJSON is a simple Pythonic JSON to JSON converter.
The main goal of this package is to help manage a set of rules for manipulating Python dictionaries, with a focus on JSON serialization. Each rule is associated with a regular expression and a key. When the regular expression matches a key in the source mapping, the rule produces a new value that is added to the output mapping under the rule's key.
Initialization
First, create an Overdo object, which holds the index of rules.
>>> import dojson
>>> simple = dojson.Overdo()
The next step is to create rules that will manipulate a source object.
>>> @simple.over('first', '^.*st$')
... def first(self, key, value):
...     return value + 1
>>> @simple.over('second', '^.*nd$')
... def second(self, key, value):
...     return value + 2
And now we can try to match the source object and produce new data.
>>> data = simple.do({'1st': 1, '2nd': 2})
>>> assert 2 == data['first']
>>> assert 4 == data['second']
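To make the matching step concrete, here is a minimal pure-Python sketch of the idea. It assumes nothing about DoJSON's internals: the `RuleIndex` class and its methods are hypothetical names used only for illustration, and the real Overdo class may behave differently in details such as rule precedence.

```python
import re


class RuleIndex:
    """Hypothetical stand-in for a rule index; not DoJSON's implementation."""

    def __init__(self):
        # Each entry is (compiled pattern, output key, function).
        self.rules = []

    def over(self, name, pattern):
        """Register the decorated function under an output key and a regex."""
        def decorator(func):
            self.rules.append((re.compile(pattern), name, func))
            return func
        return decorator

    def do(self, source):
        """Apply the first matching rule to every key of the source mapping."""
        output = {}
        for key, value in source.items():
            for regex, name, func in self.rules:
                if regex.match(key):
                    output[name] = func(key, value)
                    break
        return output


index = RuleIndex()


@index.over('first', '^.*st$')
def first(key, value):
    return value + 1


@index.over('second', '^.*nd$')
def second(key, value):
    return value + 2


# '3rd' matches neither pattern, so it is left out of the result.
result = index.do({'1st': 1, '2nd': 2, '3rd': 3})
print(result)  # {'first': 2, 'second': 4}
```

Keys that match no rule are simply skipped in this sketch.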
Command line interface
The command line interface script is installed as dojson.
The easiest way to get started is to apply an already registered set of rules to some JSON data.
{"245__": {"a": "Test title"}}
DoJSON comes with a set of rules for processing MARC21 fields.
$ echo '{"245__": {"a": "Test title"}}' | dojson do marc21
{"title_statement": {"title": "Test title"}}
Sometimes the input contains fields that do not match any rule.
To list such fields, use the missing command.
$ echo '{"999__": {"a": "Test title"}}' | dojson missing marc21
999__
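The idea behind missing can be sketched in plain Python: collect every source key that no rule pattern matches. The `find_missing` helper and the single pattern below are hypothetical and only illustrate the concept; they are not the MARC21 rules shipped with DoJSON.

```python
import re


def find_missing(patterns, record):
    """Return the sorted keys of `record` that match none of `patterns`."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return sorted(
        key for key in record
        if not any(regex.match(key) for regex in compiled)
    )


# Illustrative pattern only: a 245 field followed by two indicator characters.
patterns = ['^245..$']
record = {'245__': {'a': 'Test title'}, '999__': {'a': 'Test title'}}

missing = find_missing(patterns, record)
print(missing)  # ['999__']
```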
A common problem is reading input in other file formats, such as XML.
<?xml version='1.0' encoding='UTF-8'?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Test title</subfield>
</datafield>
</record>
</collection>
You can select a registered loader with the -l <NAME> argument. Save the above example as example.xml and run the following command.
$ dojson -i example.xml -l marcxml do marc21
{"title_statement": {"title": "Test title"}}
In a similar way, it is possible to specify a different output serializer (-d).
$ echo '{"title_statement": {"title": "Test title"}}' | \
    dojson -d marcxml do to_marc21
<?xml version='1.0' encoding='UTF-8'?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Test title</subfield>
</datafield>
</record>
</collection>
Command chaining
Commands can be chained, which makes JSON manipulation even easier. As a first example, the schema command accepts a string argument containing the URL of a JSON Schema that should be added to the $schema field.
$ dojson -i example.xml -l marcxml do marc21 \
schema http://example.org/schema/marc21.json
..."schema": "http://example.org/schema/marc21.json"...
The second example shows an easy way to verify that the rules round-trip, that is, that converting to JSON and back acts as an identity function.
$ dojson -l marcxml -d marcxml do marc21 do to_marc21 < example.xml | \
diff - example.xml
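Chaining can be pictured as composing processors, where each processor takes an iterator of records and returns a new one. The sketch below is an assumption about the general shape, not DoJSON's actual CLI code; `rename_key`, `add_schema`, and `chain` are hypothetical names.

```python
from functools import reduce


def rename_key(old, new):
    """Build a processor that renames one top-level key in every record."""
    def processor(records):
        for record in records:
            yield {
                (new if key == old else key): value
                for key, value in record.items()
            }
    return processor


def add_schema(url):
    """Build a processor that sets the $schema field on every record."""
    def processor(records):
        for record in records:
            yield {**record, '$schema': url}
    return processor


def chain(*processors):
    """Compose processors left to right into a single processor."""
    def combined(records):
        return reduce(lambda stream, step: step(stream), processors, records)
    return combined


pipeline = chain(
    rename_key('245__', 'title_statement'),
    add_schema('http://example.org/schema/marc21.json'),
)
output = list(pipeline(iter([{'245__': {'a': 'Test title'}}])))
print(output)
```

Because every step consumes and produces an iterator, records flow through the whole chain one at a time.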
Extensibility
New commands, loaders, dumpers, or rules can be provided via entry points:
- dojson.cli: commands that return a processor accepting an iterator;
- dojson.cli.load: functions expecting a stream and returning a Python dict or iterator;
- dojson.cli.dump: functions expecting a Python object and returning str;
- dojson.cli.rule: instances of dojson.overdo.Overdo with loaded rules.
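As a rough sketch of the shapes described above, a custom loader and dumper might look like the functions below, together with the entry point groups a package could declare. The package name `mypackage`, the entry point names `jsonl` and `sortedjson`, and both functions are hypothetical. How the mapping reaches the packaging tool (for example as the `entry_points` argument of setuptools' `setup()`) depends on the project.

```python
import io
import json


def load_json_lines(stream):
    """Loader: read a stream with one JSON object per line, yield dicts."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def dump_sorted_json(data):
    """Dumper: serialize a Python object to a str with sorted keys."""
    return json.dumps(data, sort_keys=True)


# Hypothetical entry point declaration, in "name = module:object" form.
ENTRY_POINTS = {
    'dojson.cli.load': ['jsonl = mypackage.cli:load_json_lines'],
    'dojson.cli.dump': ['sortedjson = mypackage.cli:dump_sorted_json'],
}

stream = io.StringIO('{"245__": {"a": "Test title"}}\n{"999__": {"a": "x"}}\n')
loaded = list(load_json_lines(stream))
print(dump_sorted_json(loaded[0]))  # {"245__": {"a": "Test title"}}
```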