Tutorial I: A guided JSON parser
The JSON format, specified at http://json.org, can easily be described using pyrser.
1- Starting block
To describe a file format, you just need to write a class that inherits from pyrser.grammar.Grammar:
from pyrser import grammar

class JSON(grammar.Grammar):
    """Our future JSON parser"""
    pass
This empty class is just a container for the BNF description of our file format. Pyrser uses just two class variables to hold the BNF:
- grammar: a docstring containing the BNF description.
- entry: the name of the rule to use as entry point.
2- Translate BNF
In the JSON spec, the first rule, object, is described as:
object
{}
{ members }
This describes a rule named object as members surrounded by braces, where members may be absent.
That BNF can be translated literally as:
from pyrser import grammar

class JSON(grammar.Grammar):
    """Our JSON parser"""
    entry = "object"
    grammar = """
        object = [
            '{' '}'
            | '{' members '}'
        ]
    """
But the Pyrser BNF syntax provides the repeater ?, which lets you describe object more concisely:
    grammar = """
        object = [ '{' members? '}' ]
    """
Following the syntax described in Writing a BNF grammar, we can translate the whole grammar.
3- BNF of list
In the JSON spec, a common pattern is used to describe lists of items. For example:
members
pair
pair , members
elements
value
value , elements
This kind of grammar uses right recursion to build a list of items. The Pyrser parsing engine uses the PEG (Parsing Expression Grammar) mechanism, so it is better to use the repeaters + or * to describe a list.
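To make the difference concrete, here is a minimal plain-Python sketch that reads a comma-separated list of tokens in both styles. It is not pyrser code: the function names `parse_list_recursive` and `parse_list_loop` and the pre-split token list are made up for illustration. The loop version mirrors the shape of `value [',' value]*`.

```python
def parse_list_recursive(tokens, pos=0):
    """elements: value | value ',' elements  (right recursion)."""
    items = [tokens[pos]]
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        # One nested call per remaining item
        rest, pos = parse_list_recursive(tokens, pos + 1)
        items += rest
    return items, pos


def parse_list_loop(tokens, pos=0):
    """elements = [ value [',' value]* ]  (repetition)."""
    items = [tokens[pos]]
    pos += 1
    while pos < len(tokens) and tokens[pos] == ",":
        # Consume the separator and the value that follows it
        items.append(tokens[pos + 1])
        pos += 2
    return items, pos


tokens = ["1", ",", "2", ",", "3"]
recursive_result = parse_list_recursive(tokens)
loop_result = parse_list_loop(tokens)
```

Both calls collect the same items and stop at the same position. The loop form gathers them in a single rule, which is also what makes it convenient to attach per-item actions later in this tutorial.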
4- Basic JSON Parser
With this advice, we can translate the whole BNF:
from pyrser import grammar
from pyrser.directives import ignore

class JSON(grammar.Grammar):
    """Our JSON parser"""
    entry = "json"
    grammar = """
        json = [ object eof ]
        object = [ '{' members? '}' ]
        members = [ pair [',' pair]* ]
        pair = [ string ':' value ]
        value =
        [
            string
            | number
            | object
            | array
            | "true"
            | "false"
            | "null"
        ]
        array = [ '[' elements? ']' ]
        elements = [ value [',' value]* ]
        number = [ @ignore("null") [int frac? exp?] ]
        int = [ '-'?
            [
                digit1_9s
                | digit
            ]
        ]
        frac = [ '.' digits ]
        exp = [ e digits ]
        digit = [ '0'..'9' ]
        digit1_9 = [ '1'..'9' ]
        digits = [ digit+ ]
        digit1_9s = [ digit1_9 digits ]
        e = [ ['e'|'E'] ['+'|'-']? ]
    """
note 1: Notice the use of @ignore("null") in the rule number. This directive allows you to change the ignore convention.
See Setting Directives: Module directives for more information about directives.
note 2: We don’t provide the string and eof rules because they are default rules inherited from the base grammar Grammar.
See Base of all parser for more information about what is provided by default and how composition works.
5- Building an AST
The aim of parsing is to translate a textual representation of information into a data structure, or AST (Abstract Syntax Tree): a tree built to represent all the abstractions provided by the syntax. Here we need to translate JSON into Python objects. To do this, we want to fetch data during the parsing process and create objects on the fly by calling chunks of Python code.
Pyrser provides two mechanisms:
- hooks for event handling
- nodes for data handling
Let’s focus on the number rule. We want to capture the number and convert it to a float.
nodes
To capture the result of a rule, just suffix it with ‘:’ and a name:
"""
...
number = [ @ignore("null") [int frac? exp?]:n ]
...
"""
This will create a new node named n.
hooks
To do something with n, just send it through a hook named is_num to some Python code.
Just call the hook after the captured number:
"""
...
number = [ @ignore("null") [int frac? exp?]:n #is_num(n) ]
...
"""
By default, is_num is an unknown hook. Let’s declare it with the following syntax:
from pyrser import meta

@meta.hook(JSON)
def is_num(self, arg):
    print(self.value(arg))
    return True
note: A hook is just a function with a special decorator:
- The function takes at least one parameter, self. This is the parser instance (here your JSON instance).
- arg is the capturing node (an instance of pyrser.parsing.node.Node).
We can fetch the captured text (parsed by [int frac? exp?]) by calling self.value on arg.
note: A hook must return True for parsing to continue. You can stop parsing by returning False, which causes a parse error.
See Setting Hooks: Module hooks for more information about hooks.
See Building AST (Abstract Syntax Tree): Module node for more information about nodes.
return values
We can now capture data from the input and do something with it. But how do we return our results to the caller?
For this, we must use the special node named _. Indeed, _ is bound to the resulting node of the rule.
So we must patch our number rule and the is_num hook like this:
...
"""
...
number = [ @ignore("null") [int frac? exp?]:n #is_num(_, n) ]
...
"""
...
_ is received by the is_num function as a parameter. You can’t modify it directly.
To return something with it, you must create an arbitrary attribute to carry the output:
from pyrser import meta

@meta.hook(JSON)
def is_num(self, ast, arg):
    # the attribute name "node" is arbitrary
    ast.node = float(self.value(arg))
    return True
note: The float constructor directly interprets the text returned by self.value(arg), such as 1.0 or -2e+2, to create a float object.
We can proceed like this for all trivial values.
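The conversion step can be tried without pyrser. The sketch below is an illustration, not part of the tutorial's parser: the regular expression is a hand-written approximation of the `int frac? exp?` sequence from the grammar, and the names `NUMBER_RE` and `to_float` are invented here. It shows that `float` accepts every string the grammar lets through, while the pattern itself is stricter than `float` (for example, it rejects a leading zero).

```python
import re

# Approximation of `int frac? exp?`:
#   int  = '-'? [ digit1_9 digit+ | digit ]
#   frac = '.' digit+
#   exp  = ['e'|'E'] ['+'|'-']? digit+
NUMBER_RE = re.compile(
    r"-?(?:[1-9][0-9]+|[0-9])(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
)


def to_float(text):
    """Convert text to float only if it matches the grammar's number shape."""
    if not NUMBER_RE.fullmatch(text):
        raise ValueError("not a JSON number: %r" % text)
    return float(text)


samples = [to_float(s) for s in ("1.0", "-2e+2", "12")]
```

Inside the real hook no such check is needed, since the grammar has already validated the captured text before `float` sees it.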
Sometimes we only want to forward the result of a subrule as the result of the current rule. For this, just use the bind operator :>, which connects the output to an existing node:
...
"""
...
value =
[
[number | object | array]:>_
...
]
...
"""
...
handling arrays
Let’s focus on a more complex case, the array rule:
array = [ '[' elements? ']' ]
elements = [ value [',' value]* ]
These kinds of rules are not really optimized for a PEG parser. It’s better to have the resulting node (array) and the list of items (the list of value) in the same rule. We can merge these two rules into one:
array = [ '[' [value [',' value]* ]? ']' ]
In this form, it’s easier to see where to put a hook that creates a Python list, and where to put a hook that adds an item to this list:
array = [ '[' #is_array(_) [value:v #add_item(_, v) [',' value:v #add_item(_, v) ]* ]? ']' ]
With the following hooks:
@meta.hook(JSON)
def is_array(self, ast):
    ast.node = []
    return True

@meta.hook(JSON)
def add_item(self, ast, item):
    ast.node.append(item.node)
    return True
We can proceed in the same way for the object rule.
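To see how these hooks cooperate, the following pyrser-free simulation calls plain functions in the order the grammar would presumably fire them for the input `{"puf": [1, 2]}`. Everything here is a stand-in: `Node` is a bare class rather than pyrser's node type, `leaf` is a hypothetical helper, the functions omit the `self` parameter that real hooks receive, and the dictionary functions `is_dict` and `add_kv` are one possible way to apply the same idea to the object rule.

```python
class Node:
    """Stand-in for a pyrser node: a bare object that accepts attributes."""


def is_array(ast):
    ast.node = []
    return True


def add_item(ast, item):
    ast.node.append(item.node)
    return True


def is_dict(ast):
    ast.node = {}
    return True


def add_kv(ast, pair):
    key, value = pair.node
    ast.node[key] = value
    return True


def leaf(value):
    """Build a node that already carries a converted Python value."""
    n = Node()
    n.node = value
    return n


# Simulated firing order for: {"puf": [1, 2]}
array = Node()
is_array(array)                          # after '['
add_item(array, leaf(1.0))               # after the first value
add_item(array, leaf(2.0))               # after ',' value

obj = Node()
is_dict(obj)                             # after '{'
add_kv(obj, leaf(("puf", array.node)))   # after each pair
```

The container is created once, when the opening delimiter is read, and each later hook call mutates that same object through the `node` attribute.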
6- Final JSON parser
A complete grammar for a JSON parser looks like this:
from pyrser import grammar, meta
from pyrser.directives import ignore

class JSON(grammar.Grammar):
    """Pyrser JSON parser"""
    entry = "json"
    grammar = """
        json = [ object:>_ eof ]
        object =
        [
            '{' #is_dict(_) [pair:p #add_kv(_, p) [',' pair:p #add_kv(_, p) ]*]? '}'
        ]
        pair = [ string:s ':' value:v #is_pair(_, s, v) ]
        value =
        [
            [number | object | array]:>_
            | [
                string:s #is_str(_, s)
                | "true":t #is_bool(_, t)
                | "false":f #is_bool(_, f)
                | "null" #is_none(_)
            ]
        ]
        array =
        [
            '[' #is_array(_) [value:v #add_item(_, v) [',' value:v #add_item(_, v)]*]? ']'
        ]
        number = [ @ignore("null") [int frac? exp?]:n #is_num(_, n) ]
        int =
        [
            '-'?
            [
                digit1_9s
                | digit
            ]
        ]
        frac = [ '.' digits ]
        exp = [ e digits ]
        digit = [ '0'..'9' ]
        digit1_9 = [ '1'..'9' ]
        digits = [ digit+ ]
        digit1_9s = [ digit1_9 digits ]
        e = [ ['e'|'E'] ['+'|'-']? ]
    """
@meta.hook(JSON)
def is_num(self, ast, n):
    ast.node = float(self.value(n))
    return True

@meta.hook(JSON)
def is_str(self, ast, s):
    ast.node = self.value(s).strip('"')
    return True

@meta.hook(JSON)
def is_bool(self, ast, b):
    bval = self.value(b)
    if bval == "true":
        ast.node = True
    if bval == "false":
        ast.node = False
    return True

@meta.hook(JSON)
def is_none(self, ast):
    ast.node = None
    return True

@meta.hook(JSON)
def is_pair(self, ast, s, v):
    ast.node = (self.value(s).strip('"'), v.node)
    return True

@meta.hook(JSON)
def is_array(self, ast):
    ast.node = []
    return True

@meta.hook(JSON)
def add_item(self, ast, item):
    ast.node.append(item.node)
    return True

@meta.hook(JSON)
def is_dict(self, ast):
    ast.node = {}
    return True

@meta.hook(JSON)
def add_kv(self, ast, item):
    ast.node[item.node[0]] = item.node[1]
    return True
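One detail of the is_str and is_pair hooks is worth checking in plain Python. Assuming self.value returns the string literal together with its surrounding quotes (which the use of strip suggests), `str.strip('"')` removes every leading and trailing quote character, not just one on each side. The sketch below compares it with slicing on a literal that ends with an escaped quote. The variable names are illustrative only, and neither approach decodes escape sequences.

```python
raw = r'"say \"hi\""'      # JSON source text for the string: say "hi"

stripped = raw.strip('"')  # drops both trailing quotes -> say \"hi\
sliced = raw[1:-1]         # drops exactly one quote per side -> say \"hi\"
```

If such inputs matter, slicing may be the safer choice, provided the captured text really starts and ends with the delimiting quotes.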
7- Parser in action
Using the JSON class is really easy.
Instantiate it and use the parse method (or parse_file) to parse some content:
json = JSON()
res = json.parse("""
{
    "test" : 12,
    "puf" : [1, 2, 3]
}
""")
if res.node['puf'][1] == 2:
    print("OK")
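As a cross-check that does not depend on pyrser, the standard library json module can parse the same text. This is only a sketch of the structure that res.node is expected to hold, under the assumption that the hooks above behave as described. Since is_num converts every number with float, the closest match is obtained by passing `parse_int=float`.

```python
import json

text = """
{
    "test" : 12,
    "puf" : [1, 2, 3]
}
"""

# Default behaviour: integers stay integers
expected = json.loads(text)

# Mimic the tutorial's is_num hook, which turns every number into a float
as_floats = json.loads(text, parse_int=float)
```

Because `2.0 == 2` is true in Python, the comparison `res.node['puf'][1] == 2` holds even though the tutorial's parser stores floats.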
You can also put your whole grammar into a BNF file (here json.bnf) and use the from_file function to create the JSON class:
from pyrser import grammar

JSON = grammar.from_file("json.bnf")
See Grammar Base Class: Module grammar for more information about the ways of creating a grammar.
See Error handling: Module error if something goes wrong in your grammar.