Quickstart guide

This guide is for people who are already familiar with Pratt parsers and want to start writing them with this library. For a full introduction to the concepts behind this kind of parser, see the next section, A Brief Tutorial.

Basic Code Layout

It’s suggested that your code follow this structure:

  1. Binding power definition of all symbols with non-zero binding power, sorted by binding power.
  2. All uses of led- or nud-defining helpers (such as operator definitions).
  3. Definition of all remaining symbols, i.e. those not already defined in steps 1 and 2.
  4. Closing the symbol table so that later misspellings of symbol ids are caught as errors.
  5. Definition of all handwritten actions in whatever order suits you.
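Steps 1, 3 and 4 can be illustrated with a minimal sketch of a closable symbol table in plain Python. It is hypothetical: SymbolTable and its lbp method are made-up names, and this is not PrattParse's implementation. It is loosely modelled on the symbol(id, [bp]) function and p.symbols.close() mentioned in this guide. Re-registering an id keeps the higher binding power. Once the table is closed, an unknown id raises an error instead of quietly creating a new symbol.

```python
class SymbolTable:
    """Hypothetical sketch: maps symbol ids to left binding powers."""

    def __init__(self):
        self._bp = {}          # symbol id -> binding power
        self._closed = False

    def symbol(self, sym_id, bp=0):
        if sym_id not in self._bp:
            if self._closed:
                # After close(), an unknown id is almost certainly a typo.
                raise KeyError(f"unknown symbol id {sym_id!r}")
            self._bp[sym_id] = bp
        else:
            # Re-registering keeps the higher binding power; a smaller one is ignored.
            self._bp[sym_id] = max(self._bp[sym_id], bp)
        return sym_id

    def close(self):
        self._closed = True

    def lbp(self, sym_id):
        return self._bp[sym_id]


symbols = SymbolTable()
symbols.symbol("MUL", 20)      # step 1: binding powers
symbols.symbol("ADD", 10)
symbols.symbol("RPAREN")       # step 3: remaining symbols, bp 0
symbols.close()                # step 4

symbols.symbol("MUL")          # fine: already defined, bp stays 20
try:
    symbols.symbol("MLU")      # misspelling of MUL
    caught = False
except KeyError:
    caught = True              # reported instead of silently creating a symbol
```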

It is also suggested that you encapsulate definitions for a single instance of a parser inside a single closure (see below for an example). This way you can be sure that each parser instance has its own state and doesn’t interfere with other parsers.

Usually, you’ll want to put all binding power settings in a single place, using the bp_spec helper. For each level of precedence, write a single line listing all symbols of that binding power, followed by the binding power itself. You can write two lines with the same binding power, and the lines don’t need to be ordered by binding power. PrattParse generally doesn’t place any restrictions on symbol ids. However, helpers like this one, which take whitespace-separated lists of ids, only work correctly with ids that contain no whitespace.
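The line format described above can be sketched as a small function that turns a specification string into a dict mapping symbol ids to binding powers. This is a hypothetical illustration: the name parse_bp_spec is made up, and this is not PrattParse's code. It also shows why ids containing whitespace cannot work. Each line is split on whitespace, and the last field is taken as the binding power.

```python
def parse_bp_spec(spec):
    """Hypothetical sketch: parse lines of 'SYM SYM ... BP' into {id: bp}."""
    table = {}
    for line in spec.splitlines():
        fields = line.split()          # splits on any run of whitespace
        if not fields:
            continue                   # skip blank lines
        *ids, bp = fields              # the last field is the binding power
        for sym_id in ids:
            table[sym_id] = int(bp)
    return table


spec = '''
    LPAREN 100
    MUL DIV 20
    ADD SUB 10
'''
bps = parse_bp_spec(spec)
# bps == {'LPAREN': 100, 'MUL': 20, 'DIV': 20, 'ADD': 10, 'SUB': 10}
```

Because each line is independent, two lines with the same binding power, or lines in any order, produce the same table.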

Example

Take the following parser as a simple example. It handles integer literals, identifiers, the four arithmetic operators, parentheses and single-argument function calls:

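from prattparse import parser, symbol  # module layout assumed from the names used below

def Parser():
    p = parser.ParserSkeleton()
    h = symbol.Helpers(p)

    # 1. binding powers of all symbols with non-zero bp
    h.bp_spec('''
        LPAREN 100
        MUL DIV 20
        ADD SUB 10
    ''')
    # 2. symbols and actions defined by helpers
    h.literals('NAME INT')
    h.infixes('MUL DIV ADD SUB')
    # 3. remaining symbols
    h.symbol('RPAREN')
    # 4. from here on, misspelled ids are errors
    p.symbols.close()

    # 5. custom actions
    @h.nud_of('LPAREN')
    def p_parens(self):
        # grouping: "(" expr ")" evaluates to the inner expression
        inner = p.expression()
        p.advance('RPAREN')
        return inner

    @h.led_of('LPAREN')
    def p_call(self, left):
        # call: left "(" expr ")"; the LPAREN symbol becomes the call node
        self.first = left
        self.second = p.expression()
        p.advance('RPAREN')
        return self

    # return callable that's a parser for the "start" of the grammar
    return p.expression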

Hints And Limitations

The two primary exports of the library are the ParserSkeleton and the Helpers. The former handles token consumption and exposes a minimal interface to the symbol table (the symbol(id, [bp]) function). The latter provides all kinds of convenience functions for defining symbols, their binding power, and actions.
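To see the token-consumption side in isolation, here is a minimal sketch of the classic Pratt loop that a class like ParserSkeleton is responsible for. advance() checks and consumes the current token. expression(rbp) calls nud on the first token, then keeps calling led while the next token's lbp exceeds rbp. All names here (Sym, Int, Infix, MiniParser) are hypothetical, and this is plain Python, not PrattParse's implementation. Unlike PrattParse, where actions reach the parser through the enclosing closure, this sketch passes the parser to nud and led explicitly. It also evaluates arithmetic directly instead of building a tree.

```python
class Sym:
    """Base symbol: an id, a left binding power, and nud/led actions."""
    lbp = 0

    def __init__(self, id_):
        self.id = id_

    def nud(self, parser):
        raise NotImplementedError

    def led(self, parser, left):
        raise NotImplementedError


class Int(Sym):
    def __init__(self, value):
        super().__init__("INT")
        self.value = value

    def nud(self, parser):
        return self.value                     # a literal evaluates to itself


class Infix(Sym):
    OPS = {"ADD": (10, lambda a, b: a + b), "MUL": (20, lambda a, b: a * b)}

    def __init__(self, id_):
        super().__init__(id_)
        self.lbp, self.fn = self.OPS[id_]

    def led(self, parser, left):
        right = parser.expression(self.lbp)   # same bp: left-associative
        return self.fn(left, right)


class MiniParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [Sym("END")]   # END has lbp 0 and stops the loop
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self, expected=None):
        tok = self.token
        if expected is not None and tok.id != expected:
            raise SyntaxError(f"expected {expected}, got {tok.id}")
        self.pos += 1
        return tok

    def expression(self, rbp=0):
        left = self.advance().nud(self)
        while rbp < self.token.lbp:
            left = self.advance().led(self, left)
        return left


# 2 + 3 * 4, already tokenized
toks = [Int(2), Infix("ADD"), Int(3), Infix("MUL"), Int(4)]
result = MiniParser(toks).expression()   # 14: MUL binds tighter than ADD
```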

For prefix, infix and postfix operators, there are helpers for defining them, as well as pluralized versions that define several at once. The infix helpers also come in a variant with an _r suffix, which makes the operator right-associative. All of them store the (first) operand in a member called .first, and the infix operators store their second operand in a member called .second.
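Left versus right associativity comes down to the binding power passed to the recursive expression call in led. Passing the operator's own binding power makes it left-associative. Passing one less lets an operator of equal binding power on the right bind first, which is the standard Pratt technique for right-associative operators. The sketch below is a standalone illustration of that technique, not PrattParse's helper code. Node, parse and the "-"/"^" operator table are hypothetical. Like the helpers described above, it stores operands in .first and .second.

```python
class Node:
    """Infix node storing its operands in .first and .second."""

    def __init__(self, op, first, second):
        self.op, self.first, self.second = op, first, second

    def __repr__(self):
        return f"({self.first} {self.op} {self.second})"


# operator -> (binding power, right-associative?)
INFIX = {"-": (10, False), "^": (30, True)}


def parse(tokens):
    toks = list(tokens) + ["<end>"]
    pos = 0

    def lbp(tok):
        return INFIX[tok][0] if tok in INFIX else 0

    def expression(rbp=0):
        nonlocal pos
        left = toks[pos]              # nud of a literal: the literal itself
        pos += 1
        while rbp < lbp(toks[pos]):
            op = toks[pos]
            pos += 1
            bp, right_assoc = INFIX[op]
            # right-associative: recurse with bp - 1 so an equal-bp operator binds first
            second = expression(bp - 1 if right_assoc else bp)
            left = Node(op, left, second)
        return left

    return expression()


print(parse("1 - 2 - 3".split()))   # ((1 - 2) - 3)
print(parse("2 ^ 3 ^ 2".split()))   # (2 ^ (3 ^ 2))
```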

Also, there’s a myriad of optional arguments for the parser skeleton and the helper. For the parser, the most frequently needed options are:

  • symcls, the class used as the base class for all symbols. It has few requirements: its instances need an .id and a numeric .lbp member, plus .nud() and .led(left) methods. It’s suggested that the default .nud() and .led() raise NotImplementedError. The parser treats that as an “unexpected token” and gives an appropriate error message.
  • tok2sym, a function to convert tokens to symbols. The default value just returns its argument, which is fine if your tokenizer produces instances of symcls.
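The sketch below shows a symbol base class that meets these requirements, and one way a parser might turn a NotImplementedError into an “unexpected token” message. Everything here is hypothetical: Symbol, IntLiteral, call_nud and the message format are made up for illustration. The sketch only mirrors the contract described above: an .id, a numeric .lbp, and .nud() and .led(left) methods that raise NotImplementedError by default. It also includes an identity tok2sym for tokenizers that already yield symbol instances.

```python
class Symbol:
    """Minimal class meeting the symcls requirements described above."""

    def __init__(self, id_, lbp=0, value=None):
        self.id = id_        # symbol id, e.g. "INT" or "RPAREN"
        self.lbp = lbp       # numeric left binding power
        self.value = value   # payload such as a literal's value

    def nud(self):
        # Default: this symbol cannot start an expression.
        raise NotImplementedError

    def led(self, left):
        # Default: this symbol cannot follow an expression.
        raise NotImplementedError


class IntLiteral(Symbol):
    def nud(self):
        return self.value


def tok2sym(token):
    # Identity conversion: enough when the tokenizer already yields Symbol instances.
    return token


def call_nud(sym):
    """Call sym.nud(), reporting a missing action as an unexpected token."""
    try:
        return sym.nud()
    except NotImplementedError:
        raise SyntaxError(f"unexpected token {sym.id!r}") from None


value = call_nud(tok2sym(IntLiteral("INT", value=7)))   # 7

try:
    call_nud(tok2sym(Symbol("RPAREN")))
    message = None
except SyntaxError as exc:
    message = str(exc)   # "unexpected token 'RPAREN'"
```

This sketch does not distinguish a missing action from a NotImplementedError raised deeper inside a user-defined action. A real implementation would need to handle that case.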

The keyword arguments are:

  • tr, a mapping from symbol ids to user-friendly descriptions, e.g. “ID” -> “identifier”.
  • max_la, which limits lookahead.
  • token_lineno_func and token_col_func, which, if provided, are used to get a line number and column from a symbol for error messages. If your symbols already store these as attributes, use operator.attrgetter.
  • debug, which, when true (the default), enables a few checks for likely coding errors. For example, it causes an error on a symbol(id, bp) call whose bp would be ignored because it is smaller than the symbol’s existing binding power.
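If your tokens store their position as attributes, the standard library's operator.attrgetter builds suitable accessor functions. operator.itemgetter would instead subscript the token, which only suits dict- or tuple-like tokens. The Token class, its lineno/col attribute names and the format_error helper below are hypothetical. They only show the kind of function these options expect: it takes a symbol and returns its line number or column. The sketch also shows how such functions might feed an error message.

```python
import operator
from dataclasses import dataclass


@dataclass
class Token:
    id: str
    lineno: int   # hypothetical attribute names for the token's position
    col: int


# Functions of the kind token_lineno_func / token_col_func expect:
# they take a symbol and return its line number or column.
get_lineno = operator.attrgetter("lineno")
get_col = operator.attrgetter("col")


def format_error(tok, message):
    # Hypothetical error formatter built on the two accessor functions.
    return f"line {get_lineno(tok)}, column {get_col(tok)}: {message}"


tok = Token("RPAREN", lineno=3, col=14)
msg = format_error(tok, "unexpected token RPAREN")
# msg == "line 3, column 14: unexpected token RPAREN"
```

Functions like get_lineno and get_col are what you would pass as the token_lineno_func and token_col_func keyword arguments.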

As a rule of thumb for what’s a helper method and what’s a parser method:

  • Token consumption is part of the parser
  • Symbol-related definitions are part of the helper

Of course, there are a few exceptions, such as p.symbols.close(). This may change later, but with the current architecture, it is a necessary and perhaps lesser evil.

Statements

Many, if not most, parsers will find this feature useful, but it is still an extension and therefore lives in a separate module. You may know the technique of statement denotations, or std for short, from Douglas Crockford’s article “Top Down Operator Precedence”. If you don’t, here’s a short summary:

  • Each symbol that can start a statement (e.g. return in many languages) gets a std method, which is built much like nud.
  • There’s a new function, statement, which looks at the current token to see if it’s the start of a statement (has a std). If that’s not the case, it just delegates to expression; otherwise, std is used.
  • Afterwards, a statement terminator (in the case of Crockford’s “Simplified Javascript”, a semicolon) is expected.

PrattParse extends this idea slightly by generalizing the statement terminator part. Usually, the grammar requires some kind of statement terminator after most statements and after expressions used as statements. Our statement function automatically checks for one based on the boolean need_terminator attribute of the result (regardless of whether it came from expression or std).
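The statement logic described above can be sketched in a few lines. This is a standalone illustration, not PrattParse's statement module. The parser is passed explicitly, there are no infix operators, and the names Tok, Return, Begin, StmtParser and the SEMI/BEGIN/END ids are hypothetical. statement() uses std when the current token has one and falls back to expression otherwise. It then consumes a terminator only if the result's need_terminator attribute is true, regardless of which path produced the result.

```python
class Tok:
    """Hypothetical symbol: names and keywords only, no infix operators."""
    lbp = 0
    need_terminator = True   # by default a statement must be followed by a terminator

    def __init__(self, id_, value=None):
        self.id, self.value = id_, value

    def nud(self, p):
        if self.id != "NAME":
            raise SyntaxError(f"unexpected token {self.id}")
        return self

    def led(self, p, left):
        raise NotImplementedError   # never reached: every lbp is 0


class Return(Tok):
    def std(self, p):
        self.first = p.expression()   # "return <expr>"
        return self


class Begin(Tok):
    need_terminator = False           # "begin ... end" needs no trailing terminator

    def std(self, p):
        self.first = []
        while p.token.id != "END":
            self.first.append(p.statement())
        p.advance("END")
        return self


class StmtParser:
    def __init__(self, tokens, terminator="SEMI"):
        self.tokens = list(tokens) + [Tok("EOF")]
        self.pos = 0
        self.terminator = terminator   # id of the statement terminator symbol

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self, expected=None):
        tok = self.token
        if expected is not None and tok.id != expected:
            raise SyntaxError(f"expected {expected}, got {tok.id}")
        self.pos += 1
        return tok

    def expression(self, rbp=0):
        left = self.advance().nud(self)
        while rbp < self.token.lbp:
            left = self.advance().led(self, left)
        return left

    def statement(self):
        tok = self.token
        if hasattr(tok, "std"):        # start of a statement: use std
            self.advance()
            result = tok.std(self)
        else:                          # otherwise parse an expression statement
            result = self.expression()
        if result.need_terminator:     # checked on the result, whichever path produced it
            self.advance(self.terminator)
        return result


# return x ;  begin y ; end
p = StmtParser([Return("RETURN"), Tok("NAME", "x"), Tok("SEMI"),
                Begin("BEGIN"), Tok("NAME", "y"), Tok("SEMI"), Tok("END")])
ret = p.statement()      # ret.first is the NAME x; the ";" is consumed
block = p.statement()    # no ";" required after "end"
```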

To add statement handling, change your imports to: from prattparse.statement import ParserSkeleton, Helpers. Then pass the id of the statement terminator symbol as the first argument when instantiating the parser skeleton. The parser and helper work exactly like the basic ones; they just add a few methods. The parser gains the statement function described above, and the helper gains these methods:

  • std_of, a decorator in the vein of nud_of and led_of
  • no_terminator to declaratively set the need_terminator attribute of several symbols to false.
  • single_expr_stmt and its pluralized version define actions, like the operator helpers do. They add std actions that simply parse an expression, store it in .first and return the statement symbol, which suits statements like return.
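The effect of single_expr_stmt and no_terminator on the symbol classes can be sketched with plain Python class manipulation. This is hypothetical code, not PrattParse's helpers. Symbols are represented by a dict of classes keyed by id, std receives the parser explicitly, and make_symbol_class, single_expr_stmt_sketch, no_terminator_sketch and StubParser are made-up names.

```python
class Base:
    lbp = 0
    need_terminator = True   # statements need a terminator unless told otherwise


symbols = {}  # hypothetical registry: symbol id -> symbol class


def make_symbol_class(sym_id):
    # Create (once) a subclass of Base for this id.
    if sym_id not in symbols:
        symbols[sym_id] = type(sym_id, (Base,), {"id": sym_id})
    return symbols[sym_id]


def single_expr_stmt_sketch(sym_id):
    # Attach a std that parses one expression into .first and returns the symbol.
    def std(self, p):
        self.first = p.expression()
        return self
    make_symbol_class(sym_id).std = std


def no_terminator_sketch(ids):
    # Declaratively mark several whitespace-separated ids as needing no terminator.
    for sym_id in ids.split():
        make_symbol_class(sym_id).need_terminator = False


class StubParser:
    """Stands in for a real parser; expression() returns a canned value."""

    def expression(self):
        return "x + 1"


single_expr_stmt_sketch("RETURN")
no_terminator_sketch("IF WHILE")

ret = symbols["RETURN"]()
result = ret.std(StubParser())   # result is ret, with result.first == "x + 1"
```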
