Module pyparsing :: Class ParserElement

Class ParserElement

object --+
         |
        ParserElement

Abstract base level parser element class.

Instance Methods

__init__(self, savelist=False)
x.__init__(...) initializes x; see help(type(x)) for signature

copy(self)
Make a copy of this ParserElement.

setName(self, name)
Define name for this expression, makes debugging and exception messages clearer.

setResultsName(self, name, listAllMatches=False)
Define name for referencing matching tokens as a nested attribute of the returned parse results.

setBreak(self, breakFlag=True)
Method to invoke the Python pdb debugger when this element is about to be parsed.

setParseAction(self, *fns, **kwargs)
Define one or more actions to perform when successfully matching parse element definition.

addParseAction(self, *fns, **kwargs)
Add one or more parse actions to expression's list of parse actions.

addCondition(self, *fns, **kwargs)
Add a boolean predicate function to expression's list of parse actions.

setFailAction(self, fn)
Define action to perform if parsing fails at this expression.

preParse(self, instring, loc)

parseImpl(self, instring, loc, doActions=True)

postParse(self, instring, loc, tokenlist)

tryParse(self, instring, loc)

canParseNext(self, instring, loc)

parseString(self, instring, parseAll=False)
Execute the parse expression with the given string.

scanString(self, instring, maxMatches=2147483647, overlap=False)
Scan the input string for expression matches.

transformString(self, instring)
Extension to scanString, to modify matching text with modified tokens that may be returned from a parse action.

searchString(self, instring, maxMatches=2147483647)
Another extension to scanString, simplifying the access to the tokens found to match the given parse expression.

split(self, instring, maxsplit=2147483647, includeSeparators=False)
Generator method to split a string using the given expression as a separator.
 
__add__(self, other)
Implementation of + operator - returns And.

__radd__(self, other)
Implementation of + operator when left operand is not a ParserElement

__sub__(self, other)
Implementation of - operator, returns And with error stop

__rsub__(self, other)
Implementation of - operator when left operand is not a ParserElement

__mul__(self, other)
Implementation of * operator, allows use of expr * 3 in place of expr + expr + expr.

__rmul__(self, other)

__or__(self, other)
Implementation of | operator - returns MatchFirst

__ror__(self, other)
Implementation of | operator when left operand is not a ParserElement

__xor__(self, other)
Implementation of ^ operator - returns Or

__rxor__(self, other)
Implementation of ^ operator when left operand is not a ParserElement

__and__(self, other)
Implementation of & operator - returns Each

__rand__(self, other)
Implementation of & operator when left operand is not a ParserElement

__invert__(self)
Implementation of ~ operator - returns NotAny

__call__(self, name=None)
Shortcut for setResultsName, with listAllMatches=False.
 
suppress(self)
Suppresses the output of this ParserElement; useful to keep punctuation from cluttering up returned output.

leaveWhitespace(self)
Disables the skipping of whitespace before matching the characters in the ParserElement's defined pattern.

setWhitespaceChars(self, chars)
Overrides the default whitespace chars

parseWithTabs(self)
Overrides default behavior to expand <TAB>s to spaces before parsing the input string.

ignore(self, other)
Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

setDebugActions(self, startAction, successAction, exceptionAction)
Enable display of debugging messages while doing pattern matching.

setDebug(self, flag=True)
Enable display of debugging messages while doing pattern matching.

__str__(self)
str(x)

__repr__(self)
repr(x)

streamline(self)

checkRecursion(self, parseElementList)

validate(self, validateTrace=[])
Check defined expressions for valid structure, check for infinite recursive definitions.

parseFile(self, file_or_filename, parseAll=False)
Execute the parse expression on the given file or filename.

__eq__(self, other)

__ne__(self, other)

__hash__(self)
hash(x)

__req__(self, other)

__rne__(self, other)

matches(self, testString, parseAll=True)
Method for quick testing of a parser against a test string.

runTests(self, tests, parseAll=True, comment='#', fullDump=True, printResults=True, failureTests=False)
Execute the parse expression on a series of test strings, showing each test, the parsed results or where the parse failed.

Inherited from object: __delattr__, __format__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Static Methods

setDefaultWhitespaceChars(chars)
Overrides the default whitespace chars

inlineLiteralsUsing(cls)
Set class to be used for inclusion of string literals into a parser.

resetCache()

enablePackrat(cache_size_limit=128)
Enables "packrat" parsing, which adds memoizing to the parsing logic.

Class Variables
  DEFAULT_WHITE_CHARS = ' \n\t\r'
  verbose_stacktrace = False
  packrat_cache = {}
  packrat_cache_lock = <_RLock owner=None count=0>
  packrat_cache_stats = [0, 0]
Properties

Inherited from object: __class__

Method Details

setDefaultWhitespaceChars(chars)
Static Method

Overrides the default whitespace chars

Example:

   # default whitespace chars are space, <TAB> and newline
   OneOrMore(Word(alphas)).parseString("abc def\nghi jkl")  # -> ['abc', 'def', 'ghi', 'jkl']
   
   # change to just treat newline as significant
   ParserElement.setDefaultWhitespaceChars(" \t")
   OneOrMore(Word(alphas)).parseString("abc def\nghi jkl")  # -> ['abc', 'def']

inlineLiteralsUsing(cls)
Static Method

Set class to be used for inclusion of string literals into a parser.

Example:

   # default literal class used is Literal
   integer = Word(nums)
   date_str = integer("year") + '/' + integer("month") + '/' + integer("day")           

   date_str.parseString("1999/12/31")  # -> ['1999', '/', '12', '/', '31']


   # change to Suppress
   ParserElement.inlineLiteralsUsing(Suppress)
   date_str = integer("year") + '/' + integer("month") + '/' + integer("day")           

   date_str.parseString("1999/12/31")  # -> ['1999', '12', '31']

__init__(self, savelist=False)
(Constructor)

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)

copy(self)

Make a copy of this ParserElement. Useful for defining different parse actions for the same parsing pattern, using copies of the original parse element.

Example:

   integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
   integerK = integer.copy().addParseAction(lambda toks: toks[0]*1024) + Suppress("K")
   integerM = integer.copy().addParseAction(lambda toks: toks[0]*1024*1024) + Suppress("M")
   
   print(OneOrMore(integerK | integerM | integer).parseString("5K 100 640K 256M"))

prints:

   [5120, 100, 655360, 268435456]

Equivalent form of expr.copy() is just expr():

   integerM = integer().addParseAction(lambda toks: toks[0]*1024*1024) + Suppress("M")

setName(self, name)

Define name for this expression, makes debugging and exception messages clearer.

Example:

   Word(nums).parseString("ABC")  # -> Exception: Expected W:(0123...) (at char 0), (line:1, col:1)
   Word(nums).setName("integer").parseString("ABC")  # -> Exception: Expected integer (at char 0), (line:1, col:1)

setResultsName(self, name, listAllMatches=False)

Define name for referencing matching tokens as a nested attribute of the returned parse results. NOTE: this returns a *copy* of the original ParserElement object; this is so that the client can define a basic element, such as an integer, and reference it in multiple places with different names.

You can also set results names using the abbreviated syntax, expr("name") in place of expr.setResultsName("name") - see __call__.

Example:

   date_str = (integer.setResultsName("year") + '/' 
               + integer.setResultsName("month") + '/' 
               + integer.setResultsName("day"))

   # equivalent form:
   date_str = integer("year") + '/' + integer("month") + '/' + integer("day")

setBreak(self, breakFlag=True)

Method to invoke the Python pdb debugger when this element is about to be parsed. Set breakFlag to True to enable, False to disable.

setParseAction(self, *fns, **kwargs)

Define one or more actions to perform when successfully matching parse element definition. Parse action fn is a callable method with 0-3 arguments, called as fn(s,loc,toks), fn(loc,toks), fn(toks), or just fn(), where:

  • s = the original string being parsed (see note below)
  • loc = the location of the matching substring
  • toks = a list of the matched tokens, packaged as a ParseResults object

If the functions in fns modify the tokens, they can return them as the return value from fn, and the modified list of tokens will replace the original. Otherwise, fn does not need to return any value.

Optional keyword arguments:

  • callDuringTry = (default=False) indicate if parse action should be run during lookaheads and alternate testing

Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See parseString for more information on parsing strings containing <TAB>s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.

Example:

   integer = Word(nums)
   date_str = integer + '/' + integer + '/' + integer

   date_str.parseString("1999/12/31")  # -> ['1999', '/', '12', '/', '31']

   # use parse action to convert to ints at parse time
   integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
   date_str = integer + '/' + integer + '/' + integer

   # note that integer fields are now ints, not strings
   date_str.parseString("1999/12/31")  # -> [1999, '/', 12, '/', 31]
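
The longer call signatures work the same way. As a minimal sketch, the parse action below uses the full fn(s,loc,toks) form to tag each token with the location where it matched. The function name tag_with_location and the "token@loc" output format are illustrative assumptions, not part of pyparsing.

```python
from pyparsing import OneOrMore, Word, nums

def tag_with_location(s, loc, toks):
    # s is the string being parsed, loc is where this match starts,
    # and toks holds the matched tokens (here a single run of digits).
    return "{0}@{1}".format(toks[0], loc)

number = Word(nums).setParseAction(tag_with_location)
tagged = OneOrMore(number).parseString("12 345").asList()
print(tagged)  # ['12@0', '345@3']
```

The reported loc is the position after leading whitespace has been skipped, which is why the second token is tagged with 3 rather than 2.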

addParseAction(self, *fns, **kwargs)

Add one or more parse actions to expression's list of parse actions. See setParseAction.

See examples in copy.

addCondition(self, *fns, **kwargs)

Add a boolean predicate function to expression's list of parse actions. See setParseAction for function call signatures. Unlike setParseAction, functions passed to addCondition need to return boolean success/fail of the condition.

Optional keyword arguments:

  • message = define a custom message to be used in the raised exception
  • fatal = if True, will raise ParseFatalException to stop parsing immediately; otherwise will raise ParseException

Example:

   integer = Word(nums).setParseAction(lambda toks: int(toks[0]))
   year_int = integer.copy()
   year_int.addCondition(lambda toks: toks[0] >= 2000, message="Only support years 2000 and later")
   date_str = year_int + '/' + integer + '/' + integer

   result = date_str.parseString("1999/12/31")  # -> Exception: Only support years 2000 and later (at char 0), (line:1, col:1)

setFailAction(self, fn)

Define action to perform if parsing fails at this expression. Fail action fn is a callable function that takes the arguments fn(s,loc,expr,err) where:

  • s = string being parsed
  • loc = location where expression match was attempted and failed
  • expr = the parse expression that failed
  • err = the exception thrown

The function returns no value. It may throw ParseFatalException if it is desired to stop parsing immediately.
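
As a rough sketch of how a fail action might be used, the example below records each failed attempt in a list before the exception propagates. The failures list and the record_failure function are hypothetical names chosen for this illustration.

```python
from pyparsing import ParseException, Word, nums

failures = []  # hypothetical log used only for this illustration

def record_failure(s, loc, expr, err):
    # Receives the input string, the location of the failed attempt,
    # the expression that failed, and the exception that was raised.
    failures.append((loc, str(expr)))

integer = Word(nums).setName("integer")
integer.setFailAction(record_failure)

try:
    integer.parseString("abc")
except ParseException:
    pass  # the fail action has already run by the time this is raised

print(failures)
```

The fail action only observes the failure; the ParseException is still raised to the caller afterwards.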

enablePackrat(cache_size_limit=128)
Static Method

Enables "packrat" parsing, which adds memoizing to the parsing logic. Repeated parse attempts at the same string location (which happens often in many complex grammars) can immediately return a cached value, instead of re-executing parsing/validating code. Memoizing is done of both valid results and parsing exceptions.

Parameters:

  • cache_size_limit - (default=128) - if an integer value is provided will limit the size of the packrat cache; if None is passed, then the cache size will be unbounded; if 0 is passed, the cache will be effectively disabled.

This speedup may break existing programs that use parse actions that have side-effects. For this reason, packrat parsing is disabled when you first import pyparsing. To activate the packrat feature, your program must call the class method ParserElement.enablePackrat(). If your program uses psyco to "compile as you go", you must call enablePackrat before calling psyco.full(). If you do not do this, Python will crash. For best results, call enablePackrat() immediately after importing pyparsing.

Example:

   import pyparsing
   pyparsing.ParserElement.enablePackrat()

parseString(self, instring, parseAll=False)

Execute the parse expression with the given string. This is the main interface to the client code, once the complete expression has been built.

If you want the grammar to require that the entire input string be successfully parsed, then set parseAll to True (equivalent to ending the grammar with StringEnd()).

Note: parseString implicitly calls expandtabs() on the input string, in order to report proper column numbers in parse actions. If the input string contains tabs and the grammar uses parse actions that use the loc argument to index into the string being parsed, you can ensure you have a consistent view of the input string by:

  • calling parseWithTabs on your grammar before calling parseString (see parseWithTabs)
  • defining your parse action using the full (s,loc,toks) signature, and referencing the input string using the parse action's s argument
  • explicitly expanding the tabs in your input string before calling parseString

Example:

   Word('a').parseString('aaaaabaaa')  # -> ['aaaaa']
   Word('a').parseString('aaaaabaaa', parseAll=True)  # -> Exception: Expected end of text

scanString(self, instring, maxMatches=2147483647, overlap=False)

Scan the input string for expression matches. Each match will return the matching tokens, start location, and end location. May be called with optional maxMatches argument, to clip scanning after 'n' matches are found. If overlap is specified, then overlapping matches will be reported.

Note that the start and end locations are reported relative to the string being parsed. See parseString for more information on parsing strings with embedded tabs.

Example:

   source = "sldjf123lsdjjkf345sldkjf879lkjsfd987"
   print(source)
   for tokens,start,end in Word(alphas).scanString(source):
       print(' '*start + '^'*(end-start))
       print(' '*start + tokens[0])

prints:

   sldjf123lsdjjkf345sldkjf879lkjsfd987
   ^^^^^
   sldjf
           ^^^^^^^
           lsdjjkf
                     ^^^^^^
                     sldkjf
                              ^^^^^^
                              lkjsfd

transformString(self, instring)

Extension to scanString, to modify matching text with modified tokens that may be returned from a parse action. To use transformString, define a grammar and attach a parse action to it that modifies the returned token list. Invoking transformString() on a target string will then scan for matches, and replace the matched text patterns according to the logic in the parse action. transformString() returns the resulting transformed string.

Example:

   wd = Word(alphas)
   wd.setParseAction(lambda toks: toks[0].title())
   
   print(wd.transformString("now is the winter of our discontent made glorious summer by this sun of york."))

prints:

   Now Is The Winter Of Our Discontent Made Glorious Summer By This Sun Of York.

searchString(self, instring, maxMatches=2147483647)

Another extension to scanString, simplifying the access to the tokens found to match the given parse expression. May be called with optional maxMatches argument, to clip searching after 'n' matches are found.

Example:

   # a capitalized word starts with an uppercase letter, followed by zero or more lowercase letters
   cap_word = Word(alphas.upper(), alphas.lower())
   
   print(cap_word.searchString("More than Iron, more than Lead, more than Gold I need Electricity"))

   # the sum() builtin can be used to merge results into a single ParseResults object
   print(sum(cap_word.searchString("More than Iron, more than Lead, more than Gold I need Electricity")))

prints:

   [['More'], ['Iron'], ['Lead'], ['Gold'], ['I'], ['Electricity']]
   ['More', 'Iron', 'Lead', 'Gold', 'I', 'Electricity']

split(self, instring, maxsplit=2147483647, includeSeparators=False)

Generator method to split a string using the given expression as a separator. May be called with the optional maxsplit argument to limit the number of splits, and with the optional includeSeparators argument (default=False) to include the matched separator text in the split results.

Example:

   punc = oneOf(list(".,;:/-!?"))
   print(list(punc.split("This, this?, this sentence, is badly punctuated!")))

prints:

   ['This', ' this', '', ' this sentence', ' is badly punctuated', '']
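
The two optional arguments can be illustrated with a simpler separator. This is a minimal sketch; the variable names are chosen for the example only.

```python
from pyparsing import Literal

comma = Literal(",")

# Default behavior: separators are dropped.
plain = list(comma.split("a,b,c"))

# includeSeparators=True interleaves the matched separator text.
with_seps = list(comma.split("a,b,c", includeSeparators=True))

# maxsplit=1 stops after the first separator; the rest is returned whole.
first_only = list(comma.split("a,b,c", maxsplit=1))

print(plain)       # ['a', 'b', 'c']
print(with_seps)   # ['a', ',', 'b', ',', 'c']
print(first_only)  # ['a', 'b,c']
```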

__add__(self, other)
(Addition operator)

Implementation of + operator - returns And. Adding strings to a ParserElement converts them to Literals by default.

Example:

   greet = Word(alphas) + "," + Word(alphas) + "!"
   hello = "Hello, World!"
   print(hello, "->", greet.parseString(hello))

prints:

   Hello, World! -> ['Hello', ',', 'World', '!']

__mul__(self, other)

Implementation of * operator, allows use of expr * 3 in place of expr + expr + expr. Expressions may also be multiplied by a 2-integer tuple, similar to {min,max} multipliers in regular expressions. Tuples may also include None as in:

  • expr*(n,None) or expr*(n,) is equivalent to expr*n + ZeroOrMore(expr) (read as "at least n instances of expr")
  • expr*(None,n) is equivalent to expr*(0,n) (read as "0 to n instances of expr")
  • expr*(None,None) is equivalent to ZeroOrMore(expr)
  • expr*(1,None) is equivalent to OneOrMore(expr)

Note that expr*(None,n) does not raise an exception if more than n exprs exist in the input stream; that is, expr*(None,n) does not enforce a maximum number of expr occurrences. If this behavior is desired, then write expr*(None,n) + ~expr
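
The multiplier forms above can be sketched as follows. The expression names are illustrative, and the last case shows the negative-lookahead idiom for enforcing an upper bound.

```python
from pyparsing import ParseException, Word, nums

number = Word(nums)

exactly_three = number * 3
two_to_four = number * (2, 4)
at_least_two = number * (2, None)

print(exactly_three.parseString("1 2 3").asList())     # ['1', '2', '3']
print(two_to_four.parseString("1 2 3 4 5").asList())   # stops after four matches
print(at_least_two.parseString("1 2 3 4 5").asList())  # consumes all five

# expr*(None,n) alone does not reject extra matches; the negative
# lookahead ~number turns the upper bound into a hard limit.
at_most_two = number * (None, 2) + ~number
try:
    at_most_two.parseString("1 2 3")
    rejected = False
except ParseException:
    rejected = True
print(rejected)
```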

__call__(self, name=None)
(Call operator)

Shortcut for setResultsName, with listAllMatches=False.

If name is given with a trailing '*' character, then listAllMatches will be passed as True.

If name is omitted, same as calling copy.

Example:

   # these are equivalent
   userdata = Word(alphas).setResultsName("name") + Word(nums+"-").setResultsName("socsecno")
   userdata = Word(alphas)("name") + Word(nums+"-")("socsecno")             
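
The effect of the trailing '*' is easiest to see when the same name matches more than once. A minimal sketch, with the results name "w" chosen arbitrarily for the example:

```python
from pyparsing import OneOrMore, Word, alphas

word = Word(alphas)

# Plain name: each new match replaces the previous one under "w".
last_only = OneOrMore(word("w")).parseString("a b c")
print(last_only["w"])  # c

# Trailing '*': same as setResultsName("w", listAllMatches=True).
every_match = OneOrMore(word("w*")).parseString("a b c")
print(every_match["w"].asList())  # ['a', 'b', 'c']
```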

leaveWhitespace(self)

Disables the skipping of whitespace before matching the characters in the ParserElement's defined pattern. This is normally only used internally by the pyparsing module, but may be needed in some whitespace-sensitive grammars.
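
As a small sketch of a whitespace-sensitive grammar, the example below requires the digits to follow the letters immediately. The names loose and tight are illustrative only.

```python
from pyparsing import ParseException, Word, alphas, nums

# Default: whitespace before each element is skipped.
loose = Word(alphas) + Word(nums)
print(loose.parseString("abc 123").asList())  # ['abc', '123']

# leaveWhitespace() on the second element forbids a gap before it.
tight = Word(alphas) + Word(nums).leaveWhitespace()
print(tight.parseString("abc123").asList())   # ['abc', '123']

try:
    tight.parseString("abc 123")
    gap_rejected = False
except ParseException:
    gap_rejected = True
print(gap_rejected)
```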

parseWithTabs(self)

Overrides default behavior to expand <TAB>s to spaces before parsing the input string. Must be called before parseString when the input grammar contains elements that match <TAB> characters.
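
A minimal sketch of a grammar that matches a literal tab, assuming a two-column, tab-separated row as the input format (the row and plain names are illustrative):

```python
from pyparsing import ParseException, White, Word, alphas

# With parseWithTabs(), the tab in the input survives and can be matched.
row = Word(alphas) + White("\t") + Word(alphas)
row.parseWithTabs()
kept = row.parseString("a\tb").asList()
print(kept)  # ['a', '\t', 'b']

# Without it, the tab is expanded to spaces before parsing begins,
# so the same grammar no longer finds a tab to match.
plain = Word(alphas) + White("\t") + Word(alphas)
try:
    plain.parseString("a\tb")
    expanded_failed = False
except ParseException:
    expanded_failed = True
print(expanded_failed)
```

parseWithTabs is called on the outermost expression, the one whose parseString method is invoked.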

ignore(self, other)

Define expression to be ignored (e.g., comments) while doing pattern matching; may be called repeatedly, to define multiple comment or other ignorable patterns.

Example:

   patt = OneOrMore(Word(alphas))
   patt.parseString('ablaj /* comment */ lskjd') # -> ['ablaj']
   
   patt.ignore(cStyleComment)
   patt.parseString('ablaj /* comment */ lskjd') # -> ['ablaj', 'lskjd']

setDebug(self, flag=True)

Enable display of debugging messages while doing pattern matching. Set flag to True to enable, False to disable.

Example:

   wd = Word(alphas).setName("alphaword")
   integer = Word(nums).setName("numword")
   term = wd | integer
   
   # turn on debugging for wd
   wd.setDebug()

   OneOrMore(term).parseString("abc 123 xyz 890")

prints:

   Match alphaword at loc 0(1,1)
   Matched alphaword -> ['abc']
   Match alphaword at loc 3(1,4)
   Exception raised:Expected alphaword (at char 4), (line:1, col:5)
   Match alphaword at loc 7(1,8)
   Matched alphaword -> ['xyz']
   Match alphaword at loc 11(1,12)
   Exception raised:Expected alphaword (at char 12), (line:1, col:13)
   Match alphaword at loc 15(1,16)
   Exception raised:Expected alphaword (at char 15), (line:1, col:16)

The output shown is that produced by the default debug actions - custom debug actions can be specified using setDebugActions. Prior to attempting to match the wd expression, the debugging message "Match <exprname> at loc <n>(<line>,<col>)" is shown. Then, if the parse succeeds, a "Matched" message is shown; otherwise, an "Exception raised" message is shown. Also note the use of setName to assign a human-readable name to the expression, which makes debugging and exception messages easier to understand - for instance, the default name created for the Word expression without calling setName is "W:(ABCD...)".

__str__(self)
(Informal representation operator)

str(x)

Overrides: object.__str__
(inherited documentation)

__repr__(self)
(Representation operator)

repr(x)

Overrides: object.__repr__
(inherited documentation)

parseFile(self, file_or_filename, parseAll=False)

Execute the parse expression on the given file or filename. If a filename is specified (instead of a file object), the entire file is opened, read, and closed before parsing.
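
Both calling styles can be sketched with a temporary file. The file contents and variable names below are made up for the illustration.

```python
import os
import tempfile

from pyparsing import OneOrMore, Word, alphas

# Write a small sample file to parse (contents are illustrative).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("alpha beta\ngamma\n")
    path = handle.name

words = OneOrMore(Word(alphas))
try:
    # Pass a filename: parseFile opens, reads, and closes the file itself.
    from_name = words.parseFile(path).asList()

    # Pass an open file object: the caller stays responsible for closing it.
    with open(path) as source:
        from_handle = words.parseFile(source).asList()
finally:
    os.remove(path)

print(from_name)    # ['alpha', 'beta', 'gamma']
print(from_handle)  # ['alpha', 'beta', 'gamma']
```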

__hash__(self)
(Hashing function)

hash(x)

Overrides: object.__hash__
(inherited documentation)

matches(self, testString, parseAll=True)

Method for quick testing of a parser against a test string. Good for simple inline microtests of sub expressions while building up a larger parser.

Parameters:

  • testString - to test against this expression for a match
  • parseAll - (default=True) - flag to pass to parseString when running tests

Example:

   expr = Word(nums)
   assert expr.matches("100")

runTests(self, tests, parseAll=True, comment='#', fullDump=True, printResults=True, failureTests=False)

Execute the parse expression on a series of test strings, showing each test, the parsed results or where the parse failed. Quick and easy way to run a parse expression against a list of sample strings.

Parameters:

  • tests - a list of separate test strings, or a multiline string of test strings
  • parseAll - (default=True) - flag to pass to parseString when running tests
  • comment - (default='#') - expression for indicating embedded comments in the test string; pass None to disable comment filtering
  • fullDump - (default=True) - dump results as list followed by results names in nested outline; if False, only dump nested list
  • printResults - (default=True) prints test output to stdout
  • failureTests - (default=False) indicates if these tests are expected to fail parsing

Returns: a (success, results) tuple, where success indicates that all tests succeeded (or failed if failureTests is True), and the results contain a list of lines of each test's output

Example:

   number_expr = pyparsing_common.number.copy()

   result = number_expr.runTests('''
       # unsigned integer
       100
       # negative integer
       -100
       # float with scientific notation
       6.02e23
       # integer with scientific notation
       1e-12
       ''')
   print("Success" if result[0] else "Failed!")

   result = number_expr.runTests('''
       # stray character
       100Z
       # missing leading digit before '.'
       -.100
       # too many '.'
       3.14.159
       ''', failureTests=True)
   print("Success" if result[0] else "Failed!")

prints:

   # unsigned integer
   100
   [100]

   # negative integer
   -100
   [-100]

   # float with scientific notation
   6.02e23
   [6.02e+23]

   # integer with scientific notation
   1e-12
   [1e-12]

   Success
   
   # stray character
   100Z
      ^
   FAIL: Expected end of text (at char 3), (line:1, col:4)

   # missing leading digit before '.'
   -.100
   ^
   FAIL: Expected {real number with scientific notation | real number | signed integer} (at char 0), (line:1, col:1)

   # too many '.'
   3.14.159
       ^
   FAIL: Expected end of text (at char 4), (line:1, col:5)

   Success

Each test string must be on a single line. If you want to test a string that spans multiple lines, create a test like this:

   expr.runTests(r"this is a test\n of strings that spans \n 3 lines")

(Note that this is a raw string literal; you must include the leading 'r'.)