8. Debugging and Error Codes

8.1. Regular Expression Parsing

Occasional users of regular expressions often find it difficult to be sure what a more complex expression means. To help show how jsre interprets an expression, the module contains a simple print-to-console function that prints the tree produced by parsing a given expression.

To obtain the parse printout, add the XDUMPPARSE flag to any of the functions that accept a flag. For example, the following is a simplified (not recommended) regular expression for IP addresses:

>>> regex = jsre.compile(r"(\d{1,3}\.){3}[0-9]{1,3}", flags=jsre.XDUMPPARSE)

        ********** Parse Tree **********
        0:group         (\d{1,3}\.){3}[0-9]{1,3}
          11:repeat         {3,3}
            0:group
              3:repeat         {1,3}
                1:property      {general_category=decimal_number}
              9:character     \.        (  2e,  2e)
          19:repeat         {1,3}
            14:class         [0-9]
              15:character     0-9      (  30,  39)
        ********************************

Note:

1. Each entry starts with the character index in the expression and the type of component (e.g. the first repeat above is at character 11); indentation represents nesting.

2. All repeats are expanded into their full representation (for example {3} is shown as {3,3}).

3. The characters included in each character class are listed below the class, together with the hexadecimal value of each character or the hexadecimal bounds of each character range.
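The layout described in the notes above can be read mechanically. The following is a minimal sketch, not part of jsre: the function name parse_dump and the dictionary keys are hypothetical, and it assumes the printout uses two spaces per nesting level, as in the example above. It only processes text copied from the console and does not call jsre.

```python
import re

# Printout copied from the example above (common leading indentation removed).
DUMP = r"""
********** Parse Tree **********
0:group         (\d{1,3}\.){3}[0-9]{1,3}
  11:repeat         {3,3}
    0:group
      3:repeat         {1,3}
        1:property      {general_category=decimal_number}
      9:character     \.        (  2e,  2e)
  19:repeat         {1,3}
    14:class         [0-9]
      15:character     0-9      (  30,  39)
********************************
"""

# "<indent><character index>:<component type> <detail>"
ENTRY = re.compile(r"^( *)(\d+):(\w+)\s*(.*)$")
# Trailing "(  30,  39)" style hexadecimal bounds.
HEX_RANGE = re.compile(r"\(\s*([0-9a-fA-F]+),\s*([0-9a-fA-F]+)\)\s*$")


def parse_dump(text, indent_width=2):
    """Return one dict per tree entry; banner and blank lines are skipped."""
    matches = [m for m in map(ENTRY.match, text.splitlines()) if m]
    if not matches:
        return []
    # Depth is measured relative to the least-indented entry.
    base = min(len(m.group(1)) for m in matches)
    entries = []
    for m in matches:
        detail = m.group(4).strip()
        bounds = HEX_RANGE.search(detail)
        entries.append({
            "depth": (len(m.group(1)) - base) // indent_width,
            "index": int(m.group(2)),
            "kind": m.group(3),
            "detail": detail,
            # Hexadecimal bounds converted to integer code points, if present.
            "range": (int(bounds.group(1), 16), int(bounds.group(2), 16))
                     if bounds else None,
        })
    return entries


entries = parse_dump(DUMP)
first_repeat = entries[1]    # the repeat at character index 11
last = entries[-1]           # the 0-9 range inside the class
```

Here first_repeat["index"] is 11 and last["range"] holds the code points of the characters 0 and 9, matching notes 1 and 3.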

8.2. Virtual Machine Error Codes

Matching is carried out by a virtual machine (VM) implemented as a Python extension module. Errors returned from the VM are integer codes. For example:

SystemError: Error from jsvm.findMatch(), code: 3

Codes 1-9 are reserved for runtime errors (i.e. errors during matching) and codes 10-99 for errors encountered during program loading. As far as possible the VM loader tests for errors during loading to minimise the need for error checking during matching; most errors should therefore be caught by the compiler and reported with a suitable message.
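A caller that wants to react to these codes has to recover the integer from the exception text. The following is a minimal sketch under the assumption that the message always ends in "code: N", as in the example above; the helper names vm_error_code and vm_error_phase are hypothetical and not part of jsre. The exception is raised by hand here so that the sketch runs without the module.

```python
import re

_CODE = re.compile(r"code:\s*(\d+)")


def vm_error_code(message):
    """Extract the integer code from a VM error message, or None if absent."""
    found = _CODE.search(str(message))
    return int(found.group(1)) if found else None


def vm_error_phase(code):
    """Classify a code using the ranges given above."""
    if 1 <= code <= 9:
        return "runtime"    # raised during matching
    if 10 <= code <= 99:
        return "loading"    # raised while the program is loaded into the VM
    return "unknown"


try:
    # Stand-in for a real matching call that fails inside the VM.
    raise SystemError("Error from jsvm.findMatch(), code: 3")
except SystemError as exc:
    code = vm_error_code(exc)
    phase = vm_error_phase(code) if code is not None else "unknown"
```

After this runs, code is 3 and phase is "runtime".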

The following Virtual Machine exception codes may be seen by a user:

Code  Cause
2     Failed to allocate space for the runtime object.
3     Thread Heap Overflow.
4     Failed to allocate additional thread heap.
11    Failed to allocate memory for the VM.

Any other error codes reported by the VM are likely to be the result of compiler errors; I would appreciate a report.
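For logging or user-facing messages, the table above can be held as a simple lookup. This is a minimal sketch; the names VM_ERROR_CAUSES and describe_vm_error are hypothetical and not part of jsre, and the causes are copied from the table.

```python
# Documented user-visible VM codes, copied from the table above.
VM_ERROR_CAUSES = {
    2: "Failed to allocate space for the runtime object.",
    3: "Thread Heap Overflow.",
    4: "Failed to allocate additional thread heap.",
    11: "Failed to allocate memory for the VM.",
}


def describe_vm_error(code):
    """Return a readable description for a VM error code."""
    cause = VM_ERROR_CAUSES.get(code)
    if cause is None:
        # Undocumented codes are likely compiler errors and worth reporting.
        return "VM error %d: undocumented code, probably a compiler error" % code
    return "VM error %d: %s" % (code, cause)


known = describe_vm_error(3)
unknown = describe_vm_error(42)
```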

The allocation failures (2, 4, 11) indicate memory exhaustion. Remember that Python is usually limited to 2GB of memory; jsre itself is unlikely to be the original cause, since its memory footprint is kept below 10MB. The only jsre event that could perhaps exceed the available memory is an attempt to build a very large result set in a single matching attempt. This would usually indicate a bug in the regular expression, or an expression that is not well suited to the application.

Code 3 (Thread Heap Overflow) indicates that the expression has generated too many alternative options (valid prefixes) at some point in the input stream.
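How valid prefixes accumulate can be seen in a toy model. The sketch below is not jsre's VM and makes no claim about its internals: it assumes a matcher that keeps one thread per candidate start position, and it hard-codes a single pattern (an "a", then any run of characters other than "b", then a "b"). The function name peak_live_prefixes is hypothetical.

```python
def peak_live_prefixes(text):
    """Largest number of partial matches alive at once for the toy pattern.

    Toy pattern: 'a', then any run of non-'b' characters, then 'b'.
    Each set element is the start position of a prefix that is still valid.
    """
    live = set()
    peak = 0
    for pos, ch in enumerate(text):
        if ch == "b":
            # Every live prefix completes a match here; none can continue,
            # because the middle of the pattern cannot consume a 'b'.
            live.clear()
        elif ch == "a":
            # Existing prefixes continue and a new one starts at this position.
            live.add(pos)
        # Any other character simply extends the prefixes already alive.
        peak = max(peak, len(live))
    return peak


few = peak_live_prefixes("ab" * 500)    # each prefix ends at the next 'b'
many = peak_live_prefixes("a" * 1000)   # no 'b' ever arrives
```

In this model few is 1 and many is 1000: the same expression is harmless on one input and keeps a thousand prefixes alive on another, which is the kind of growth that a fixed-size thread heap cannot absorb.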