pyRegurgitator - Tools for analysing python code

Development on github: https://github.com/schettino72/pyRegurgitator/

PyPI: https://pypi.python.org/pypi/pyRegurgitator

Supported versions: Python 3.4 only

Tools

asdlview

asdlview generates an HTML page with python’s ASDL info.

It can also create a JSON file with the ASDL information (used by the astview tool).

$ asdlview python34.asdl > python34.asdl.html

See python 34 ASDL.

astview

astview generates an HTML page with a python module’s AST info.

It can also dump the AST in text format.

$ astview sample.py > sample.py.html

See sample AST HTML.
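For a quick look at the kind of information astview renders, the standard-library ast module can produce a text dump directly. This is only a minimal sketch: it is not astview's implementation or its output format, and the one-line source string is made up for illustration.

```python
import ast

# Hypothetical one-line module, used only for illustration.
source = "__version__ = (0, 1, 0)\n"

tree = ast.parse(source)

# ast.dump() returns a single-line text representation of the tree.
dump = ast.dump(tree)
print(dump)

# Each statement node also records where it starts in the source.
assign = tree.body[0]
print(type(assign).__name__, assign.lineno, assign.col_offset)
```

The exact node names in the dump depend on the Python version running the code.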

py2xml (Experimental)

py2xml converts python code to XML.

All source-code text, including white-space and comments, is preserved. The XML can be converted back to python, recreating the exact same source.

The design was inspired by srcML. In order to get the original source code you just need to remove all XML tags.

The goal is to use the XML for querying and analysing the code, and for performing code transformation/refactoring before converting it back to python.

Why use XML instead of AST? AST does not preserve comments and source code formatting. There are many available tools to manipulate XML. XML is a nice format for query and transformation - just do not use XSLT ;)
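The point about comments can be checked with the standard-library ast module alone. The sketch below parses a made-up one-line source and shows that the comment text does not appear anywhere in the resulting tree.

```python
import ast

# Hypothetical source line with a trailing comment.
source = "x = 1  # the answer\n"

tree = ast.parse(source)
dump = ast.dump(tree)
print(dump)

# The comment is discarded by the parser, so it cannot be found in the dump.
comment_survives = "the answer" in dump
print(comment_survives)  # False
```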

Warning

As of release 0.2, py2xml creates an XML preserving all formatting. But not much thought was given to creating an XML that is easy to query and transform (it mostly follows the AST nodes). So, expect the XML format to completely change in future releases.

Example

A simple python module...

"""Example module to test pyRegurgitator"""

__version__ = (0, 1, 0)


class Foo:
    def hi(self):
        print('Hi, this is Foo')

    @staticmethod
    def add_4(num):
        return 4 + num


class Bar(Foo):
    def bar(self):
        print('foobar')

Converting to XML:

$ py2xml sample.py > sample.py.xml
<Module><Expr><Str><s>"""Example module to test pyRegurgitator"""</s></Str></Expr>

<Assign><targets><Name ctx="Store" name="__version__">__version__</Name> = </targets>(<Tuple ctx="Load"><Num>0</Num>, <Num>1</Num>, <Num>0</Num></Tuple>)</Assign>


<ClassDef name="Foo">class Foo:<body>
    <FunctionDef name="hi">def hi<arguments>(<arg name="self">self</arg>):</arguments><body>
        <Expr><Call><func><Name ctx="Load" name="print">print</Name></func>(<args><Str><s>'Hi, this is Foo'</s></Str></args>)</Call></Expr></body></FunctionDef>

    <FunctionDef name="add_4"><decorator>@<Name ctx="Load" name="staticmethod">staticmethod</Name></decorator>
    def add_4<arguments>(<arg name="num">num</arg>):</arguments><body>
        <Return>return <BinOp><Num>4</Num><Add> + </Add><Name ctx="Load" name="num">num</Name></BinOp></Return></body></FunctionDef></body></ClassDef>


<ClassDef name="Bar">class Bar<arguments>(<base><Name ctx="Load" name="Foo">Foo</Name></base>)</arguments>:<body>
    <FunctionDef name="bar">def bar<arguments>(<arg name="self">self</arg>):</arguments><body>
        <Expr><Call><func><Name ctx="Load" name="print">print</Name></func>(<args><Str><s>'foobar'</s></Str></args>)</Call></Expr></body></FunctionDef></body></ClassDef>
</Module>

The XML can be converted back to source using the command line:

$ py2xml --reverse sample.py.xml > new_sample.py
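Because the source is recovered by dropping the tags, the same result can be sketched with the standard library. This is an illustration under that assumption, not the code behind --reverse; `strip_tags` is a hypothetical helper name, and the XML fragment is the assignment taken from the output above.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the py2xml output for the `__version__` assignment.
xml_text = (
    '<Assign><targets><Name ctx="Store" name="__version__">__version__</Name>'
    ' = </targets>(<Tuple ctx="Load"><Num>0</Num>, <Num>1</Num>, '
    '<Num>0</Num></Tuple>)</Assign>'
)


def strip_tags(xml_source):
    """Return the concatenated text of all nodes, dropping the tags."""
    root = ET.fromstring(xml_source)
    return ''.join(root.itertext())


source = strip_tags(xml_text)
print(source)  # __version__ = (0, 1, 0)
```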

Example - query

Example to print out all classes and their method names.

# print every class in the module together with its method names
import xml.etree.ElementTree as ET
doc = ET.parse('_sample/sample.py.xml')

# get all classes from module
# (relative path: a leading '/' triggers a FutureWarning on an ElementTree)
for classdef in doc.findall('./ClassDef'):
    print('class ', classdef.get('name'))
    for funcdef in classdef.findall('./body/FunctionDef'):
        print('  ', funcdef.get('name'))

Results in:

class  Foo
   hi
   add_4
class  Bar
   bar
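Other queries follow the same pattern. As a further sketch, the snippet below lists only the functions that carry a decorator, using the `[tag]` predicate supported by ElementTree. The XML is a trimmed fragment modelled on the output above (function bodies reduced to `pass`), so it is illustrative rather than real py2xml output.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-edited fragment modelled on the py2xml output for class Foo.
xml_text = '''<Module><ClassDef name="Foo">class Foo:<body>
    <FunctionDef name="hi">def hi<arguments>(<arg name="self">self</arg>):</arguments><body>
        pass</body></FunctionDef>

    <FunctionDef name="add_4"><decorator>@<Name ctx="Load" name="staticmethod">staticmethod</Name></decorator>
    def add_4<arguments>(<arg name="num">num</arg>):</arguments><body>
        pass</body></FunctionDef></body></ClassDef>
</Module>'''

root = ET.fromstring(xml_text)

# Select every FunctionDef, at any depth, that has a <decorator> child.
decorated = []
for funcdef in root.findall('.//FunctionDef[decorator]'):
    names = [n.get('name') for n in funcdef.findall('./decorator/Name')]
    decorated.append((funcdef.get('name'), names))

print(decorated)  # [('add_4', ['staticmethod'])]
```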

Example - transform

In this example the variable __version__ is programmatically set to a new value. Then the modified code is written back to plain python code.

"""change the value assigned to a variable `__version__`"""

import xml.etree.ElementTree as ET


# parse XML document
doc = ET.parse('_sample/sample.py.xml')

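# get node for assignment
# (relative path: a leading '/' triggers a FutureWarning on an ElementTree)
assign = doc.find("./Assign/targets/Name[@name='__version__']/../..")
# assign[0] is <targets>; assign[1] is the node holding the assigned value
assign.remove(assign[1])
# removing the node also drops its tail ')', so restore it on the new node
new_val = ET.Element('value')
new_val.text = '0, 2, 0'
new_val.tail = ')'
assign.append(new_val)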

# convert back to source: method='text' drops all tags and keeps only the text
source = ET.tostring(doc.getroot(), encoding='unicode', method='text')
print(source)

Results in:

"""Example module to test pyRegurgitator"""

__version__ = (0, 2, 0)


class Foo:
    def hi(self):
        print('Hi, this is Foo')

    @staticmethod
    def add_4(num):
        return 4 + num


class Bar(Foo):
    def bar(self):
        print('foobar')

Note how we just set the text of the new content without creating XML nodes for a tuple structure. Since the conversion from XML to python is done just by removing the XML tags, and we won’t do any other transformation, we can just insert the new python code text!

Implementation and known limitations

The implementation uses a combination of the python modules ast and tokenize. Together they give all the necessary information: code structure and formatting.
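To illustrate how the two modules complement each other, the sketch below runs both over the same made-up source line. It is not py2xml's implementation: it only shows that ast reports the structure with node positions, while tokenize reports every token, comments included, with exact positions.

```python
import ast
import io
import tokenize

# Hypothetical source line with a trailing comment.
source = "x = 1  # the answer\n"

# ast: code structure, with the line and column where each node starts.
tree = ast.parse(source)
nodes = [(type(n).__name__, n.lineno, n.col_offset)
         for n in ast.walk(tree) if hasattr(n, 'lineno')]

# tokenize: every token, including comments, with exact (row, col) positions.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [(tok.string, tok.start) for tok in tokens
            if tok.type == tokenize.COMMENT]

print(nodes)
print(comments)  # [('# the answer', (1, 7))]
```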

The current implementation fails to convert the python code to XML in the following situations:

  • the source code uses an encoding other than ascii or unicode
  • the source contains page-breaks (the tokenize module does not handle page-breaks)
  • expressions are wrapped in 2 or more pairs of parentheses. This is tricky to handle correctly, especially because of bugs in the AST column offset information. Doubled parentheses are useless and seldom used in practice, so we just advise users to remove the extra parentheses.
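Part of the difficulty with parentheses is that they leave no node of their own in the AST. The sketch below uses a made-up source line and does not reproduce the column offset bugs mentioned above. It only shows that the reported position of a doubly parenthesised expression points inside the parentheses, so their location has to be recovered some other way.

```python
import ast

# Hypothetical source line with a redundant second pair of parentheses.
source = "x = ((1 + 2))\n"

value = ast.parse(source).body[0].value

# The value is a plain BinOp; neither pair of parentheses is represented.
print(type(value).__name__, value.col_offset)

# The reported column is where the left operand starts, not the first '('.
print(source[value.col_offset])  # 1
```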

From the command line you can check if py2xml is capable of a lossless round-trip (python -> XML -> python) using the --check option. No output means it is OK, otherwise you get an error or a diff.

$ py2xml --check my_module.py
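The idea behind such a check can be sketched in a few lines: strip the tags from the XML and diff the result against the original source. This is an assumption about the general approach, not the code behind --check. `roundtrip_diff` is a hypothetical helper, and both XML strings are hand-written for illustration rather than real py2xml output.

```python
import difflib
import xml.etree.ElementTree as ET


def roundtrip_diff(original, xml_text):
    """Return a unified diff between `original` and the text in `xml_text`.

    An empty string means the round-trip is lossless.
    """
    regenerated = ''.join(ET.fromstring(xml_text).itertext())
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile='original', tofile='regenerated')
    return ''.join(diff)


original = "x = 1  # keep me\n"

# Hand-written XML: the first keeps the comment, the second loses it.
good_xml = ('<Module><Assign><targets><Name>x</Name> = </targets>'
            '<Num>1</Num></Assign>  # keep me\n</Module>')
bad_xml = ('<Module><Assign><targets><Name>x</Name> = </targets>'
           '<Num>1</Num></Assign>\n</Module>')

print(repr(roundtrip_diff(original, good_xml)))  # '' (no difference)
print(roundtrip_diff(original, bad_xml))         # diff showing the lost comment
```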