Pyleaf Howto

NOTE: Some content on this page may be out of date. Please refer to the examples page for current code.

Pyleaf

Pyleaf is a Python package that binds Python functions to an LGL graph. Pyleaf needs to know the graph (just a string of LGL code) and where to find the functions mentioned in the graph as node labels. Nodes are bound to Python functions simply by matching names: each LGL node is paired with the Python function that has the same name. With this information, pyleaf can build a leaf.prj.project Python object, which is the main interface to all of Leaf's features. Supposing that a str object called protocolmap contains valid LGL code and the functions to bind are contained in a file named example.py, the following code builds a Leaf project:

from leaf.prj import project
pr = project('example', 'protocolmap')

Pyleaf must be able to import the example.py file; see the examples.
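
The name-based binding described above can be illustrated without pyleaf itself. The following is only a sketch of the idea, not pyleaf's implementation: the helper bind_nodes and the in-memory stand-in for example.py are hypothetical names introduced for this illustration.

```python
import types

# Hypothetical stand-in for example.py: a module holding the user's functions.
example = types.ModuleType("example")

def func1():
    return [1, 2, 3]

def func2(data):
    return sum(data)

example.func1 = func1
example.func2 = func2

def bind_nodes(node_names, module):
    """Map each graph node name to the same-named callable in the module."""
    bound = {}
    for name in node_names:
        func = getattr(module, name, None)
        if not callable(func):
            raise LookupError(f"no function named {name!r} in {module.__name__}")
        bound[name] = func
    return bound

# Node names as they would appear as labels in the graph.
bindings = bind_nodes(["func1", "func2"], example)
```

A node label with no matching function raises an error in this sketch; how pyleaf itself reacts to a missing function is not covered here.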

The project and protocol objects

A Leaf project can include a number of protocols. This is mainly due to the alternative protocols feature, which is still under development. At the moment, a typical pyleaf project includes only one unnamed protocol, which can be accessed this way:

pt = pr.protocols['']

The project object manages higher-level tasks. From the user's point of view, it is responsible for retrieving the user's source code and passing it to the protocols. The protocol object, instead, is what the user interacts with once the code has been loaded. This means that most pyleaf actions are performed through a protocol object, but changes in the user's source code are handled through the project object.

Requesting a resource

The protocol object is used to request a resource. If you have an LGL graph like the following:

       / func2
func1 <
       \ func3
;

You can ask to get the output of the node func2 this way:

>>> x = pt.provide(func2)

Pyleaf will look up the resource in its internal database to check whether it is already available. If not, it will produce it. To produce it, pyleaf needs the output of func1, so it will recursively search for that. If the output of func1 is not available either, it will be built on the fly, since func1 does not need any input. Now func2 can be run too. At this point, if you try the following:

>>> y = pt.provide(func3)

Pyleaf will search for the output of func3. Since it is not available, it will try to build it. Since this time the output of func1 is available, pyleaf can now run func3 directly. From now on, all resources in the protocol are available and returned immediately upon request.
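
The recursive lookup walked through above can be sketched in a few lines. This is a toy model under stated assumptions, not pyleaf's code: the PARENTS dictionary, the calls list and this standalone provide function are hypothetical and only mimic the behaviour described.

```python
# Toy graph matching the example above: func1 feeds func2 and func3.
PARENTS = {"func1": [], "func2": ["func1"], "func3": ["func1"]}

calls = []  # records which nodes actually ran, in order

def func1():
    calls.append("func1")
    return 10

def func2(x):
    calls.append("func2")
    return x + 1

def func3(x):
    calls.append("func3")
    return x * 2

FUNCS = {"func1": func1, "func2": func2, "func3": func3}
resources = {}  # outputs already produced

def provide(name):
    """Return the output of a node, building missing ancestors first."""
    if name not in resources:
        inputs = [provide(parent) for parent in PARENTS[name]]
        resources[name] = FUNCS[name](*inputs)
    return resources[name]

x = provide("func2")  # runs func1, then func2
y = provide("func3")  # reuses the stored output of func1, runs only func3
```

After both requests, calls shows that func1 ran exactly once, which is the point of keeping produced resources around.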

Resource management, session recovery and protocol consistency

Pyleaf automatically stores all produced resources internally in primary memory and dumps them to the disk as soon as they are built the first time. The provide method first checks if a resource is available in primary memory; if not, it searches for a dump on the disk; if there is no previous dump the resource is built on the fly applying the process seen in the previous section. The newly available resource is immediately dumped to the disk.
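
The lookup order just described (memory, then disk, then build) can be sketched as follows. The pickle format, the file naming and the temporary dump directory are assumptions made for this illustration; the actual dump format used by pyleaf is not specified here.

```python
import os
import pickle
import tempfile

dump_dir = tempfile.mkdtemp()  # hypothetical dump location
memory = {}                    # resources held in primary memory
builds = []                    # names of the nodes that were actually built

def build(name):
    """Stand-in for running the node's function."""
    builds.append(name)
    return f"output of {name}"

def dump_path(name):
    return os.path.join(dump_dir, name + ".pkl")

def provide(name):
    """Look in memory, then on disk, and build only as a last resort."""
    if name in memory:
        return memory[name]
    path = dump_path(name)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            memory[name] = pickle.load(handle)
        return memory[name]
    memory[name] = build(name)
    with open(path, "wb") as handle:
        pickle.dump(memory[name], handle)  # dump right after building
    return memory[name]

first = provide("func1")   # built and dumped
memory.clear()             # simulate clearing RAM or starting a new session
second = provide("func1")  # reloaded from the dump, not rebuilt
```
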
Leaf tries to keep source code consistent with the stored resources and across nodes. To this end, source code is stored together with the resources it produces. If the code is changed, Leaf will detect the change by comparing the working code with the stored code as soon as the update method is called, and will automatically untrust the nodes involved. The untrust method clears the resources produced by a node and all its descendants. The trust method, on the other hand, can be used to force Leaf to accept a given resource as the output of a node.
When a project object is built, the corresponding dumps are searched for on the disk and automatically loaded, including resources and source code that produced them. This way, Leaf will be able to persistently keep track of code changes across different working sessions.
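
As a rough illustration of the consistency mechanism, the sketch below compares working source text against stored source text and clears the resources of changed nodes and their descendants. All names here (CHILDREN, stored_source, and these standalone update and untrust functions) are hypothetical; the real methods belong to pyleaf's objects and may behave differently in detail.

```python
# Toy graph: func1 feeds func2 and func3.
CHILDREN = {"func1": ["func2", "func3"], "func2": [], "func3": []}

# Source text stored alongside each resource (assumed representation).
stored_source = {
    "func1": "def func1(): return 1",
    "func2": "def func2(x): return x + 1",
    "func3": "def func3(x): return x * 2",
}
resources = {"func1": 1, "func2": 2, "func3": 2}

def untrust(name):
    """Drop the resource of a node and of all its descendants."""
    resources.pop(name, None)
    for child in CHILDREN[name]:
        untrust(child)

def update(working_source):
    """Compare working code with stored code and untrust changed nodes."""
    changed = [name for name, text in working_source.items()
               if text != stored_source[name]]
    for name in changed:
        untrust(name)
        stored_source[name] = working_source[name]
    return changed

working = dict(stored_source)
working["func2"] = "def func2(x): return x + 100"  # the user edits func2
changed = update(working)
```

In this sketch only func2 is cleared, because it has no descendants; editing func1 instead would clear all three resources.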

Publishing protocols

The publish method of the protocol class produces a full hypertextual report of the protocol. It includes some general statistics about the protocol, a visualization of the graph, details and source code for each node. The graph (or "protocol map") is also hypertextual, with each node containing links to the output files it produces on the disk. The published protocol is the current implementation of our concept of bioinformatic protocol.
The hypertext is created as an HTML file in the current directory. A CSS style sheet and a Leaf logo are also automatically added to the same directory. Future versions of pyleaf will allow exporting these files, together with all the linked output files, to a given directory.

Automatic time-space complexity monitoring

Pyleaf keeps track of the time needed to process each node and of the space required by each node producing files (the ones identified by the [F] flag in LGL). Such statistics are reported in the published protocol and during computation.
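
The kind of bookkeeping involved can be sketched as below. This is an assumption-laden illustration, not pyleaf's monitoring code: run_node, the stats dictionary and the produces_file argument (standing in for the [F] flag) are hypothetical names.

```python
import os
import tempfile
import time

stats = {}  # node name -> {"seconds": ..., "bytes": ...}

def run_node(name, func, *args, produces_file=False):
    """Run a node, recording elapsed time and, for file outputs, size on disk."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    # For file-producing nodes, the result is assumed to be the file's path.
    size = os.path.getsize(result) if produces_file else None
    stats[name] = {"seconds": elapsed, "bytes": size}
    return result

def write_report():
    """Hypothetical file-producing node: returns the path it wrote."""
    handle, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(handle, "w") as out:
        out.write("x" * 1000)
    return path

path = run_node("write_report", write_report, produces_file=True)
total = run_node("total", sum, [1, 2, 3])
```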

protocol methods

The following is a summary of the main operations that you can perform on a protocol. It is extracted from the online help of pyleaf.

clear	Clears a resource from RAM.
clearall	Clears all resources from RAM.
dumpOff	Switches dumping OFF.
dumpOn	Switches dumping ON.
export	Exports the graph to a pdf file, including docstrings.
getinputs	Collects all resources that are input to the given node and returns a copy of them in a list.
list	Lists the state of all resources.
provide	Provides a resource. The resource is returned if available, loaded from disk if dumped, built on the fly otherwise.
publish	Exports a full report of the protocol to HTML.
rebuild	Clears a resource, then provides it.
run	Provides all leaf (final) resources.
trust	Assigns a resource to a node without invalidating dependent resources.
undump	Clears a dumped resource from the disk.
undumpall	Clears all dumped resources from the disk.
untrust	Clears a resource and all its dependents.