Unified execution

Almost every tool in the sequence of steps used for RNA-Seq processing uses idiosyncratic parameters or arguments. This makes using them error-prone, because the tools do not always check thoroughly that the input and parameters make sense. It is also time-consuming, because the documentation for the tools is often insufficient and much of the knowledge about them is scattered across internet forums and mailing lists.

$ alias unifex='python -m railroadtracks.unifex'

Three modes exist: run, version, and activities. Calling the command without a mode prints the usage:

$ unifex
usage: unifex.py [-h] {run,version,activities} ...
rnaseq.py: error: too few arguments
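The three-mode command line can be pictured with a small argparse sketch. This is not the actual unifex parser: the function name `build_parser`, the positional names `model` and `executable`, and the assumption that `activities` takes no further arguments are all hypothetical and only inferred from the examples in this section.

```python
import argparse


def build_parser():
    """Hypothetical three-mode parser; the real unifex parser may differ."""
    parser = argparse.ArgumentParser(prog="unifex")
    subparsers = parser.add_subparsers(dest="mode")
    for mode in ("run", "version", "activities"):
        sub = subparsers.add_parser(mode)
        if mode != "activities":
            # Assumption: model name first, executable second,
            # as in `unifex version star-align STAR`.
            sub.add_argument("model")
            sub.add_argument("executable")
    return parser


args = build_parser().parse_args(["version", "star-align", "STAR"])
print(args.mode, args.model, args.executable)
```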

A step is then defined by the name of the executable (either as a name to be found in the ${PATH}, or as an absolute path), as well as by the name of a modeling class.

Note

Specifying both the executable and the modeling class is required because a number of executables can perform different tasks/activities.

For example, the command STAR can be used to build an index reference for subsequent alignment, or perform an alignment. Specifying the model associated with the executable helps to remove the ambiguity.

$ unifex run star-index STAR
$ unifex run star-align STAR # astrology anyone ?

Version number

The version number is obtained the same way for all steps:

$ unifex version bowtie2-build bowtie2
2.1.0
$ unifex version star-align STAR
STAR_2.3.0e_r291
$ unifex version limma-voom R
3.18.3
$ unifex version edger R
3.4.0
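Because the call has the same shape for every step, it is easy to wrap from Python. The following is a minimal sketch, assuming that railroadtracks is importable by the current interpreter and that the version string is printed on standard output. The helper names `version_command` and `get_version` are hypothetical and not part of the package.

```python
import subprocess
import sys


def version_command(model, executable):
    """Build the argument list for `unifex version <model> <executable>`."""
    return [sys.executable, "-m", "railroadtracks.unifex",
            "version", model, executable]


def get_version(model, executable):
    """Run the command and return what it printed on stdout.

    Needs railroadtracks and the underlying tool to be installed.
    """
    result = subprocess.run(version_command(model, executable),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Only the command is built here, so nothing needs to be installed.
cmd = version_command("star-align", "STAR")
print(" ".join(cmd[1:]))
```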

Running a step

Running a step is also achieved in essentially the same way for all steps. Every step has “sources” (that is, input files) and “targets” (that is, destination/output files).

$ # bowtie2 (create index)
$ unifex run bowtie2-build bowtie2 \
      -s <name>=<source file(s)> \
      -t <name>=<target file(s)>
$ # STAR (create index)
$ unifex run star-index STAR \
      -s <name>=<source file(s)> \
      -t <name>=<target file(s)>
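The uniform `-s <name>=<file>` / `-t <name>=<file>` pattern can be assembled programmatically. The sketch below is an illustration only: `run_command` is a hypothetical helper, the file names are made up, and it assumes one file per name (how several files are passed for one name is not covered here). The source and target names are the ones listed for the edger step in this section.

```python
import shlex


def run_command(model, executable, sources, targets):
    """Build the argument list for `unifex run` from name -> file mappings."""
    cmd = ["python", "-m", "railroadtracks.unifex", "run", model, executable]
    for name, path in sources.items():
        cmd += ["-s", f"{name}={path}"]   # one -s per named source
    for name, path in targets.items():
        cmd += ["-t", f"{name}={path}"]   # one -t per named target
    return cmd


cmd = run_command(
    "edger", "R",
    sources={"counttable_fn": "counts.csv",     # hypothetical file names
             "sampleinfo_fn": "samples.csv"},
    targets={"diffexp_fn": "diffexp.csv"},
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```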

Note

Whenever the sources and targets expected by a given tool are not specified, the command fails and prints the list of missing parameters:

$ unifex run edger R
The following sources must be defined (and are missing):
- counttable_fn
- sampleinfo_fn
The following targets must be defined (and are missing):
- diffexp_fn
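The check behind such a message amounts to comparing the names a step requires with the names given on the command line. A minimal sketch of that idea follows; it is not the package's own code, and `report_missing` is a hypothetical function that merely reproduces the wording of the message above.

```python
def report_missing(required_sources, required_targets, sources, targets):
    """Return a message listing required names absent from the given mappings."""
    lines = []
    for label, required, provided in (
        ("sources", required_sources, sources),
        ("targets", required_targets, targets),
    ):
        missing = [name for name in required if name not in provided]
        if missing:
            lines.append(
                f"The following {label} must be defined (and are missing):")
            lines.extend(f"- {name}" for name in missing)
    return "\n".join(lines)


# Nothing was provided, so every required name is reported.
message = report_missing(["counttable_fn", "sampleinfo_fn"],
                         ["diffexp_fn"], sources={}, targets={})
print(message)
```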

Using a scheduler/queueing system

The persistent layer can generate the bash commands for running the unified execution. This makes it straightforward to use any existing queueing or scheduling system that accepts bash scripts.

An implementation using SGE’s qsub is provided with railroadtracks.easy.qsub.
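To illustrate the idea, independently of railroadtracks.easy.qsub (whose interface is not shown here), a unifex command can be wrapped in a bash script that a scheduler accepts. The sketch below is an assumption-laden example: `make_job_script` is a hypothetical helper, and the `#$ -N` and `#$ -cwd` lines are standard SGE embedded options for the job name and working directory.

```python
import shlex


def make_job_script(unifex_args, job_name):
    """Return the text of a bash script running one unifex command."""
    command = shlex.join(
        ["python", "-m", "railroadtracks.unifex"] + list(unifex_args))
    return "\n".join([
        "#!/bin/bash",
        f"#$ -N {job_name}",   # SGE: job name
        "#$ -cwd",             # SGE: run in the submission directory
        command,
        "",
    ])


script = make_job_script(["version", "star-align", "STAR"], "star_version")
print(script)
```

The resulting text could be written to a file and submitted with `qsub <script file>`; the submission itself is left out so that the sketch runs without a scheduler.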

Docstrings

Unified execution layer.

One general way to run things on the command line.

class railroadtracks.unifex.Call(step, assets, parameters)[source]

Unified call, turning a step + assets + parameters into a task.

execute()[source]

Execute the task.

railroadtracks.unifex.build_AssetSet(AssetSet, values)[source]
Parameters:
  • AssetSet
  • values – values to create instances in the AssetSet
railroadtracks.unifex.unified_exec_run(args, steplist, msg=[])[source]

Run a command.
Parameters:
  • args – arguments in a class such as the one returned by argparse.ArgumentParser.parse_args()
  • steplist (sequence of core.StepAbstract-inheriting instances) – sequence of known steps. The args will be matched against this to find the model class.
  • msg – list of messages, if any