Using SCons for processing pipelines

Deprecated since version 0.12: This was very experimental (and possibly not working), and is deprecated anyway.

Note

This requires SCons to be installed.

No plumbing happens without pipes at some point; this package also provides tools for handling pipelines.

SCons is a software build system, akin to make and its Makefiles, but it can also be used to run any sequence of interdependent computing steps, such as an analysis pipeline. This is of interest for implementing a form of fault tolerance: a pipeline can be resumed after an interruption without re-running everything. Causes for an interrupted pipeline range from hardware failure, to running out of resources (storage or memory), to unavailable third-party resources, to a queuing system with a defined timeout killing tasks.
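The idea of resuming without re-running everything can be sketched in plain Python. This is a minimal make-style illustration, not SCons's actual decision logic: a step is skipped when its target already exists and is not older than its sources. The helper names is_up_to_date, run_step and to_upper are hypothetical and are not part of SCons or ngs_plumbing.

```python
import os
import tempfile


def is_up_to_date(target, sources):
    """True if target exists and is not older than any of its sources."""
    if not os.path.exists(target):
        return False
    target_mtime = os.path.getmtime(target)
    return all(os.path.getmtime(src) <= target_mtime for src in sources)


def run_step(target, sources, action):
    """Call action(target, sources) unless target is up to date.

    Returns True if the action ran, False if the step was skipped.
    """
    if is_up_to_date(target, sources):
        return False
    action(target, sources)
    return True


def to_upper(target, sources):
    """Toy step (hypothetical): write an upper-cased copy of the first source."""
    with open(sources[0]) as src, open(target, "w") as dst:
        dst.write(src.read().upper())


with tempfile.TemporaryDirectory() as workdir:
    reads = os.path.join(workdir, "foo.txt")
    result = os.path.join(workdir, "foo.upper.txt")
    with open(reads, "w") as handle:
        handle.write("acgt\n")

    first_run = run_step(result, [reads], to_upper)   # target missing: the step runs
    second_run = run_step(result, [reads], to_upper)  # target current: the step is skipped
    print(first_run, second_run)
```

After an interruption, calling run_step again for every step of a pipeline would redo only the steps whose targets are missing or stale, which is the behaviour a build system such as SCons provides in a more robust form.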

An SConstruct file taking FASTQ files to sorted and indexed files ready to use with pysam would look like:

from ngs_plumbing.scons import Bowtie_5500_ECC

# path to the reference index file
reference = '/path/to/index'

bowtie_5500_ecc = Bowtie_5500_ECC(mapper = 'bowtie',
                                  reference = reference,
                                  nproc = 3)
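# samtools_view, samtools_sort and samtools_index are not defined in this
# example; they are assumed to be builders made available elsewhere.
env = Environment(BUILDERS = {'bowtie': bowtie_5500_ecc,
                              'st_view': samtools_view,
                              'st_sort': samtools_sort,
                              'st_index': samtools_index})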
Decider('timestamp-match')
# loop over the sample names (no extension)
my_samples = ('foo', 'bar')
for fn in my_samples:
    env.bowtie(fn)
    env.st_view(fn)
    env.st_sort(fn)
    env.st_index(fn)

Running the pipeline is then as simple as:

scons

It is easy to create one’s own functions.
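As a rough illustration, assuming "functions" here refers to SCons builders, a custom step can be declared with the standard SCons Builder factory. The builder name st_flagstat, the suffixes and the file names below are hypothetical and are not part of ngs_plumbing. The fragment is meant to live in an SConstruct file, where SCons itself provides Builder and Environment.

```python
# SConstruct fragment: Builder and Environment are provided by SCons.
# Hypothetical builder writing the output of `samtools flagstat` to a text file.
flagstat = Builder(action = 'samtools flagstat $SOURCE > $TARGET',
                   suffix = '.flagstat',
                   src_suffix = '.bam')

env = Environment(BUILDERS = {'st_flagstat': flagstat})
# Explicit target and source; the file names are placeholders.
env.st_flagstat(target = 'foo.flagstat', source = 'foo.bam')
```

Because the step is declared with a source and a target, SCons can decide on later runs whether it needs to be executed again.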
