Cheat Sheet

The ruffus module is a lightweight way to add support for running computational pipelines.

Usage

Each stage or task in a computational pipeline is represented by a python function. Each function can be called in parallel to run multiple jobs.

1. Annotate functions with python decorators

Decorator Examples

@follows

  • Indicate task dependency
  • mkdir prerequisite shorthand

@follows ( task1, 'task2' )

@follows ( task1, mkdir( 'my/directory/for/results' ))

@parallel

  • Parameters for parallel jobs

@parallel ( parameter_list )

@parallel ( parameter_generating_function )

@files

  • I/O parameters
  • skips up-to-date jobs

@files( parameter_list )

@files( parameter_generating_function )

Simplified syntax for tasks with a single job:

@files ( input_file, output_file, other_params, ... )

@files_re

  • I/O file names via regular expressions
  • start from lists of file names or glob results
  • skips up-to-date jobs

@files_re ( glob_str, matching_regex, output_pattern, ... )

@files_re ( file_names, matching_regex, input_pattern, output_pattern, ... )

@files_re ( glob_str, matching_regex, output_pattern, ... )

@files_re ( file_names, matching_regex, input_pattern, output_pattern, ... )

input_pattern and output_pattern are regular expression substitution patterns, used to build the input and output file names from the starting list (either the results of glob_str or the supplied file names).

@check_if_uptodate

  • Checks if task needs to be run

@check_if_uptodate ( is_task_up_to_date_function )

@posttask

  • Calls function after task completes
  • touch_file shorthand

@posttask ( signal_task_completion_function )

@posttask ( touch_file( 'task1.completed' ))

2. Run the pipeline

pipeline_run(list_of_target_tasks, [list_of_tasks_forced_to_rerun, multiprocess = N_PARALLEL_JOBS])

See the Full Tutorial for a more complete introduction to using ruffus.
