Model

Overview

The steps of an RNA-Seq analysis are captured in a model. The model formalizes the steps enough to simplify the use of various tools at each step, yet stays flexible enough to make integrating additional tools and approaches easy.

A canonical graph of steps for RNA-Seq is shown below.

Note

Nothing enforces that a sequence of steps adheres strictly to this graph. The model also defines the notion of activities, and a single step may perform several activities (as an in-house monolithic pipeline does, for example). This allows such steps to be included in the comparison.

The model is used directly to provide a unified execution scheme (see Unified execution), and forms the basis for writing “recipes” (see Recipes).

digraph RNASeq {
"Genomes"->"References";
"Genomes"[label="repository of genomes", shape=invhouse, colorscheme=set36, style=filled, fillcolor=1];
"References" -> "IndexedReference" -> "Alignment";
"References"[shape=invhouse, colorscheme=set36, style=filled, fillcolor=1];
"IndexedReference"[label="Indexed References", colorscheme=set36, style=filled, fillcolor=2];
"SequencingReads" -> "Alignment";
"SequencingReads"[label="Sequencing Reads", shape=invhouse, colorscheme=set36, style=filled, fillcolor=5];
"Alignment" -> "Counts";
"TranscriptAnnotation" -> "Alignment";
"Alignment"[colorscheme=set36, style=filled, fillcolor=2];
"Genomes"->"TranscriptAnnotation";
"TranscriptAnnotation" -> "Counts";
"TranscriptAnnotation"[label="Transcript Annotations", shape=invhouse, colorscheme=set36, style=filled, fillcolor=1];
"Counts" -> "NormalizedCounts";
"Counts"[colorscheme=set36, style=filled, fillcolor=3];
"NormalizedCounts" -> "AbundanceEstimates";
"NormalizedCounts"[label="Normalized Counts", colorscheme=set36, style=filled, fillcolor=3];
"SampleInformation" -> "DifferentialExpression";
"SampleInformation"[label="Sample Information", shape=invhouse, colorscheme=set36, style=filled, fillcolor=5];
"Sample"->"SampleInformation";
"Sample"->"SequencingReads";
"Sample"[shape=invhouse, colorscheme=set36, style=filled, fillcolor=5];
"AbundanceEstimates"->"DifferentialExpression";
"TranscriptAnnotation" -> "AbundanceEstimates";
"AbundanceEstimates"[label="Abundance Estimates", colorscheme=set36, style=filled, fillcolor=4];
"DifferentialExpression"[label="Differential Expression", shape=invhouse, colorscheme=set36, style=filled, fillcolor=6];
}
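The dependencies in the graph above can also be explored programmatically. The following is a minimal sketch, not part of railroadtracks: it copies the edges of the DOT graph into a Python list and uses the standard library's graphlib (Python 3.9 or later) to derive one valid ordering of the steps. The names EDGES, predecessors and upstream are assumptions introduced for illustration.

```python
from graphlib import TopologicalSorter

# Edges copied from the DOT graph above, as (source, target) pairs.
EDGES = [
    ("Genomes", "References"),
    ("References", "IndexedReference"),
    ("IndexedReference", "Alignment"),
    ("SequencingReads", "Alignment"),
    ("Alignment", "Counts"),
    ("TranscriptAnnotation", "Alignment"),
    ("Genomes", "TranscriptAnnotation"),
    ("TranscriptAnnotation", "Counts"),
    ("Counts", "NormalizedCounts"),
    ("NormalizedCounts", "AbundanceEstimates"),
    ("SampleInformation", "DifferentialExpression"),
    ("Sample", "SampleInformation"),
    ("Sample", "SequencingReads"),
    ("AbundanceEstimates", "DifferentialExpression"),
    ("TranscriptAnnotation", "AbundanceEstimates"),
]


def predecessors(edges):
    """Map every node to the set of nodes it directly depends on."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds


def upstream(node, preds):
    """Return every node that `node` depends on, directly or indirectly."""
    seen = set()
    stack = [node]
    while stack:
        for parent in preds[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PREDECESSORS = predecessors(EDGES)
# One valid execution order; several orders are possible for this graph.
ORDER = list(TopologicalSorter(PREDECESSORS).static_order())
# Nodes without predecessors are the starting points of the graph.
SOURCES = {node for node, deps in PREDECESSORS.items() if not deps}
```

For example, upstream("Counts", PREDECESSORS) lists everything that must exist before counts can be computed, and SOURCES contains the nodes that nothing else in the graph produces.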

Activities

Steps can perform one or several activities. The possible activities are listed in ACTIVITY.
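The relation between steps and activities can be illustrated with a small stand-alone sketch. It does not use railroadtracks: the Activity members, the Step class and the steps_for helper are hypothetical names chosen for illustration, and the real list of activities is the one defined in ACTIVITY.

```python
from enum import Enum


class Activity(Enum):
    """Hypothetical activity names; the real members are defined in ACTIVITY."""
    INDEX = "Index"
    ALIGN = "Align"
    QUANTIFY = "Quantify"
    NORMALIZE = "Normalize"


class Step:
    """Minimal stand-in for a step: a name plus the activities it performs."""

    def __init__(self, name, activities):
        self.name = name
        self.activities = frozenset(activities)

    def performs(self, activity):
        return activity in self.activities


def steps_for(activity, steps):
    """Return the names of the steps able to perform `activity`."""
    return [step.name for step in steps if step.performs(activity)]


# A dedicated aligner performs a single activity ...
aligner = Step("aligner", [Activity.ALIGN])
# ... while a monolithic pipeline combines several of them.
pipeline = Step(
    "in-house pipeline",
    [Activity.ALIGN, Activity.QUANTIFY, Activity.NORMALIZE],
)

candidates = steps_for(Activity.ALIGN, [aligner, pipeline])
```

Because a step declares a set of activities rather than a single one, the monolithic pipeline appears alongside the dedicated aligner whenever alignment is requested, which is what makes the two comparable.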

Inheritance diagram

Inheritance diagram of railroadtracks.rnaseq

Docstrings

Simple model for expression analysis using RNA-Seq.

class railroadtracks.rnaseq.ACTIVITY

Activities that the modeled steps can perform. Note: a step can combine several activities; existing monolithic pipelines are the most obvious example.

class railroadtracks.rnaseq.AssetsCRCHeadTail(source, target=None)

Assets for CRCHeadTail

class railroadtracks.rnaseq.CRCHeadTail(executable=None)

Compute a CRC32-based checksum on the beginning (head) and end (tail) of a file, as a cheap way to check whether two files contain identical data. This may be useful with large files, but its main purpose is testing.

Assets

alias of AssetsCRCHeadTail