Color space

SOLiD data represent consecutive dinucleotides, each overlapping the next by one nucleotide, and each dinucleotide is measured as one of four possible colors. Such data are said to be in color space.
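The overlapping dinucleotide scheme described above can be sketched in a few lines. This is only an illustration, not the code in ngs_plumbing: the function name bases_to_colors is hypothetical, and the sketch relies on the standard SOLiD color table (0 for identical bases, 1 for A/C and G/T, 2 for A/G and C/T, 3 for A/T and C/G), which happens to equal the XOR of the 2-bit base codes used here.

```python
# 2-bit codes for the bases; with this numbering the SOLiD color of a
# dinucleotide is the XOR of the codes of its two bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def bases_to_colors(bases):
    """Return the first base followed by one color digit per dinucleotide.

    Hypothetical helper for illustration only; ambiguous bases such as N
    are not handled and raise KeyError.
    """
    if not bases:
        return ""
    bases = bases.upper()
    codes = [BASE_TO_BITS[b] for b in bases]
    # Each color covers two consecutive bases, so neighbours overlap by one.
    colors = [str(x ^ y) for x, y in zip(codes, codes[1:])]
    return bases[0] + "".join(colors)


print(bases_to_colors("ATGGC"))  # A3103
```

Note that in real SOLiD reads the leading letter is usually the last base of the sequencing primer rather than the first base of the insert; the sketch ignores that distinction.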

Many NGS tools work in base space, so a conversion from colors to bases is then needed. A Biostar page gives a good overview of the color space formats.

Naive translation

Given that the first base is known, the ambiguity of the colors can be resolved by walking along the color sequence, and the full sequence of bases can be reconstructed. However, should a sequencing error be present, the wrong color will cause the translation of every downstream base to be wrong. This property is advertised as helping to separate sequencing errors from genuine but rare mutations: a single miscalled color changes one color only, whereas a true single-base change alters two adjacent colors. When read quality is very good, naive translation can nevertheless be an easy way to move from color space to base space.
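A minimal sketch of this walk, assuming the read is given as a known first base followed by color digits (for example A3103) and assuming the standard SOLiD color table, where a color is the XOR of the 2-bit codes of the two bases it spans. The name colors_to_bases is hypothetical and is not part of the module documented below.

```python
BASES = "ACGT"  # index in this string = 2-bit code of the base
BITS = {base: code for code, base in enumerate(BASES)}


def colors_to_bases(read):
    """Naively translate a known first base plus color digits into bases.

    Hypothetical helper for illustration only; missing color calls ('.')
    are not handled and raise ValueError.
    """
    if not read:
        return ""
    current = BITS[read[0].upper()]
    out = [BASES[current]]
    for color in read[1:]:
        # The next base is fully determined by the previous base and the color.
        current ^= int(color)
        out.append(BASES[current])
    return "".join(out)


print(colors_to_bases("A3103"))  # ATGGC
# A single wrong color (second digit 1 -> 2) corrupts every base after it.
print(colors_to_bases("A3203"))  # ATCCG
```

The second call shows the error propagation described above: only one color differs between the two reads, yet all three bases downstream of it change.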

Double encoding

With double encoding, each of the four colors is replaced by an arbitrary base. The resulting base sequence does not make biological sense, but it will work with programs that expect sequences over the alphabet A, T, G, C.
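As a rough illustration, double encoding amounts to a one-to-one character substitution that can be inverted later. The mapping used here (0 to A, 1 to C, 2 to G, 3 to T) is a commonly used convention and an assumption of this sketch; it is not necessarily the one applied by dencode_seq and ddecode_seq in the module below, and the function names are hypothetical.

```python
# Assumed mapping: 0->A, 1->C, 2->G, 3->T. Any one-to-one mapping would do,
# as long as the same one is used to go back to colors.
COLOR_TO_BASE = str.maketrans("0123", "ACGT")
BASE_TO_COLOR = str.maketrans("ACGT", "0123")


def double_encode(colors):
    """Replace each color digit by its stand-in base (illustration only)."""
    return colors.translate(COLOR_TO_BASE)


def double_decode(pseudo_bases):
    """Recover the color digits from a double-encoded sequence."""
    return pseudo_bases.translate(BASE_TO_COLOR)


encoded = double_encode("3103")
print(encoded)                 # TCAT
print(double_decode(encoded))  # 3103
```

The sketch takes color digits only; what to do with the leading base of a color space read is left out and depends on the downstream program.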

The results will have to be recoded back to color space after processing by the

Note

The assembler velvet requires this double encoding to best process color space data. Note that this is not completely transparent: velvet has to be compiled for color space.

Module

class ngs_plumbing.colorspace.FastaqualWriter(fn, d)
    close()
    flush()
    write(entry, seq_proc=None)
class ngs_plumbing.colorspace.FastqWriter(fn, d)
    close()
    flush()
    write(entry, seq_proc=None, qual_proc=None)
ngs_plumbing.colorspace.ddecode_seq(seq)
ngs_plumbing.colorspace.ddecode_seq_G(seq)
ngs_plumbing.colorspace.ddecode_seq_T(seq)
ngs_plumbing.colorspace.dencode_seq(seq)
ngs_plumbing.colorspace.exec_code()
