FASTQ files¶

FASTQ files are like FASTA files, but with added information about the quality of each base in the sequence (the sequence and quality lines are separated by a line that starts with a +, and usually contains nothing else).

from ngs_plumbing import fastq
fn = 'myreads.fq'
fq = fastq.FastqFile(fn)

entry = next(fq)
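
The record layout described above (header, sequence, a + line, quality) can be illustrated with a minimal sketch that uses only the standard library. This is not how ngs_plumbing is implemented: the names read_fastq and phred_scores are hypothetical, the sketch assumes each record spans exactly four lines, and it assumes qualities are encoded with the common Phred+33 offset.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip('\r\n')
        if not header:
            return  # end of file (or an empty line) stops the iteration
        sequence = handle.readline().rstrip('\r\n')
        separator = handle.readline().rstrip('\r\n')
        quality = handle.readline().rstrip('\r\n')
        if not header.startswith('@') or not separator.startswith('+'):
            raise ValueError('Malformed FASTQ record: %r' % header)
        if len(sequence) != len(quality):
            raise ValueError('Sequence and quality lengths differ: %r' % header)
        yield header[1:], sequence, quality

def phred_scores(quality, offset=33):
    """Convert a quality string to integer scores (assumes Phred+33 by default)."""
    return [ord(char) - offset for char in quality]

# In-memory example standing in for a file such as 'myreads.fq'
demo = io.StringIO('@read1\nACGT\n+\nII5!\n')
records = list(read_fastq(demo))
scores = phred_scores(records[0][2])
```

Files produced by older pipelines may use a different quality offset, in which case the offset argument would need to be changed.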

Note

A text-based format is not very efficient for larger files. Rather than propose a new standard (see below), we give tools to create something that will work for your own use.

http://imgs.xkcd.com/comics/standards.png

FASTQ files generally contain a large number of rows, each with a potentially different length, and we have to look for an end-of-line character for each one of them. In other words, during parsing every single character in the file is compared to LF, CR, or a combination of both. A binary file would make things significantly faster (benchmark below).

For a given entry, the binary layout stores three sizes ahead of the ASCII content:
- size (number of characters) of the header
- size of the DNA sequence
- size of the quality string

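import struct

# fh is assumed to be a file handle opened in binary mode ('rb')
# Three unsigned integers: sizes of the header, sequence, and quality
s_sizes = 'III'
s_head, s_sequence, s_quality = struct.unpack(s_sizes, fh.read(struct.calcsize(s_sizes)))
# Read the three fields in one call, then split them using the sizes
raw = fh.read(s_head + s_sequence + s_quality)
head, sequence, quality = struct.unpack('%is%is%is' % (s_head, s_sequence, s_quality), raw)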
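
The size-prefixed layout above can be tried end to end with an in-memory buffer. The sketch below is an illustration only: the helpers write_entry and read_entry are hypothetical names, not part of ngs_plumbing, and the little-endian '<III' prefix is an assumption made here so that the bytes do not depend on the machine writing them.

```python
import io
import struct

# Assumption: three little-endian unsigned 32-bit sizes precede each entry
SIZES = struct.Struct('<III')

def write_entry(fh, head, sequence, quality):
    """Write one entry as a size prefix followed by the three byte strings."""
    fh.write(SIZES.pack(len(head), len(sequence), len(quality)))
    fh.write(head + sequence + quality)

def read_entry(fh):
    """Read one entry, or return None when the stream is exhausted."""
    prefix = fh.read(SIZES.size)
    if len(prefix) < SIZES.size:
        return None
    s_head, s_sequence, s_quality = SIZES.unpack(prefix)
    raw = fh.read(s_head + s_sequence + s_quality)
    # No end-of-line search: the sizes say exactly where each field ends
    return struct.unpack('<%ds%ds%ds' % (s_head, s_sequence, s_quality), raw)

# Round trip through an in-memory binary buffer
buf = io.BytesIO()
write_entry(buf, b'read1', b'ACGT', b'II5!')
write_entry(buf, b'read2', b'GGCCA', b'IIIII')
buf.seek(0)

entries = []
while True:
    entry = read_entry(buf)
    if entry is None:
        break
    entries.append(entry)
```

Because each read is driven by the stored sizes, the reader never has to inspect individual characters for line endings, which is the point made in the paragraph above.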

Paired FASTQ files¶

When sequencing short fragments, a number of technologies allow sequencing both ends of a fragment, or the same ends (in the 3’/5’ sense) of a double-stranded fragment. This is called paired-end sequencing (the most common) or mate-pair sequencing.

The sequences for the two ends of the fragments are typically stored in 2 different files, which can be read together:

from ngs_plumbing import fastq
fn_1 = 'myreads_s_1_1.fq'
fn_2 = 'myreads_s_1_2.fq'
fq_pair = fastq.FastqFilePair(fn_1, fn_2)

entry_1, entry_2 = next(fq_pair)
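
What iterating over a pair of files amounts to can be sketched with the standard library. This is not the FastqFilePair implementation: read_fastq, base_name, and read_pairs are hypothetical names, and the sketch assumes that both files list the reads in the same order and that mate names differ only by a trailing /1 or /2 suffix.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality), assuming four lines per record."""
    while True:
        lines = [handle.readline().rstrip('\r\n') for _ in range(4)]
        if not lines[0]:
            return  # end of file
        yield lines[0][1:], lines[1], lines[3]

def base_name(header):
    """Return the read name without comment text or a trailing /1 or /2."""
    parts = header.split()
    name = parts[0] if parts else ''
    if name.endswith(('/1', '/2')):
        name = name[:-2]
    return name

def read_pairs(handle_1, handle_2):
    """Yield (entry_1, entry_2), checking that the two mates belong together."""
    # zip() stops at the shorter input; a length check would be needed
    # to detect files with a different number of reads.
    for entry_1, entry_2 in zip(read_fastq(handle_1), read_fastq(handle_2)):
        if base_name(entry_1[0]) != base_name(entry_2[0]):
            raise ValueError('Mates out of sync: %r vs %r' % (entry_1[0], entry_2[0]))
        yield entry_1, entry_2

# In-memory stand-ins for 'myreads_s_1_1.fq' and 'myreads_s_1_2.fq'
fq_1 = io.StringIO('@frag1/1\nACGT\n+\nIIII\n@frag2/1\nTTGA\n+\nIIII\n')
fq_2 = io.StringIO('@frag1/2\nTGCA\n+\nIIII\n@frag2/2\nCCAT\n+\nIIII\n')
pairs = list(read_pairs(fq_1, fq_2))
```

Checking the names on every pair costs little and catches the common failure where one of the two files was filtered or sorted independently of the other.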
