FASTA files

FASTA files have been around for a long time and remain in use today. Their success might have to do with their relatively simple structure and the fact that they are plain text (ASCII) files:

A header line (starting with a > character)
An arbitrary number of lines for the sequence
Repeat the above if necessary
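
The layout above can be parsed in a few lines of plain Python. The following is only a minimal sketch to illustrate the structure, not the parser shipped with ngs_plumbing; the function name parse_fasta and the sample data are made up for this example.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) tuples from an iterable of text lines.

    Hypothetical helper for illustration; not part of ngs_plumbing.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-entry FASTA content, held in memory instead of a file.
example = io.StringIO(">seq1 first entry\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(parse_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```

Because the function accepts any iterable of lines, it works the same way on a file object opened in text mode.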

Countless FASTA parsers have been implemented, but given how simple it is to write one, we offer one here as well, so that no third-party package is required for this alone.

from ngs_plumbing import fasta
fn = 'mygenome.fa'
fa = fasta.FastaFile(fn)

for entry in fa:
    print(entry.header)

The twist here is support for a binary FASTA format, with the associated benefits: less storage space, shorter loading times, and shorter access times when retrieving a specific entry.

from ngs_plumbing import fasta
fn_a = 'mygenome.fa'
fn_b = 'mygenome.fab'
fasta.FastabFile.from_fastafile(fn_a, fn_b)

fb = fasta.FastabFile(fn_b)

Iterating through the file can be achieved with:

for entry in fb:
    print(entry.header)

Note

The sequence is 2-bit encoded, and the function ngs_plumbing.dna.bytes_frombit2bytes() should be used to obtain the DNA.
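
To give an intuition for what 2-bit encoding means, here is a generic sketch that packs four bases into each byte. The base-to-code mapping, the bit order, and the helper names pack_2bit and unpack_2bit are assumptions made for this illustration; the actual layout used by ngs_plumbing may differ, so real files should be decoded with the library function mentioned above.

```python
# Assumed mapping for illustration only; the library's mapping may differ.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # The first base of each group of four goes in the highest two bits.
        packed[i // 4] |= CODE[base] << (2 * (3 - i % 4))
    return bytes(packed)


def unpack_2bit(packed, length):
    """Recover the first `length` bases from bytes made by pack_2bit."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (3 - i % 4))) & 3]
        for i in range(length)
    )


seq = "ACGTTGCA"
packed = pack_2bit(seq)
print(len(seq), "bases stored in", len(packed), "bytes")
print(unpack_2bit(packed, len(seq)))
```

The sequence length has to be stored separately, because the last byte may be only partly filled. A scheme like this also has no room for ambiguity codes such as N.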
