FASTA files have been around for quite a long time and remain in use today. Their success may have to do with their relatively simple structure and the fact that they are plain-text (ASCII) files.
Countless FASTA parsers have been implemented, but given how simple one is to write, we offer one here as well so that a third-party package is not required for this alone.
    from ngs_plumbing import fasta
    fn = 'mygenome.fa'
    fa = fasta.FastaFile(fn)
    for entry in fa:
        print(entry.header)
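To show how little code such a parser takes, here is a minimal standalone sketch in plain Python. It is not ngs_plumbing's implementation: the names `Entry` and `parse_fasta` are hypothetical, and skipping blank lines and rejecting sequence data before the first header are assumptions about how malformed input should be treated.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class Entry(NamedTuple):
    """One FASTA record: the header line (without '>') and the sequence."""
    header: str
    sequence: str


def parse_fasta(handle: TextIO) -> Iterator[Entry]:
    """Yield FASTA records one at a time from an open text handle."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # a new header closes the previous record, if any
            if header is not None:
                yield Entry(header, "".join(chunks))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line.strip())
    # emit the last record in the file
    if header is not None:
        yield Entry(header, "".join(chunks))


if __name__ == "__main__":
    text = ">chr1 test\nACGT\nAC\n>chr2\nGGGG\n"
    for entry in parse_fasta(io.StringIO(text)):
        print(entry.header, entry.sequence)
```

Because the parser is a generator, only one record is held in memory at a time, which matters for large genome files.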
What we add here is a twist: a way to handle binary FASTA, with the associated benefits of less storage space, shorter loading times, and shorter access times when retrieving a specific entry.
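    from ngs_plumbing import fasta
    fn_a = 'mygenome.fa'   # input FASTA file
    fn_b = 'mygenome.fab'  # binary FASTA file to create
    # convert the FASTA file into a binary FASTA file
    fasta.FastabFile.from_fastafile(fn_a, fn_b)
    # open the binary FASTA file
    fb = fasta.FastabFile(fn_b)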
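Faster access to a specific entry generally comes from not having to scan the whole file to find it. As an illustration of that general idea only (the layout of the .fab format is not described here, so this is not how ngs_plumbing does it), the sketch below builds a byte-offset index over a plain FASTA file and then seeks directly to one record. The functions `build_offset_index` and `fetch_sequence` are hypothetical.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(handle: BinaryIO) -> Dict[str, int]:
    """One pass over the file: map each header (without '>') to the byte offset of its header line."""
    index: Dict[str, int] = {}
    handle.seek(0)
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            index[line[1:].rstrip(b"\r\n").decode("ascii")] = offset
        offset += len(line)  # line still includes its newline, so offsets stay exact
    return index


def fetch_sequence(handle: BinaryIO, index: Dict[str, int], header: str) -> str:
    """Seek straight to one record and read only its sequence lines."""
    handle.seek(index[header])
    handle.readline()  # skip the header line itself
    chunks = []
    for line in handle:
        if line.startswith(b">"):
            break  # reached the next record
        chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


if __name__ == "__main__":
    data = io.BytesIO(b">a\nAC\nGT\n>b desc\nTTTT\n")
    idx = build_offset_index(data)
    print(idx, fetch_sequence(data, idx, "b desc"))
```

The index costs one full scan up front; after that, each lookup reads only the requested record instead of everything before it.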
Iterating through the binary file can be achieved in the same way:
    for entry in fb:
        print(entry.header)
The sequence in each entry is 2-bit encoded, and the function ngs_plumbing.dna.bytes_frombit2bytes() should be used to obtain the DNA.
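To make "2-bit encoded" concrete, here is a generic sketch of 2-bit packing, where four bases share one byte. It is not ngs_plumbing.dna.bytes_frombit2bytes(): the A/C/G/T code assignment, the bit order (first base in the most significant bits), and the names `encode_2bit` and `decode_2bit` are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed code assignment; the mapping used by ngs_plumbing may differ.
BASE_TO_BITS: Dict[str, int] = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def encode_2bit(seq: str) -> bytes:
    """Pack 4 bases per byte, first base in the two most significant bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base.upper()]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk (zero padding)
        out.append(byte)
    return bytes(out)


def decode_2bit(data: bytes, length: int) -> str:
    """Unpack `length` bases; the length is needed because the last byte may be padded."""
    bases: List[str] = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = encode_2bit("ACGTACG")
    print(len(packed), decode_2bit(packed, 7))
```

Packing four bases per byte is where the storage saving comes from: one million bases take 250,000 bytes instead of at least one million bytes as ASCII text. Two bits can only represent four symbols, so N and other IUPAC codes need separate bookkeeping in any real format; how ngs_plumbing handles them is not covered here.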