The extended OBITools fasta and fastq format

The extended OBITools fasta format is a strict fasta format: any program that reads fasta files can read a file in the extended OBITools fasta format.

The only difference between the standard and the extended fasta format is the structure of the title line. In OBITools, the title line is divided into three parts:

Seqid : the sequence identifier

key=value; : a set of key/value pairs

the sequence definition

>my_sequence taxid=3456; direct=True; sample=A354; this is my pretty sequence
ACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGT
GTGCTGACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTGTTT
AACGACGTTGCAGTACGTTGCAGT

Following these rules, the title line above is parsed as follows:

The sequence identifier of this sequence is: my_sequence

Three keys are assigned to this sequence:

Key taxid with value 3456

Key direct with value True

Key sample with value A354

The definition of this sequence is: this is my pretty sequence
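The decomposition above can be sketched in a few lines of Python. This is only an illustration, not the parser used by OBITools: the names split_title and PAIR are hypothetical, and the sketch assumes that a value never contains a semicolon. Values are kept as raw strings here.

```python
import re

# One "key=value;" pair at the start of the remaining title text.
# Assumption of this sketch: a value never contains ';'.
PAIR = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*([^;]*);")


def split_title(line):
    """Split a title line into (identifier, raw attributes, definition)."""
    # The identifier is everything between '>' and the first space.
    seqid, _, rest = line.lstrip(">").strip().partition(" ")
    attributes = {}
    # Consume leading key=value; pairs one by one.
    match = PAIR.match(rest)
    while match:
        attributes[match.group(1)] = match.group(2).strip()
        rest = rest[match.end():]
        match = PAIR.match(rest)
    # Whatever remains after the last pair is the definition.
    return seqid, attributes, rest.strip()


title = ">my_sequence taxid=3456; direct=True; sample=A354; this is my pretty sequence"
seqid, attributes, definition = split_title(title)
print(seqid)
print(attributes)
print(definition)
```

Because the pairs are consumed only from the start of the text, a definition that itself contains something shaped like key=value; after ordinary words is left untouched.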

Values can be any valid Python expression. If a value cannot be evaluated as a Python expression, it is then treated as a plain string. Following this rule, the taxid value is interpreted as an integer, the direct value as a boolean, and the sample value, which cannot be evaluated as a Python expression, as a string.
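The fallback rule can be illustrated with a small Python sketch. The function name interpret_value is hypothetical, and the sketch deliberately uses ast.literal_eval from the standard library instead of a full expression evaluation: it accepts only literals (numbers, strings, booleans, lists, dictionaries and so on), which is narrower than "any valid Python expression" but safe to run on untrusted files. It is not meant to reproduce the behaviour of OBITools exactly.

```python
import ast


def interpret_value(text):
    """Return the Python literal written in text, or text itself as a string."""
    try:
        # Safe substitute for a full evaluation: literals only.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Not a literal: keep the raw string, as the rule above describes.
        return text


for raw in ("3456", "True", "A354"):
    value = interpret_value(raw)
    print(raw, type(value).__name__)
```

With the three values of the example title line, the first is read as an int, the second as a bool, and the third falls back to a str.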

Names reserved for attributes

The following attribute names are created by some OBITools programs and used by others. Because they have a special meaning, we recommend not using them with any other meaning.
