The extended OBITools fasta and fastq format

The extended OBITools fasta format is a strict fasta format: any program that reads fasta files can read a file in the extended OBITools fasta format.

The only difference between the standard and the extended fasta formats is the structure of the title line. In the OBITools format, the title line is divided into three parts:

  • seqid: the sequence identifier
  • key=value; : a set of key/value pairs, each terminated by a semicolon
  • the sequence definition

>my_sequence taxid=3456; direct=True; sample=A354; this is my pretty sequence
ACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGT
GTGCTGACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTACGTTGCAGTGTTT
AACGACGTTGCAGTACGTTGCAGT

Following these rules, the title line above is parsed as follows:

  • The sequence identifier of this sequence is: my_sequence
  • Three keys are assigned to this sequence:
    • Key taxid with value 3456
    • Key direct with value True
    • Key sample with value A354
  • The definition of this sequence is: this is my pretty sequence
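
The decomposition above can be sketched in a few lines of Python. This is only an illustration, not the parser used by OBITools: the function name split_title_line and the regular expression are hypothetical, and the sketch assumes that keys look like Python identifiers and that values contain no semicolon. Values are left as raw strings here.

```python
import re

# Hypothetical helper, not part of OBITools: matches one "key=value;" pair
# starting at a given position of the title line.
_PAIR = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*([^;]*);")


def split_title_line(line):
    """Split an extended fasta title line into (seqid, attributes, definition)."""
    if not line.startswith(">"):
        raise ValueError("a fasta title line must start with '>'")
    # The identifier is everything up to the first space.
    seqid, _, rest = line[1:].strip().partition(" ")
    attributes = {}
    pos = 0
    # Consume consecutive key=value; pairs; stop at the first text that is not one.
    while True:
        match = _PAIR.match(rest, pos)
        if match is None:
            break
        attributes[match.group(1)] = match.group(2).strip()
        pos = match.end()
    # Whatever remains after the last pair is the sequence definition.
    definition = rest[pos:].strip()
    return seqid, attributes, definition


title = ">my_sequence taxid=3456; direct=True; sample=A354; this is my pretty sequence"
seqid, attributes, definition = split_title_line(title)
print(seqid)       # my_sequence
print(attributes)  # {'taxid': '3456', 'direct': 'True', 'sample': 'A354'}
print(definition)  # this is my pretty sequence
```

A title line without any key=value; pair yields an empty attribute dictionary, so a standard fasta header goes through the same function unchanged.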

Values can be any valid Python expression. If a value cannot be evaluated as a Python expression, it is treated as a plain string. Under this rule, the taxid value 3456 is an integer and the direct value True is a boolean, whereas the sample value A354 cannot be evaluated as a Python expression and is therefore kept as a string.
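
The fallback rule can be illustrated with a short Python sketch. The function name evaluate_value is hypothetical, and the sketch deliberately uses ast.literal_eval from the standard library instead of a full expression evaluator: it only accepts Python literals (numbers, strings, booleans, None, lists, tuples, dictionaries, sets), which is narrower than "any valid Python expression" but avoids executing arbitrary code found in a sequence file. How OBITools itself evaluates values is not shown here.

```python
import ast


def evaluate_value(raw):
    """Return the Python value of raw, or raw itself when it cannot be evaluated."""
    raw = raw.strip()
    try:
        # Assumption of this sketch: only Python literals are evaluated.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Not a literal (for example the bare word A354): keep it as a string.
        return raw


for key, raw in [("taxid", "3456"), ("direct", "True"), ("sample", "A354")]:
    value = evaluate_value(raw)
    print(key, repr(value), type(value).__name__)
# taxid 3456 int
# direct True bool
# sample 'A354' str
```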