fasta - manipulations with FASTA databases
FASTA is a simple file format for protein sequence databases. Refer to the NCBI website for detailed information on the format.
Data manipulation
Decoy database generation
decoy_sequence() - generate a decoy sequence from a given sequence
decoy_db() - generate entries for a decoy database from a given FASTA database
write_decoy_db() - generate a decoy database and print it to a file
Auxiliary
std_parsers - a dictionary with parsers for known FASTA header formats.
- pyteomics.fasta.decoy_db(*args, **kwargs)
Iterate over the entries of a decoy database generated from a given source.
If output is a path, the file is opened for appending, so no information is lost if the file already exists. Be careful, however, when providing open file streams as source and output: reading and writing start from the current position in each file, which is where the last I/O operation finished. Use the file.seek() method to change that position.
Parameters : source : file-like object or str or None, optional
A path to a FASTA database or a file object itself. Default is None, which means read standard input.
mode : {‘reverse’, ‘shuffle’}, optional
Algorithm of decoy sequence generation. ‘reverse’ by default.
prefix : str, optional
A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.
decoy_only : bool, optional
If set to True, only the decoy entries will be written to output. If False, the entries from source will be written first. False by default.
Returns : out : iterator
An iterator over entries of the new database.
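The effect of prefix and decoy_only can be illustrated with a minimal, self-contained sketch. This is not the pyteomics implementation: iter_decoy_entries is a hypothetical helper, it works on in-memory (description, sequence) pairs rather than files, and it only shows the ‘reverse’ mode.

```python
def iter_decoy_entries(entries, prefix='DECOY_', decoy_only=False):
    """Yield (description, sequence) pairs for a decoy database.

    Hypothetical helper for illustration only; decoys are made by reversal.
    """
    entries = list(entries)  # materialize so the entries can be traversed twice
    if not decoy_only:
        # The original entries come first, unchanged.
        for description, sequence in entries:
            yield description, sequence
    # Decoy entries: prefixed description, reversed sequence.
    for description, sequence in entries:
        yield prefix + description, sequence[::-1]


target = [('sp|P1|PROT1', 'MKWVTF'), ('sp|P2|PROT2', 'PEPTIDE')]
combined = list(iter_decoy_entries(target))
decoys = list(iter_decoy_entries(target, decoy_only=True))
```

With decoy_only left at False, combined holds the two original entries followed by their two decoys; with decoy_only=True, only the prefixed decoys are produced.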
- pyteomics.fasta.decoy_sequence(sequence, mode)
Create a decoy sequence out of a given sequence string.
Parameters : sequence : str
The initial sequence string.
mode : {‘reverse’, ‘shuffle’}
Type of decoy sequence.
Returns : modified_sequence : str
The modified sequence.
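As a rough illustration of what the two modes mean, the sketch below reverses or randomly permutes the residues of a sequence. The function make_decoy and its rng argument are hypothetical and written only for this example; the library's own code may differ in detail.

```python
import random


def make_decoy(sequence, mode, rng=None):
    """Return a decoy for `sequence` (illustrative stand-in, not library code)."""
    if mode == 'reverse':
        # Read the sequence backwards.
        return sequence[::-1]
    if mode == 'shuffle':
        # Randomly permute the residues; the composition is preserved.
        residues = list(sequence)
        (rng or random).shuffle(residues)
        return ''.join(residues)
    raise ValueError('Unsupported decoy mode: {!r}'.format(mode))


reversed_decoy = make_decoy('PEPTIDE', 'reverse')
shuffled_decoy = make_decoy('PEPTIDE', 'shuffle', rng=random.Random(42))
```

Both decoys contain the same residues as the input, so their length and amino acid composition match the target sequence.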
- pyteomics.fasta.parse(header, flavour='auto', parsers=None)
Parse the FASTA header and return its contents as a dictionary.
Parameters : header : str
FASTA header to parse.
flavour : str, optional
Short name of the header format (case-insensitive). Valid values are 'auto' and keys of the parsers dict. Default is 'auto', which means try all formats in turn and return the first result that can be obtained without an exception.
parsers : dict, optional
A dict where keys are format names (lowercased) and values are functions that take a header string and return the parsed header. Default is None, which means use the default dictionary std_parsers.
Returns : out : dict
A dictionary with the info from the header. The format depends on the flavour.
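The 'auto' behaviour (try each parser in turn and return the first result obtained without an exception) can be sketched as follows. Everything here is an assumption made for illustration: the two parsers, the example_parsers dictionary and parse_header are hypothetical and are not the contents of std_parsers.

```python
import re


def parse_uniprot_like(header):
    """Hypothetical parser for headers like 'sp|P12345|ALBU_HUMAN Serum albumin'."""
    match = re.match(r'^(\w+)\|([^|]+)\|(\S+)\s*(.*)$', header)
    if match is None:
        raise ValueError('not a UniProt-like header')
    db, accession, entry, name = match.groups()
    return {'db': db, 'id': accession, 'entry': entry, 'name': name}


def parse_plain(header):
    """Hypothetical fallback: first word is the id, the rest is the name."""
    identifier, _, rest = header.partition(' ')
    if not identifier:
        raise ValueError('empty header')
    return {'id': identifier, 'name': rest}


# Keys are lowercased format names, values are parser functions.
example_parsers = {'uniprot_like': parse_uniprot_like, 'plain': parse_plain}


def parse_header(header, flavour='auto', parsers=None):
    parsers = example_parsers if parsers is None else parsers
    flavour = flavour.lower()  # format names are case-insensitive
    if flavour == 'auto':
        for parser in parsers.values():
            try:
                return parser(header)  # first parser that succeeds wins
            except Exception:
                continue
        raise ValueError('Unknown FASTA header format: ' + header)
    return parsers[flavour](header)


uniprot_info = parse_header('sp|P12345|ALBU_HUMAN Serum albumin')
plain_info = parse_header('my_protein description text')
explicit_info = parse_header('x y', flavour='PLAIN')
```

Because the parsers are tried in order, more specific formats should come before permissive ones in the dictionary; otherwise a permissive parser would accept every header first.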
- pyteomics.fasta.read(*args, **kwargs)
Read a FASTA file and return entries iteratively.
Parameters : source : str or file or None, optional
A file object (or file name) with a FASTA database. Default is None, which means read standard input.
ignore_comments : bool, optional
If True, ignore the second and subsequent lines of the description. Default is False.
parser : function or None, optional
Defines whether the FASTA descriptions should be parsed. If it is a function, it is given the description string, and its return value is yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format guessing. Default is None, which means return the header “as is”.
Returns : out : iterator of tuples
Each item is a named 2-tuple with the FASTA header (str) and the sequence (str). The same values are available as the attributes ‘description’ and ‘sequence’.
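To make the shape of the yielded entries concrete, here is a minimal stand-alone reader that produces named 2-tuples from a file-like object. It is only a sketch under simplifying assumptions: iter_fasta and Protein are hypothetical names, each entry has a single description line, and the optional parser is simply applied to that line.

```python
import io
from collections import namedtuple

# Named 2-tuple with the attributes 'description' and 'sequence'.
Protein = namedtuple('Protein', ['description', 'sequence'])


def iter_fasta(handle, parser=None):
    """Yield Protein tuples from a file-like object (illustrative only)."""
    header, chunks = None, []

    def build(raw_header, parts):
        description = raw_header if parser is None else parser(raw_header)
        return Protein(description, ''.join(parts))

    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield build(header, chunks)  # emit the previous entry
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield build(header, chunks)  # emit the last entry


fasta_text = '>prot1 first protein\nMKWV\nTFIS\n>prot2 second protein\nPEPTIDE\n'
entries = list(iter_fasta(io.StringIO(fasta_text)))
upper_entries = list(iter_fasta(io.StringIO(fasta_text), parser=str.upper))
```

Each entry can be unpacked as a plain tuple (description, sequence) or accessed by attribute, for example entries[0].sequence.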
- pyteomics.fasta.write(entries, output=None)
Create a FASTA file with entries.
Parameters : entries : iterable of (str, str) tuples
An iterable of 2-tuples in the form (description, sequence).
output : file-like or str, optional
A file open for writing or a path to write to. If the file exists, it will be opened for appending. Default is None, which means write to standard output.
Returns : output_file : file object
The file where the FASTA is written.
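The append-on-existing-file behaviour described for output can be illustrated with plain file I/O. The helper append_fasta below is hypothetical and much simpler than a real writer (one description line and one unwrapped sequence line per entry); it only shows that a second call adds to the file instead of overwriting it.

```python
import os
import tempfile


def append_fasta(entries, path):
    """Append (description, sequence) pairs to `path` (illustrative only)."""
    # Mode 'a' creates the file if needed and never truncates existing content.
    with open(path, 'a') as handle:
        for description, sequence in entries:
            handle.write('>' + description + '\n')
            handle.write(sequence + '\n')


path = os.path.join(tempfile.mkdtemp(), 'example.fasta')
append_fasta([('prot1', 'PEPTIDE')], path)
append_fasta([('DECOY_prot1', 'EDITPEP')], path)  # earlier entry is kept

with open(path) as handle:
    content = handle.read()
```

After both calls the file contains the first entry followed by the second one.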
- pyteomics.fasta.write_decoy_db(source=None, output=None, mode='reverse', prefix='DECOY_', decoy_only=False)
Generate a decoy database from a given source and write it to a file.
If output is a path, the file is opened for appending, so no information is lost if the file already exists. Be careful, however, when providing open file streams as source and output: reading and writing start from the current position in each file, which is where the last I/O operation finished. Use the file.seek() method to change that position.
Parameters : source : file-like object or str or None, optional
A path to a FASTA database or a file object itself. Default is None, which means read standard input.
output : file-like object or str, optional
A path to the output database or a file open for writing. Default is None, which means write to standard output.
mode : {‘reverse’, ‘shuffle’}, optional
Algorithm of decoy sequence generation. ‘reverse’ by default.
prefix : str, optional
A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.
decoy_only : bool, optional
If set to True, only the decoy entries will be written to output. If False, the entries from source will be written as well. False by default.
Returns : output : file
A file object for the created file.
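The warning about open file streams comes down to the stream position. The short sketch below uses an in-memory io.StringIO object as a stand-in for an open FASTA file (an assumption made so the example needs no files on disk) and shows that reading resumes where the last operation stopped until seek() rewinds the stream.

```python
import io

# In-memory stand-in for an already open FASTA file.
stream = io.StringIO('>prot1\nPEPTIDE\n>prot2\nMKWVTF\n')

first_line = stream.readline()  # the position is now just after the first header
remaining = stream.read()       # continues from the current position, not the start

stream.seek(0)                  # rewind to the beginning
everything = stream.read()      # now the whole content is read again
```

A stream that has already been partly read would therefore yield an incomplete database unless it is rewound with seek(0) first.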