Pyteomics documentation v2.1.5

fasta - manipulations with FASTA databases

FASTA is a simple text file format for protein sequence databases. For a detailed description of the format, refer to the NCBI website.

Data manipulation

read() - iterate through entries in a FASTA database

write() - write entries to a FASTA database

parse() - parse a FASTA header

Decoy database generation

decoy_sequence() - generate a decoy sequence from a given sequence

decoy_db() - generate entries for a decoy database from a given FASTA database

write_decoy_db() - generate a decoy database and print it to a file

Auxiliary

std_parsers - a dictionary with parsers for known FASTA header formats.


pyteomics.fasta.decoy_db(*args, **kwargs)

Iterate over the entries of a decoy database generated from a given source.

If output is a path, the file will be opened for appending, so no existing content is lost if the file already exists. Be careful, however, when passing open file objects as source and output: reading and writing start from the current position in each file, which is where the last I/O operation finished. Use the file.seek() method to change that position.

Parameters :

source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

mode : {‘reverse’, ‘shuffle’}, optional

Algorithm of decoy sequence generation. ‘reverse’ by default.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.

decoy_only : bool, optional

If set to True, only the decoy entries will be written to output. If False, the entries from source will be written first. False by default.

Returns :

out : iterator

An iterator over entries of the new database.
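The ordering and prefixing described above can be sketched with plain Python. This is a hypothetical stand-in, not the Pyteomics implementation: the name sketch_decoy_db and the toy entries are assumptions, it works on (description, sequence) tuples rather than on a file, and only the ‘reverse’ mode is illustrated.

```python
def sketch_decoy_db(entries, prefix='DECOY_', decoy_only=False):
    """Hypothetical generator: source entries first, then reversed decoys."""
    entries = list(entries)
    if not decoy_only:
        # Entries from the source come first.
        for description, sequence in entries:
            yield description, sequence
    # Decoy entries follow, with the prefix added to each description.
    for description, sequence in entries:
        yield prefix + description, sequence[::-1]


toy_entries = [('toy1 first entry', 'PEPTIDE')]
combined = list(sketch_decoy_db(toy_entries))
decoys = list(sketch_decoy_db(toy_entries, decoy_only=True))
print(combined)
print(decoys)
```

Because the function is a generator, entries are produced lazily, which matches the iterator return type described above.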

pyteomics.fasta.decoy_sequence(sequence, mode)

Create a decoy sequence out of a given sequence string.

Parameters :

sequence : str

The initial sequence string.

mode : {‘reverse’, ‘shuffle’}

Type of decoy sequence.

Returns :

modified_sequence : str

The modified sequence.
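The two modes can be illustrated with a minimal standard-library sketch. The helper name sketch_decoy_sequence is hypothetical and this is not the Pyteomics implementation; it assumes that ‘reverse’ reverses the residue order and ‘shuffle’ randomly permutes the residues.

```python
import random


def sketch_decoy_sequence(sequence, mode):
    """Hypothetical stand-in illustrating the two decoy modes."""
    if mode == 'reverse':
        # Read the residues back to front.
        return sequence[::-1]
    if mode == 'shuffle':
        # Randomly permute the residues; the composition is preserved.
        residues = list(sequence)
        random.shuffle(residues)
        return ''.join(residues)
    raise ValueError("mode must be 'reverse' or 'shuffle'")


reversed_decoy = sketch_decoy_sequence('PEPTIDE', 'reverse')
shuffled_decoy = sketch_decoy_sequence('PEPTIDE', 'shuffle')
print(reversed_decoy)  # EDITPEP
print(sorted(shuffled_decoy) == sorted('PEPTIDE'))  # True
```

A reversed decoy is deterministic, while a shuffled decoy differs from run to run unless the random generator is seeded.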

pyteomics.fasta.parse(header, flavour='auto', parsers=None)

Parse a FASTA header and return its contents as a dictionary.

Parameters :

header : str

The FASTA header to parse.

flavour : str, optional

Short name of the header format (case-insensitive). Valid values are 'auto' and keys of the parsers dict. Default is 'auto', which means try all formats in turn and return the first result that can be obtained without an exception.

parsers : dict, optional

A dict where keys are format names (lowercased) and values are functions that take a header string and return the parsed header. Default is None, which means use the default dictionary std_parsers.

Returns :

out : dict

A dictionary with the info from the header. The format depends on the flavour.
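The 'auto' behaviour described above (try every parser in turn and keep the first result obtained without an exception) can be sketched as follows. Everything here is hypothetical: sketch_parse, the two toy parsers, and the toy header formats are assumptions for illustration and do not reproduce std_parsers. Raising ValueError when no parser succeeds is also an assumption.

```python
def sketch_parse(header, flavour='auto', parsers=None):
    """Hypothetical illustration of flavour dispatch; not the library code."""
    parsers = parsers or {}
    key = flavour.lower()  # flavour names are case-insensitive
    if key == 'auto':
        # Try each parser in turn; keep the first result that does not raise.
        for parser in parsers.values():
            try:
                return parser(header)
            except Exception:
                continue
        raise ValueError('Unknown FASTA header format: ' + header)
    return parsers[key](header)


def parse_pipe(header):
    # Toy format: "db|id|name"
    db, entry_id, name = header.split('|')
    return {'db': db, 'id': entry_id, 'name': name}


def parse_plain(header):
    # Toy format: "id description"
    entry_id, description = header.split(' ', 1)
    return {'id': entry_id, 'description': description}


toy_parsers = {'pipe': parse_pipe, 'plain': parse_plain}
auto_pipe = sketch_parse('db|X0001|TOY', parsers=toy_parsers)
auto_plain = sketch_parse('X0001 toy protein', parsers=toy_parsers)
explicit = sketch_parse('X0001 toy protein', flavour='PLAIN', parsers=toy_parsers)
print(auto_pipe)
print(auto_plain)
```

Keys of the parsers dict are lowercase, so the flavour is lowercased before the lookup.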

pyteomics.fasta.read(*args, **kwargs)

Read a FASTA file and return entries iteratively.

Parameters :

source : str or file or None, optional

A file object (or file name) with a FASTA database. Default is None, which means read standard input.

ignore_comments : bool, optional

If True, ignore the second and subsequent lines of the description. Default is False.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, it is called with the description string, and its return value is yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: pass parse() as the parser to apply automatic format guessing. Default is None, which means the header is returned “as is”.

Returns :

out : iterator of tuples

Each item is a named 2-tuple of the FASTA header (str) and the sequence (str). The two fields are also available as the attributes ‘description’ and ‘sequence’.
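A minimal standard-library sketch of what such an iterator of named 2-tuples might look like is given below. The names Entry and sketch_read are hypothetical, the data is a toy example, and the sketch does not handle ignore_comments or multi-line descriptions; it is not the Pyteomics reader.

```python
import io
from collections import namedtuple

# Hypothetical record type mirroring the attributes described above.
Entry = namedtuple('Entry', ['description', 'sequence'])


def sketch_read(source, parser=None):
    """Yield Entry records from an open text file, one per FASTA entry."""
    description, chunks = None, []

    def make_entry():
        # Apply the optional parser to the description, as the text describes.
        header = parser(description) if parser else description
        return Entry(header, ''.join(chunks))

    for line in source:
        line = line.strip()
        if line.startswith('>'):
            if description is not None:
                yield make_entry()
            description, chunks = line[1:], []
        elif line and description is not None:
            chunks.append(line)  # sequence may span several lines
    if description is not None:
        yield make_entry()


toy_fasta = '>toy1 first entry\nPEPT\nIDE\n>toy2 second entry\nACDE\n'
entries = list(sketch_read(io.StringIO(toy_fasta)))
for description, sequence in entries:  # plain tuple unpacking works
    print(description, sequence)
print(entries[0].sequence)  # attribute access works too
```

Passing a function as parser replaces the header string in each record with whatever the function returns, for example `sketch_read(handle, parser=str.upper)`.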

pyteomics.fasta.write(entries, output=None)

Create a FASTA file with entries.

Parameters :

entries : iterable of (str, str) tuples

An iterable of 2-tuples in the form (description, sequence).

output : file-like or str, optional

A file open for writing or a path to write to. If the file exists, it will be opened for appending. Default is None, which means write to standard output.

Returns :

output_file : file object

The file where the FASTA is written.
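The shape of the entries argument, an iterable of (description, sequence) tuples, can be illustrated with a small sketch that writes to an in-memory buffer. The name sketch_write is hypothetical and this is not the library code. How Pyteomics wraps long sequence lines is not stated here, so the sketch simply puts each sequence on a single line.

```python
import io


def sketch_write(entries, output):
    """Hypothetical writer: one header line and one sequence line per entry."""
    for description, sequence in entries:
        output.write('>' + description + '\n')
        output.write(sequence + '\n')
    return output  # hand the file object back, as described above


buffer = io.StringIO()
toy_entries = [('toy1 first entry', 'PEPTIDE'), ('toy2 second entry', 'ACDE')]
returned = sketch_write(toy_entries, buffer)
print(buffer.getvalue())
```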

pyteomics.fasta.write_decoy_db(source=None, output=None, mode='reverse', prefix='DECOY_', decoy_only=False)

Generate a decoy database from a given source and write it to a file.

If output is a path, the file will be opened for appending, so no existing content is lost if the file already exists. Be careful, however, when passing open file objects as source and output: reading and writing start from the current position in each file, which is where the last I/O operation finished. Use the file.seek() method to change that position.
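The file position caveat can be seen with an in-memory file object from the standard library. The data below is a toy example, and io.StringIO stands in for an open FASTA file; no Pyteomics call is involved.

```python
import io

# An in-memory stand-in for an open FASTA file (toy data).
handle = io.StringIO('>toy1\nPEPTIDE\n>toy2\nACDE\n')

first_line = handle.readline()  # consumes the first header line
rest = handle.read()            # continues from the current position
print(repr(rest))               # the first header is no longer included

handle.seek(0)                  # rewind to the beginning
everything = handle.read()
print(everything.startswith('>toy1'))  # True
```

A file object that has already been read to the end yields nothing further until seek() moves the position back.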

Parameters :

source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

output : file-like object or str, optional

A path to the output database or a file open for writing. Default is None, which means the results are written to standard output.

mode : {‘reverse’, ‘shuffle’}, optional

Algorithm of decoy sequence generation. ‘reverse’ by default.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is ‘DECOY_’.

decoy_only : bool, optional

If set to True, only the decoy entries will be written to output. If False, the entries from source will be written as well. False by default.

Returns :

output : file

A file object for the created file.
