Pyteomics documentation v3.4.2

fasta - manipulations with FASTA databases


FASTA is a simple file format for protein sequence databases. Please refer to the NCBI website for the most detailed information on the format.

Data manipulation

read() - iterate through entries in a FASTA database.

chain() - read multiple files at once.

chain.from_iterable() - read multiple files at once, using an iterable of files.

write() - write entries to a FASTA database.

parse() - parse a FASTA header.

Decoy sequence generation

decoy_sequence() - generate a decoy sequence from a given sequence, using one of the other functions listed in this section or any other callable.

reverse() - generate a reversed decoy sequence.

shuffle() - generate a shuffled decoy sequence.

fused_decoy() - generate a “fused” decoy sequence.

Decoy database generation

decoy_db() - generate entries for a decoy database from a given FASTA database.

decoy_chain() - a version of decoy_db() for multiple files.

decoy_chain.from_iterable() - like decoy_chain(), but with an iterable of files.

write_decoy_db() - generate a decoy database and print it to a file.

Auxiliary

std_parsers - a dictionary with parsers for known FASTA header formats.

pyteomics.fasta.chain(*args, **kwargs)

Chain read() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the read() function.

chain.from_iterable(files, **kwargs)

Chain read() for several files. Keyword arguments are passed to the read() function.

Parameters:

files : iterable

Iterable of file names or file objects.
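Chaining can be pictured as itertools.chain applied to one reader per file: each source is read in turn and the resulting iterators are concatenated. The stand-alone sketch below shows that idea. mini_read is a hypothetical, much simplified reader written only for this illustration (no comment handling, no header parsing), not pyteomics code.

```python
import io
from itertools import chain


def mini_read(source):
    """Hypothetical minimal FASTA reader: yields (description, sequence) pairs."""
    description, lines = None, []
    for line in source:
        line = line.strip()
        if line.startswith('>'):
            if description is not None:
                yield description, ''.join(lines)
            description, lines = line[1:], []
        elif line:
            lines.append(line)
    if description is not None:
        yield description, ''.join(lines)


def chained_read(files):
    """Sketch of chain.from_iterable(): read each file in turn, one after another."""
    return chain.from_iterable(mini_read(f) for f in files)


files = [io.StringIO('>a\nPEPT\n'), io.StringIO('>b\nMKR\n>c\nAAA\n')]
print(list(chained_read(files)))  # [('a', 'PEPT'), ('b', 'MKR'), ('c', 'AAA')]
```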

pyteomics.fasta.decoy_chain(*args, **kwargs)

Chain decoy_db() for several files. Positional arguments should be file names or file objects. Keyword arguments are passed to the decoy_db() function.

decoy_chain.from_iterable(files, **kwargs)

Chain decoy_db() for several files. Keyword arguments are passed to the decoy_db() function.

Parameters:

files : iterable

Iterable of file names or file objects.

pyteomics.fasta.decoy_db(*args, **kwargs)

Iterate over the entries of a decoy database generated from a given source.

Parameters:

source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

mode : str or callable, optional

Algorithm of decoy sequence generation. 'reverse' by default. See decoy_sequence() for more information.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is 'DECOY_'.

decoy_only : bool, optional

If True, only the decoy entries are yielded. If False, the entries from source are yielded first, followed by the decoy entries. False by default.

ignore_comments : bool, optional

If True, ignore the second and subsequent lines of the description. Default is False.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format guessing. Default is None, which means return the header “as is”.

**kwargs : given to decoy_sequence().

Returns:

out : iterator

An iterator over entries of the new database.
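The ordering described above (target entries first unless decoy_only is set, then decoy entries whose descriptions carry the prefix) can be sketched without any file handling. This stand-alone generator is an illustration under stated assumptions, not pyteomics' implementation: an in-memory list of (description, sequence) pairs stands in for the FASTA source, and plain string reversal stands in for the default 'reverse' mode.

```python
def decoy_db_sketch(entries, decoy=lambda s: s[::-1], prefix='DECOY_', decoy_only=False):
    """Yield target entries (unless decoy_only), then prefixed decoy entries.

    `entries` must be re-iterable (e.g. a list), because it is traversed twice.
    """
    if not decoy_only:
        for description, sequence in entries:
            yield description, sequence
    for description, sequence in entries:
        yield prefix + description, decoy(sequence)


db = [('sp|P1|A', 'MPEPTIDEK'), ('sp|P2|B', 'MKR')]
for entry in decoy_db_sketch(db):
    print(entry)
# ('sp|P1|A', 'MPEPTIDEK')
# ('sp|P2|B', 'MKR')
# ('DECOY_sp|P1|A', 'KEDITPEPM')
# ('DECOY_sp|P2|B', 'RKM')
```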

pyteomics.fasta.decoy_sequence(sequence, mode='reverse', **kwargs)

Create a decoy sequence out of a given sequence string.

Parameters:

sequence : str

The initial sequence string.

mode : str or callable, optional

Type of decoy sequence. Should be one of the standard modes or any callable. Standard modes are:

'reverse' for reverse();

'shuffle' for shuffle();

'fused' for fused_decoy().

Default is 'reverse'.

**kwargs : given to the decoy function.

Returns:

decoy_sequence : str

The decoy sequence.
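Dispatching on mode (a standard mode name or any callable, with extra keyword arguments forwarded to it) can be sketched as a simple lookup table. This is a simplified stand-alone illustration, not the library's code: the hypothetical MODES dictionary holds only a plain reversal, and rotate() is a hypothetical custom decoy function used to show that any callable works.

```python
# Hypothetical registry of named modes; only a plain reversal is included here.
MODES = {'reverse': lambda sequence: sequence[::-1]}


def decoy_sequence_sketch(sequence, mode='reverse', **kwargs):
    """Resolve `mode` (name or callable) and forward extra keyword arguments."""
    func = mode if callable(mode) else MODES[mode]
    return func(sequence, **kwargs)


def rotate(sequence, n=1):
    """Hypothetical custom decoy: move the first n residues to the end."""
    return sequence[n:] + sequence[:n]


print(decoy_sequence_sketch('MPEPTIDE'))                   # EDITPEPM
print(decoy_sequence_sketch('PEPTIDE', mode=rotate, n=2))  # PTIDEPE
```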

pyteomics.fasta.fused_decoy(sequence, decoy_mode='reverse', sep='R', **kwargs)

Create a “fused” decoy sequence by concatenating a decoy sequence with the original one. The method and its use cases are described in:

Ivanov, M. V., Levitsky, L. I., & Gorshkov, M. V. (2016). Adaptation of Decoy Fusion Strategy for Existing Multi-Stage Search Workflows. Journal of The American Society for Mass Spectrometry, 27(9), 1579-1582.

Parameters:

sequence : str

The initial sequence string.

decoy_mode : str or callable, optional

Type of decoy sequence to use. Should be one of the standard modes accepted by decoy_sequence() or any callable.

Default is 'reverse'.

sep : str, optional

Amino acid motif that separates the decoy sequence from the target one. This setting should reflect the enzyme specificity used in the search against the database being generated. Default is ‘R’, which is suitable for trypsin searches.

**kwargs : given to the decoy generation function.

Examples

>>> fused_decoy('PEPT')
'TPEPRPEPT'
>>> fused_decoy('MPEPT', 'shuffle', 'K', keep_nterm=True)
'MPPTEKMPEPT'
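Apart from producing the decoy part, fusion is a plain concatenation: the decoy, then sep, then the original sequence. The sketch below is a simplified stand-alone illustration (not pyteomics code) in which string reversal stands in for decoy_mode='reverse'; it reproduces the first example above.

```python
def fused_decoy_sketch(sequence, sep='R', decoy=lambda s: s[::-1]):
    """Concatenate a decoy of `sequence`, the separator motif and the original."""
    return decoy(sequence) + sep + sequence


print(fused_decoy_sketch('PEPT'))            # TPEPRPEPT
print(fused_decoy_sketch('MPEPT', sep='K'))  # TPEPMKMPEPT
```

Because shuffle() is random, the output of the second example above may differ between calls; only its layout (the kept N-terminal M, a shuffled remainder, then K, then the original sequence) is fixed.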

pyteomics.fasta.parse(header, flavour='auto', parsers=None)

Parse a FASTA header and return a dictionary with the information it contains.

Parameters:

header : str

FASTA header to parse.

flavour : str, optional

Short name of the header format (case-insensitive). Valid values are 'auto' and keys of the parsers dict. Default is 'auto', which means try all formats in turn and return the first result that can be obtained without an exception.

parsers : dict, optional

A dict where keys are format names (lowercased) and values are functions that take a header string and return the parsed header. Default is None, which means use the default dictionary std_parsers.

Returns:

out : dict

A dictionary with the info from the header. The format depends on the flavour.
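The 'auto' flavour described above (try each format in turn and return the first result obtained without an exception) can be sketched as follows. This is an illustration, not pyteomics' implementation: pipe_parser, plain_parser and the PARSERS registry are hypothetical stand-ins for std_parsers, and raising ValueError when no parser succeeds is an assumption of this sketch.

```python
def pipe_parser(header):
    """Hypothetical parser for 'db|id|rest' headers."""
    db, identifier, rest = header.split('|', 2)  # ValueError if fewer than 2 pipes
    return {'db': db, 'id': identifier, 'rest': rest}


def plain_parser(header):
    """Hypothetical fallback: the first word is taken as the identifier."""
    identifier, _, rest = header.partition(' ')
    return {'id': identifier, 'description': rest}


# Hypothetical default registry; keys are lowercase format names, tried in order.
PARSERS = {'pipe': pipe_parser, 'plain': plain_parser}


def parse_sketch(header, flavour='auto', parsers=None):
    """Use the named parser, or for 'auto' return the first successful parse."""
    if parsers is None:
        parsers = PARSERS
    if flavour.lower() != 'auto':
        return parsers[flavour.lower()](header)
    for parser in parsers.values():
        try:
            return parser(header)
        except Exception:
            continue  # this format does not fit; try the next one
    raise ValueError('No parser could handle the header: ' + header)


print(parse_sketch('sp|P12345|TEST_HUMAN Test protein'))
# {'db': 'sp', 'id': 'P12345', 'rest': 'TEST_HUMAN Test protein'}
print(parse_sketch('P12345 Test protein'))
# {'id': 'P12345', 'description': 'Test protein'}
```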

pyteomics.fasta.read(*args, **kwargs)

Read a FASTA file and return entries iteratively.

Parameters:

source : str or file or None, optional

A file object (or file name) with a FASTA database. Default is None, which means read standard input.

ignore_comments : bool, optional

If True, ignore the second and subsequent lines of the description. Default is False.

parser : function or None, optional

Defines whether the FASTA descriptions should be parsed. If it is a function, that function will be given the description string, and the returned value will be yielded together with the sequence. The std_parsers dict has parsers for several formats. Hint: specify parse() as the parser to apply automatic format recognition. Default is None, which means return the header “as is”.

Returns:

out : iterator of tuples

A named 2-tuple with the FASTA header (str or dict) and the sequence (str). Attributes 'description' and 'sequence' are also provided.
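A simplified stand-alone reader can illustrate the documented behavior: named 2-tuples with description and sequence attributes, an optional parser applied to each header, and the ignore_comments switch. It is a sketch, not pyteomics' implementation. Entry is a hypothetical name for the tuple type, and treating ';'-prefixed lines that follow the header as extra description lines, joined with a space, is an assumption about the multi-line description format.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for the named 2-tuple described above.
Entry = namedtuple('Entry', ['description', 'sequence'])


def read_sketch(source, ignore_comments=False, parser=None):
    """Yield Entry(description, sequence) pairs from a text stream."""
    description, seq_lines = None, []

    def make_entry():
        desc = parser(description) if parser is not None else description
        return Entry(desc, ''.join(seq_lines))

    for raw in source:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('>'):
            if description is not None:
                yield make_entry()
            description, seq_lines = line[1:], []
        elif line.startswith(';'):
            # Assumption: ';' lines before the sequence continue the description.
            if not ignore_comments and description is not None and not seq_lines:
                description += ' ' + line[1:]
        else:
            seq_lines.append(line)
    if description is not None:
        yield make_entry()


text = '>sp|P1|A Protein A\n;second line\nMPEP\nTIDE\n>sp|P2|B\nMKR\n'
entries = list(read_sketch(io.StringIO(text)))
print(entries[0].description)  # sp|P1|A Protein A second line
print(entries[0].sequence)     # MPEPTIDE
```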

pyteomics.fasta.reverse(sequence, keep_nterm=False)

Create a decoy sequence by reversing the original one.

Parameters:

sequence : str

The initial sequence string.

keep_nterm : bool, optional

If True, then the N-terminal residue will be kept. Default is False.

Returns:

decoy_sequence : str

The decoy sequence.
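Both variants of reversal can be expressed with string slicing. The sketch below (reverse_sketch is a hypothetical name, not the library function) shows how keep_nterm leaves the first residue, for example an initiator methionine, in place and reverses only the rest.

```python
def reverse_sketch(sequence, keep_nterm=False):
    """Reverse a sequence, optionally keeping the N-terminal residue in place."""
    if keep_nterm and sequence:
        return sequence[0] + sequence[1:][::-1]
    return sequence[::-1]


print(reverse_sketch('MPEPTIDEK'))                   # KEDITPEPM
print(reverse_sketch('MPEPTIDEK', keep_nterm=True))  # MKEDITPEP
```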

pyteomics.fasta.shuffle(sequence, keep_nterm=False)

Create a decoy sequence by shuffling the original one.

Parameters:

sequence : str

The initial sequence string.

keep_nterm : bool, optional

If True, then the N-terminal residue will be kept. Default is False.

Returns:

decoy_sequence : str

The decoy sequence.
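A shuffled decoy uses exactly the same residues as the original, so its amino acid composition (and hence its unmodified mass) is preserved while the order is randomized. The sketch below illustrates this with random.sample; it is not the library function, and its rng argument is a hypothetical addition for reproducibility that is not part of the documented signature.

```python
import random


def shuffle_sketch(sequence, keep_nterm=False, rng=random):
    """Return a random permutation of the residues, optionally fixing the first."""
    if keep_nterm and sequence:
        head, body = sequence[0], sequence[1:]
    else:
        head, body = '', sequence
    return head + ''.join(rng.sample(body, len(body)))


rng = random.Random(42)  # seeded only to make this illustration repeatable
decoy = shuffle_sketch('MPEPTIDEK', keep_nterm=True, rng=rng)
print(decoy)
# Same residues, same first residue; only the order of the rest changes.
print(decoy[0] == 'M' and sorted(decoy) == sorted('MPEPTIDEK'))  # True
```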

pyteomics.fasta.std_parsers

A dictionary with parsers for known FASTA header formats. For now, the supported formats are those described on the UniProt help page.
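One of the formats on the UniProt help page is the UniProtKB header, laid out roughly as db|UniqueIdentifier|EntryName ProteinName followed by KEY=value fields such as OS, GN, PE and SV. The sketch below shows how such a header could be split into a dictionary. It is not one of the std_parsers functions: the regular expressions and the dictionary keys ('db', 'id', 'entry', 'name') are invented for this illustration and need not match what pyteomics returns, and the header is made up.

```python
import re

# Hypothetical patterns for 'db|accession|entry_name name KEY=value ...' headers.
HEADER_RE = re.compile(r'^(?P<db>\w+)\|(?P<id>[\w-]+)\|(?P<entry>\w+)\s+(?P<rest>.*)$')
FIELD_RE = re.compile(r'(\w+)=(.*?)(?=\s+\w+=|$)')


def parse_uniprotkb_sketch(header):
    """Split a UniProtKB-style header into a dict (key names are invented here)."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError('Not a UniProtKB-style header: ' + header)
    result = {'db': match['db'], 'id': match['id'], 'entry': match['entry']}
    rest = match['rest']
    # The protein name runs up to the first ' KEY=' token, e.g. ' OS='.
    first_field = re.search(r'\s\w+=', rest)
    name_end = first_field.start() if first_field else len(rest)
    result['name'] = rest[:name_end].strip()
    # Each KEY=value pair extends up to the next ' KEY=' or the end of the line.
    for key, value in FIELD_RE.findall(rest[name_end:]):
        result[key] = value.strip()
    return result


# A made-up header in the UniProtKB layout.
parsed = parse_uniprotkb_sketch(
    'sp|P00000|TEST_HUMAN Example protein OS=Homo sapiens GN=TEST PE=1 SV=1')
print(parsed)
```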

pyteomics.fasta.write(*args, **kwargs)

Write entries to a FASTA file.

Parameters:

entries : iterable of (str, str) tuples

An iterable of 2-tuples in the form (description, sequence).

output : file-like or str, optional

A file open for writing or a path to write to. If the file exists, it will be opened for appending. Default is None, which means write to standard output.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise it is ignored. Default is 'a'.

Returns:

output_file : file object

The file where the FASTA is written.
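Writing FASTA amounts to emitting a '>' header line followed by the sequence lines for each (description, sequence) pair. Below is a minimal stand-alone sketch that writes to any text stream. It is not the library function, and its width line-wrapping parameter is hypothetical, since the line length used by write() is not stated here.

```python
import io


def write_sketch(entries, output, width=60):
    """Write (description, sequence) pairs in FASTA format to a text stream.

    `width` is a hypothetical parameter: sequences are wrapped to this length.
    """
    for description, sequence in entries:
        output.write('>' + description + '\n')
        for start in range(0, len(sequence), width):
            output.write(sequence[start:start + width] + '\n')
    return output


buffer = write_sketch([('prot1', 'MPEPTIDEK'), ('prot2', 'MKR' * 30)], io.StringIO(), width=40)
print(buffer.getvalue())
```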

pyteomics.fasta.write_decoy_db(*args, **kwargs)

Generate a decoy database from a given source and write it to a file.

If output is a path, the file is opened for appending by default (see file_mode), so no information is lost if the file already exists. However, take care when passing open file streams as source and output: reading and writing start from the current position in each file, which is where the last I/O operation finished. Use the file.seek() method to change it.

Parameters:

source : file-like object or str or None, optional

A path to a FASTA database or a file object itself. Default is None, which means read standard input.

output : file-like object or str, optional

A path to the output database or a file open for writing. Default is None, which means the results go to standard output.

mode : str or callable, optional

Algorithm of decoy sequence generation. 'reverse' by default. See decoy_sequence() for more details.

prefix : str, optional

A prefix to the protein descriptions of decoy entries. The default value is 'DECOY_'.

decoy_only : bool, optional

If set to True, only the decoy entries will be written to output. If False, the entries from source will be written as well. False by default.

file_mode : str, keyword only, optional

If output is a file name, defines the mode the file will be opened in. Otherwise it is ignored. Default is 'a'.

**kwargs : given to decoy_sequence().

Returns:

output : file

A (closed) file object for the created file.
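The caution about open file streams in the write_decoy_db() description comes down to ordinary file-position behavior. The stand-alone snippet below uses io.StringIO (it does not call pyteomics) to show that a stream already read to its end yields nothing more until it is rewound with seek(0).

```python
import io

source = io.StringIO('>prot1\nMPEPTIDEK\n>prot2\nMKR\n')

first_pass = source.read()   # consumes the whole stream
second_pass = source.read()  # position is at the end, so this returns ''
source.seek(0)               # rewind to the beginning
third_pass = source.read()   # the full content again

print(repr(second_pass))         # ''
print(third_pass == first_pass)  # True
```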
