Pyteomics documentation v3.4.1

mass - molecular masses and isotope distributions

Summary

This module defines general functions for mass and isotope abundance calculations. For most of the functions, the user can define a given substance in various formats, but all of them are reduced to a Composition object describing its chemical composition.

Classes

Composition - a class storing chemical composition of a substance.

Unimod - a class representing a Python interface to the Unimod database (see pyteomics.mass.unimod for a much more powerful alternative).

Mass calculations

calculate_mass() - a general routine for mass and m/z calculation. It can calculate the mass of a polypeptide sequence, chemical formula or elemental composition. When supplied with an ion type and charge, the function calculates m/z.

fast_mass() - a less powerful but much faster function for polypeptide mass calculation.

fast_mass2() - a version of fast_mass that supports modX notation.

Isotopic abundances

isotopic_composition_abundance() - calculate the relative abundance of a given isotopic composition.

most_probable_isotopic_composition() - finds the most abundant isotopic composition for a molecule defined by a polypeptide sequence, chemical formula or elemental composition.

isotopologues() - iterate over the possible isotopic compositions of a molecule, optionally filtered by abundance.

Data

nist_mass - a dict with exact masses of the most abundant isotopes.

std_aa_comp - a dict with the elemental compositions of the standard twenty amino acid residues, selenocysteine and pyrrolysine.

std_ion_comp - a dict with the relative elemental compositions of the standard peptide fragment ions.

std_aa_mass - a dict with the monoisotopic masses of the standard twenty amino acid residues, selenocysteine and pyrrolysine.


Composition.__init__(*args, **kwargs)

A Composition object stores a chemical composition of a substance. Basically it is a dict object, in which keys are the names of chemical elements and values contain integer numbers of corresponding atoms in a substance.

The main improvement over dict is that Composition objects allow addition and subtraction.
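The addition and subtraction behaviour can be pictured with a small stand-in class. This is only a sketch: SimpleComposition is a hypothetical name and is not the library's Composition; it just shows element counts being combined key by key.

```python
class SimpleComposition(dict):
    """Hypothetical stand-in for Composition: element symbol -> atom count."""

    def _combine(self, other, sign):
        result = SimpleComposition(self)
        for element, count in other.items():
            total = result.get(element, 0) + sign * count
            if total:
                result[element] = total
            else:
                # Drop elements whose count cancels out to zero.
                result.pop(element, None)
        return result

    def __add__(self, other):
        return self._combine(other, +1)

    def __sub__(self, other):
        return self._combine(other, -1)


glycine_residue = SimpleComposition(C=2, H=3, N=1, O=1)
water = SimpleComposition(H=2, O=1)

glycine = glycine_residue + water   # free amino acid, C2H5NO2
loss = glycine - glycine_residue    # what condensation removes: H2O
```

Negative counts are allowed in this sketch, which is convenient for describing relative compositions such as a loss of water.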

A Composition object can be initialized with one of the following arguments: formula, sequence, parsed_sequence or split_sequence.

If none of these are specified, the constructor will look at the first positional argument and try to build the object from it. Without positional arguments, a Composition will be constructed directly from keyword arguments.

If there’s an ambiguity, i.e. the argument is both a valid sequence and a formula (such as ‘HCN’), it will be treated as a sequence. You need to provide the ‘formula’ keyword to override this.

Warning

Be careful when supplying a list with a parsed sequence or a split sequence as a keyword argument. It must be obtained with the show_unmodified_termini option enabled. When supplying it as a positional argument, the option doesn’t matter, because the positional argument is always converted to a sequence prior to any processing.

Parameters:

formula : str, optional

A string with a chemical formula. All elements must be present in mass_data.

sequence : str, optional

A polypeptide sequence string in modX notation.

parsed_sequence : list of str, optional

A polypeptide sequence parsed into a list of amino acids.

split_sequence : list of tuples of str, optional

A polypeptide sequence parsed into a list of tuples (as returned by pyteomics.parser.parse() with split=True).

aa_comp : dict, optional

A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass). It is used for formulae parsing only.
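As an illustration of why formula parsing needs the set of known elements, here is a simplified parser. It is a sketch under stated assumptions: parse_formula and its regular expressions are hypothetical, isotope labels are not handled, and the library's own grammar may differ.

```python
import re

# One atom: an element symbol followed by an optional (possibly negative) count.
ATOM = re.compile(r'([A-Z][a-z]*)(-?\d+)?')
FORMULA = re.compile(r'(?:[A-Z][a-z]*(?:-?\d+)?)*')


def parse_formula(formula, known_elements):
    """Turn a formula string into a dict of element -> atom count."""
    if not FORMULA.fullmatch(formula):
        raise ValueError('Malformed formula: %r' % formula)
    composition = {}
    for element, count in ATOM.findall(formula):
        if element not in known_elements:
            # Mirrors the requirement that every element is present in mass_data.
            raise ValueError('Unknown element: %r' % element)
        composition[element] = composition.get(element, 0) + int(count or 1)
    return composition


KNOWN = {'H', 'C', 'N', 'O'}
ethanol = parse_formula('C2H5OH', KNOWN)
```

Repeated symbols are summed, so 'C2H5OH' yields six hydrogen atoms in total.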

Composition.mass(**kwargs)

Calculate the mass or m/z of a Composition.

Parameters:

average : bool, optional

If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.

charge : int, optional

If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.

mass_data : dict, optional

A dict with the masses of the chemical elements (the default value is nist_mass).

ion_comp : dict, optional

A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

ion_type : str, optional

If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

Returns:

mass : float
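The charge rule described above amounts to m/z = (M + z * m_proton) / z. The following sketch applies it to a plain dict of element counts. The function name composition_mass and the small mass table are assumptions made for illustration; the numeric values are approximate monoisotopic masses.

```python
# Approximate monoisotopic masses (Da); a tiny hand-written subset.
MONOISOTOPIC = {
    'H': 1.00782503207,
    'C': 12.0,
    'N': 14.0030740048,
    'O': 15.99491461956,
}
PROTON = 1.00727646677


def composition_mass(composition, charge=0):
    """Neutral mass for charge 0, otherwise m/z of the protonated species."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in composition.items())
    if charge:
        return (neutral + charge * PROTON) / charge
    return neutral


water = {'H': 2, 'O': 1}
neutral_mass = composition_mass(water)
mz_singly = composition_mass(water, charge=1)
mz_doubly = composition_mass(water, charge=2)
```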

class pyteomics.mass.mass.Unimod(source='http://www.unimod.org/xml/unimod.xml')

A class for the Unimod database of modifications. The list of all modifications can be retrieved via the mods attribute. Methods for convenient searching are by_title and by_name. For more elaborate filtering, iterate manually over the list.

Note

See pyteomics.mass.unimod for a new alternative class with more features.

Methods

by_name(name[, strict]) - Search modifications by name.

by_title(title[, strict]) - Search modifications by title.

__init__(source='http://www.unimod.org/xml/unimod.xml')

Create a database and fill it from the XML file retrieved from source.

Parameters:

source : str or file, optional

A file-like object or a URL to read from. Don’t forget the 'file://' prefix when pointing to local files.

by_name(name, strict=True)

Search modifications by name. If a single modification is found, it is returned. Otherwise, a list will be returned.

Parameters:

name : str

The full name of the modification(s).

strict : bool, optional

If False, the search will return all modifications whose full name contains name; otherwise, equality is required. True by default.

Returns:

out : dict or list

A single modification or a list of modifications.

by_title(title, strict=True)

Search modifications by title. If a single modification is found, it is returned. Otherwise, a list will be returned.

Parameters:

title : str

The modification title.

strict : bool, optional

If False, the search will return all modifications whose title contains title, otherwise equality is required. True by default.

Returns:

out : dict or list

A single modification or a list of modifications.
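The strict and non-strict search modes, and the "single item or list" return convention, can be sketched on plain dicts. Everything here is an assumption for illustration: the search helper, the record keys and the three hand-written records are hypothetical, and the sketch is case-sensitive.

```python
# Hand-written records for illustration only; real entries come from the Unimod XML.
RECORDS = [
    {'title': 'Phospho', 'full_name': 'Phosphorylation'},
    {'title': 'Oxidation', 'full_name': 'Oxidation or Hydroxylation'},
    {'title': 'Dioxidation', 'full_name': 'dihydroxy'},
]


def search(records, key, value, strict=True):
    """Return the only hit, or a list when there are zero or several hits."""
    if strict:
        hits = [r for r in records if r[key] == value]
    else:
        hits = [r for r in records if value in r[key]]
    return hits[0] if len(hits) == 1 else hits


exact = search(RECORDS, 'title', 'Oxidation')                  # one dict
partial = search(RECORDS, 'title', 'xidation', strict=False)   # list of two
missing = search(RECORDS, 'title', 'Missing')                  # empty list
```

Because the return type depends on the number of hits, calling code usually needs to check whether it received a dict or a list.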

mass_data

Get element mass data extracted from the database.

mods

Get the list of Unimod modifications.

pyteomics.mass.mass.calculate_mass(*args, **kwargs)

Calculates the monoisotopic mass of a polypeptide defined by a sequence string, parsed sequence, chemical formula or Composition object.

At most one of the following keyword arguments should be specified: formula, sequence, parsed_sequence, split_sequence or composition. All arguments given are used to create a Composition object, unless an existing one is passed as a keyword argument.

Note that if a sequence string is supplied and terminal groups are not explicitly shown, then the mass is calculated for a polypeptide with standard terminal groups (NH2- and -OH).

Warning

Be careful when supplying a list with a parsed sequence. It must be obtained with the show_unmodified_termini option enabled.

Parameters:

formula : str, optional

A string with a chemical formula.

sequence : str, optional

A polypeptide sequence string in modX notation.

parsed_sequence : list of str, optional

A polypeptide sequence parsed into a list of amino acids.

composition : Composition, optional

A Composition object with the elemental composition of a substance.

aa_comp : dict, optional

A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

average : bool, optional

If True then the average mass is calculated. Note that mass is not averaged for elements with specified isotopes. Default is False.

charge : int, optional

If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.

mass_data : dict, optional

A dict with the masses of the chemical elements (the default value is nist_mass).

ion_comp : dict, optional

A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

ion_type : str, optional

If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

Returns:

mass : float
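To make the interplay of sequence, ion type and charge concrete, the sketch below builds an elemental composition from residues plus standard termini, applies a relative ion composition, and then converts to m/z. All names (RESIDUES, ION_OFFSETS, ion_mz) are hypothetical, the tables cover only glycine and alanine, and the offsets follow the common convention that a b ion lacks one water relative to the intact peptide.

```python
MONOISOTOPIC = {'H': 1.00782503207, 'C': 12.0,
                'N': 14.0030740048, 'O': 15.99491461956}
PROTON = 1.00727646677

# Residue compositions (amino acid minus water); a two-letter subset.
RESIDUES = {
    'G': {'C': 2, 'H': 3, 'N': 1, 'O': 1},
    'A': {'C': 3, 'H': 5, 'N': 1, 'O': 1},
}
WATER = {'H': 2, 'O': 1}  # standard termini: H- and -OH
# Assumed offsets relative to the intact neutral peptide.
ION_OFFSETS = {'M': {}, 'y': {}, 'b': {'H': -2, 'O': -1}}


def add(*compositions):
    total = {}
    for composition in compositions:
        for element, count in composition.items():
            total[element] = total.get(element, 0) + count
    return total


def ion_mz(sequence, ion_type='M', charge=0):
    composition = add(WATER, ION_OFFSETS[ion_type],
                      *(RESIDUES[aa] for aa in sequence))
    neutral = sum(MONOISOTOPIC[el] * n for el, n in composition.items())
    return (neutral + charge * PROTON) / charge if charge else neutral


peptide_mass = ion_mz('GA')
b_ion = ion_mz('GA', ion_type='b', charge=1)
y_ion = ion_mz('GA', ion_type='y', charge=1)
```

Calling ion_mz with an ion type but no charge returns the mass of a neutral species, which is rarely what is wanted; this is why the charge state should always be given together with ion_type.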

pyteomics.mass.mass.fast_mass(sequence, ion_type=None, charge=None, **kwargs)

Calculate the monoisotopic mass of an ion using the fast algorithm. It can be used only if amino acid residues are given in one-letter code.

Parameters:

sequence : str

A polypeptide sequence string.

ion_type : str, optional

If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

charge : int, optional

If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass).

aa_mass : dict, optional

A dict with the monoisotopic masses of amino acid residues (default is std_aa_mass).

ion_comp : dict, optional

A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

mass : float

Monoisotopic mass or m/z of a peptide molecule/ion.
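A fast calculation of this kind can skip elemental compositions entirely and sum precomputed residue masses. The sketch below shows the idea; fast_mass_sketch and its table are hypothetical, the table holds only the residues of the example peptide, and the values are rounded to five decimals.

```python
# Rounded monoisotopic residue masses (Da); only what the example needs.
AA_MASS = {
    'P': 97.05276,
    'E': 129.04259,
    'T': 101.04768,
    'I': 113.08406,
    'D': 115.02694,
}
WATER = 18.01056    # standard H- and -OH termini
PROTON = 1.00728


def fast_mass_sketch(sequence, charge=0):
    """Sum residue masses, add the termini, then optionally convert to m/z."""
    mass = sum(AA_MASS[aa] for aa in sequence) + WATER
    if charge:
        mass = (mass + charge * PROTON) / charge
    return mass


neutral = fast_mass_sketch('PEPTIDE')
doubly_charged = fast_mass_sketch('PEPTIDE', charge=2)
```

The speed comes from doing one dictionary lookup per residue instead of building and summing a composition, at the cost of supporting only labels present in the mass table.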

pyteomics.mass.mass.fast_mass2(sequence, ion_type=None, charge=None, **kwargs)

Calculate the monoisotopic mass of an ion using the fast algorithm. modX notation is fully supported.

Parameters:

sequence : str

A polypeptide sequence string.

ion_type : str, optional

If specified, then the polypeptide is considered to be in the form of the corresponding ion. Do not forget to specify the charge state!

charge : int, optional

If not 0 then m/z is calculated: the mass is increased by the corresponding number of proton masses and divided by charge.

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass).

aa_mass : dict, optional

A dict with the monoisotopic masses of amino acid residues (default is std_aa_mass).

ion_comp : dict, optional

A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

mass : float

Monoisotopic mass or m/z of a peptide molecule/ion.
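Supporting modX labels mostly means splitting the sequence into (modification, residue) pairs before the table lookup. The sketch below is a simplification and not the library's algorithm: fast_mass2_sketch is a hypothetical name, explicit terminal groups are not handled, and the table layout (a modified label such as 'pT' if present, otherwise modification mass plus residue mass) is an assumption.

```python
import re

# A modX token here: optional lowercase modification label, then one residue letter.
TOKEN = re.compile(r'([a-z]*)([A-Z])')
MODX = re.compile(r'(?:[a-z]*[A-Z])+')

AA_MASS = {
    'P': 97.05276, 'E': 129.04259, 'T': 101.04768,
    'I': 113.08406, 'D': 115.02694,
    'p': 79.96633,   # phosphorylation mass shift
}
WATER = 18.01056
PROTON = 1.00728


def fast_mass2_sketch(sequence, aa_mass, charge=0):
    if not MODX.fullmatch(sequence):
        raise ValueError('Not a plain modX sequence: %r' % sequence)
    mass = WATER
    for mod, residue in TOKEN.findall(sequence):
        label = mod + residue
        if label in aa_mass:
            # The table already knows this (possibly modified) residue.
            mass += aa_mass[label]
        else:
            mass += aa_mass[residue] + (aa_mass[mod] if mod else 0.0)
    if charge:
        mass = (mass + charge * PROTON) / charge
    return mass


phospho = fast_mass2_sketch('PEPpTIDE', AA_MASS)
```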

pyteomics.mass.mass.isotopic_composition_abundance(*args, **kwargs)

Calculate the relative abundance of a given isotopic composition of a molecule.

Parameters:

formula : str, optional

A string with a chemical formula.

composition : Composition, optional

A Composition object with the isotopic composition of a substance.

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass).

Returns:

relative_abundance : float

The relative abundance of a given isotopic composition.
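For a fixed isotopic composition, the relative abundance follows a multinomial distribution per element: the number of ways to place the isotopes times the product of their natural abundances. The sketch below computes that directly. The function name, the (element, mass number) key layout and the abundance table are assumptions for illustration, and unspecified isotopes are not handled.

```python
from math import factorial, prod

# Approximate natural isotope abundances for two elements.
ABUNDANCE = {
    'H': {1: 0.999885, 2: 0.000115},
    'O': {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def isotopic_abundance(isotope_counts):
    """isotope_counts maps (element, mass_number) -> number of atoms."""
    per_element = {}
    for (element, number), count in isotope_counts.items():
        per_element.setdefault(element, {})[number] = count

    abundance = 1.0
    for element, counts in per_element.items():
        # Multinomial coefficient: n! / (k1! * k2! * ...).
        ways = factorial(sum(counts.values()))
        for k in counts.values():
            ways //= factorial(k)
        abundance *= ways * prod(
            ABUNDANCE[element][number] ** k for number, k in counts.items())
    return abundance


light_water = isotopic_abundance({('H', 1): 2, ('O', 16): 1})
semiheavy_water = isotopic_abundance({('H', 1): 1, ('H', 2): 1, ('O', 16): 1})
```

The factor of two in the HDO case comes from the multinomial coefficient: either of the two hydrogen positions can hold the deuterium.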

pyteomics.mass.mass.isotopologues(*args, **kwargs)

Iterate over the possible isotopic states of a molecule. The molecule can be defined by formula, sequence, parsed sequence, or composition. The space of possible isotopic compositions is restricted by the parameters elements_with_isotopes, isotope_threshold and overall_threshold.

Parameters:

formula : str, optional

A string with a chemical formula.

sequence : str, optional

A polypeptide sequence string in modX notation.

parsed_sequence : list of str, optional

A polypeptide sequence parsed into a list of amino acids.

composition : Composition, optional

A Composition object with the elemental composition of a substance.

report_abundance : bool, optional

If True, the output will contain 2-tuples: (composition, abundance). Otherwise, only compositions are yielded. Default is False.

elements_with_isotopes : container of str, optional

A set of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).

isotope_threshold : float, optional

The threshold abundance of a specific isotope to be considered. Default is 5e-4.

overall_threshold : float, optional

The threshold abundance of the calculated isotopic composition. Default is 0.

aa_comp : dict, optional

A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass).

Returns:

out : iterator

Iterator over possible isotopic compositions.
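The effect of the two thresholds can be seen in a small generator that enumerates isotope assignments element by element. This is a sketch, not the library's implementation: the names are hypothetical, the abundance table is a hand-written subset, and treating the thresholds as inclusive lower bounds is an assumption.

```python
from itertools import combinations_with_replacement, product
from math import factorial

ABUNDANCE = {
    'H': {1: 0.999885, 2: 0.000115},
    'O': {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def element_states(element, n_atoms, isotope_threshold):
    """Yield (counts, abundance) for every isotope assignment of one element."""
    isotopes = [number for number, p in ABUNDANCE[element].items()
                if p >= isotope_threshold]
    for combo in combinations_with_replacement(isotopes, n_atoms):
        counts = {number: combo.count(number) for number in set(combo)}
        ways = factorial(n_atoms)
        probability = 1.0
        for number, k in counts.items():
            ways //= factorial(k)
            probability *= ABUNDANCE[element][number] ** k
        yield ({(element, number): k for number, k in counts.items()},
               ways * probability)


def isotopologues_sketch(composition, isotope_threshold=5e-4,
                         overall_threshold=0.0):
    per_element = [list(element_states(el, n, isotope_threshold))
                   for el, n in composition.items()]
    for choice in product(*per_element):
        state, abundance = {}, 1.0
        for counts, p in choice:
            state.update(counts)
            abundance *= p
        if abundance >= overall_threshold:
            yield state, abundance


water = {'H': 2, 'O': 1}
default_states = list(isotopologues_sketch(water))
all_states = list(isotopologues_sketch(water, isotope_threshold=0.0))
```

With the default isotope threshold, deuterium and oxygen-17 are too rare to be considered, so water has only two states; with the threshold set to zero, all nine combinations are produced.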

pyteomics.mass.mass.most_probable_isotopic_composition(*args, **kwargs)

Calculate the most probable isotopic composition of a peptide molecule/ion defined by a sequence string, parsed sequence, chemical formula or Composition object.

Note that if a sequence string without terminal groups is supplied then the isotopic composition is calculated for a polypeptide with standard terminal groups (H- and -OH).

For each element, only the two most abundant isotopes are considered.

Parameters:

formula : str, optional

A string with a chemical formula.

sequence : str, optional

A polypeptide sequence string in modX notation.

parsed_sequence : list of str, optional

A polypeptide sequence parsed into a list of amino acids.

composition : Composition, optional

A Composition object with the elemental composition of a substance.

elements_with_isotopes : list of str

A list of elements to be considered in isotopic distribution (by default, every element has an isotopic distribution).

aa_comp : dict, optional

A dict with the elemental composition of the amino acids (the default value is std_aa_comp).

mass_data : dict, optional

A dict with the masses of chemical elements (the default value is nist_mass).

ion_comp : dict, optional

A dict with the relative elemental compositions of peptide ion fragments (default is std_ion_comp).

Returns:

out : tuple (Composition, float)

A tuple with the most probable isotopic composition and its relative abundance.
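With only two isotopes per element, the count of the less abundant isotope follows a binomial distribution, so its most likely value is the binomial mode. The sketch below uses that observation; it is one possible approach, not necessarily the one the library takes, and most_probable_sketch and the abundance table are hypothetical.

```python
from math import comb, floor

ABUNDANCE = {
    'C': {12: 0.9893, 13: 0.0107},
    'H': {1: 0.999885, 2: 0.000115},
    'O': {16: 0.99757, 17: 0.00038, 18: 0.00205},
}


def most_probable_sketch(composition):
    """Return ({(element, mass_number): count}, relative abundance)."""
    state, abundance = {}, 1.0
    for element, n_atoms in composition.items():
        ranked = sorted(ABUNDANCE[element].items(),
                        key=lambda item: item[1], reverse=True)
        (major, p_major), (minor, p_minor) = ranked[:2]
        # Mode of a binomial restricted to the two most abundant isotopes.
        q = p_minor / (p_major + p_minor)
        k = min(n_atoms, floor((n_atoms + 1) * q))
        state[(element, major)] = n_atoms - k
        if k:
            state[(element, minor)] = k
        abundance *= comb(n_atoms, k) * p_minor ** k * p_major ** (n_atoms - k)
    return state, abundance


glucose_state, glucose_abundance = most_probable_sketch({'C': 6, 'H': 12, 'O': 6})
c100_state, c100_abundance = most_probable_sketch({'C': 100})
```

For a small molecule such as glucose the all-light composition wins, whereas for one hundred carbon atoms a single carbon-13 is already more likely than none.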

pyteomics.mass.mass.nist_mass

A dict with the exact element masses downloaded from the NIST website: http://www.nist.gov/pml/data/comp.cfm. There is an entry for each element, containing the masses and relative abundances of several abundant isotopes, plus a separate entry with key 0 for an unspecified isotope, which holds the mass of the most abundant isotope and an abundance of 1.0.
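The layout described above can be mimicked with a nested dict of element -> isotope number -> (mass, abundance). The sketch below uses a hand-written two-element table (NIST_LIKE is a hypothetical name with approximate values) to show how the zero key gives the monoisotopic mass and how an average mass can be derived from the isotope entries.

```python
# Hypothetical table in the same shape: {element: {isotope: (mass, abundance)}}.
NIST_LIKE = {
    'H': {0: (1.00782503207, 1.0),
          1: (1.00782503207, 0.999885),
          2: (2.0141017778, 0.000115)},
    'C': {0: (12.0, 1.0),
          12: (12.0, 0.9893),
          13: (13.0033548378, 0.0107)},
}


def monoisotopic_mass(element):
    # Key 0 stands for "isotope not specified".
    return NIST_LIKE[element][0][0]


def average_mass(element):
    # Abundance-weighted mean over the explicitly listed isotopes.
    return sum(mass * abundance
               for number, (mass, abundance) in NIST_LIKE[element].items()
               if number != 0)


carbon_mono = monoisotopic_mass('C')
carbon_average = average_mass('C')
hydrogen_average = average_mass('H')
```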

pyteomics.mass.mass.std_aa_comp

A dictionary with elemental compositions of the twenty standard amino acid residues, selenocysteine, pyrrolysine, and standard H- and -OH terminal groups.

pyteomics.mass.mass.std_aa_mass

A dictionary with monoisotopic masses of the twenty standard amino acid residues, selenocysteine and pyrrolysine.

pyteomics.mass.mass.std_ion_comp

A dict with relative elemental compositions of the standard peptide fragment ions. An elemental composition of a fragment ion is calculated as a difference between the total elemental composition of an ion and the sum of elemental compositions of its constituting amino acid residues.
