Pyteomics documentation v2.5.0

Mass and isotopes

The functions related to mass calculations and isotopic distributions are organized into the pyteomics.mass module.

Basic mass calculations

The most common task in mass spectrometry data analysis is to calculate the mass of an organic molecule or peptide, or the m/z ratio of an ion. Tasks of this kind can be performed with the pyteomics.mass.calculate_mass() function. It accepts chemical formulas, polypeptide sequences in modX notation, pre-parsed sequences, and dictionaries of chemical compositions:

>>> from pyteomics import mass
>>> mass.calculate_mass(formula='H2O')
18.0105646837036

>>> mass.calculate_mass(formula='C2H5OH')
46.0418648119876

>>> mass.calculate_mass(composition={'H':2, 'O':1})
18.0105646837036

>>> mass.calculate_mass(sequence='PEPTIDE')
799.359964027207

>>> from pyteomics import parser
>>> ps = parser.parse('PEPTIDE', show_unmodified_termini=True)
>>> mass.calculate_mass(parsed_sequence=ps)
799.359964027207

Warning

Always set show_unmodified_termini=True when parsing a sequence if you intend to use the result to calculate the mass. Otherwise, the masses of the terminal hydrogen and hydroxyl groups will not be taken into account.
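
The contribution of the terminal groups can be illustrated without the library. The following is a minimal sketch, not the pyteomics implementation: the MONO_MASS and RESIDUES tables and both helper functions are assumptions introduced here for illustration, covering only the residues of PEPTIDE. The sketch sums the residue compositions and then adds one H and one OH for the termini.

```python
from collections import Counter

# Monoisotopic masses in unified atomic mass units (rounded literature
# values; an assumption of this sketch, not read from pyteomics).
MONO_MASS = {'H': 1.00782503207, 'C': 12.0,
             'N': 14.0030740048, 'O': 15.99491461956}

# Residue compositions (amino acid minus water), only for the residues used.
RESIDUES = {
    'P': {'C': 5, 'H': 7, 'N': 1, 'O': 1},
    'E': {'C': 5, 'H': 7, 'N': 1, 'O': 3},
    'T': {'C': 4, 'H': 7, 'N': 1, 'O': 2},
    'I': {'C': 6, 'H': 11, 'N': 1, 'O': 1},
    'D': {'C': 4, 'H': 5, 'N': 1, 'O': 3},
}

def peptide_composition(sequence, with_termini=True):
    """Sum residue compositions; optionally add terminal H and OH."""
    total = Counter()
    for aa in sequence:
        total.update(RESIDUES[aa])
    if with_termini:
        total.update({'H': 2, 'O': 1})
    return total

def composition_mass(comp):
    """Monoisotopic mass of an element -> count mapping."""
    return sum(MONO_MASS[el] * n for el, n in comp.items())

full = composition_mass(peptide_composition('PEPTIDE'))
bare = composition_mass(peptide_composition('PEPTIDE', with_termini=False))
print(round(full, 4))         # about 799.36
print(round(full - bare, 4))  # about 18.0106, the mass of H2O
```

The difference between the two results is the mass of one water molecule, which is the amount lost when the termini are left out of the parsed sequence.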

Mass-to-charge ratio of ions

pyteomics.mass.calculate_mass() can be used to calculate the mass-to-charge ratio of peptide ions and ionized fragments. To do that, supply the ion type and the charge:

>>> from pyteomics import mass
>>> mass.calculate_mass(sequence='PEPTIDE', ion_type='M', charge=2)
400.6872584803735

>>> mass.calculate_mass(sequence='PEP', ion_type='b', charge=1)
324.15539725264904

>>> mass.calculate_mass(sequence='TIDE', ion_type='y', charge=1)
477.219119708098
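
For the intact molecular ion ('M'), the result above is consistent with adding one proton per unit of charge and dividing by the charge. The following is a minimal sketch of that arithmetic, assuming positive ions formed by protonation only; the function name mz and the PROTON_MASS constant are introduced here for illustration and are not part of pyteomics.

```python
# Proton mass in unified atomic mass units (assumed value for this sketch).
PROTON_MASS = 1.00727646677

def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge <= 0:
        raise ValueError('charge must be a positive integer')
    return (neutral_mass + charge * PROTON_MASS) / charge

# Neutral mass of PEPTIDE as returned by calculate_mass(sequence='PEPTIDE').
peptide_mass = 799.359964027207
print(mz(peptide_mass, 2))
```

Fragment ions such as b and y additionally differ from the neutral peptide in composition, so their neutral mass has to be adjusted before this formula is applied.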

Mass of modified peptides

With pyteomics.mass.calculate_mass() you can calculate masses of modified peptides as well. For the function to recognize the modified residue, you need to add the information about its elemental composition to the pyteomics.mass.std_aa_comp dictionary used in the calculations by default.

>>> from pyteomics import mass
>>> mass.std_aa_comp['pT'] = mass.Composition(
...    {'C': 4, 'H': 8, 'N': 1, 'O': 5, 'P': 1})
>>> mass.calculate_mass(sequence='PEPpTIDE')
879.3262945499629

To add information about modified amino acids to a user-defined aa_comp dict, one can add the composition either of a specific modified residue or of the modification alone:

>>> from pyteomics import mass
>>> aa_comp = dict(mass.std_aa_comp)
>>> aa_comp['p'] = mass.Composition('HPO3')
>>> mass.calculate_mass('pT', aa_comp=aa_comp)
199.02457367493957

In this example we call calculate_mass() with a positional (non-keyword) argument ('pT'). This feature was added in version 1.2.4. A non-keyword argument is first treated as a sequence; if that fails, it is treated as a formula; if that fails as well, a PyteomicsError is raised. Note that 'pT' is treated as a sequence here, so default terminal groups are implied when calculating the composition and mass:

>>> mass.calculate_mass('pT', aa_comp=aa_comp) == mass.calculate_mass(aa_comp['p']) + mass.calculate_mass(aa_comp['T']) + mass.calculate_mass('H2O')
True
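
The same decomposition can be checked by hand. The following is a minimal sketch using collections.Counter in place of mass.Composition; the MONO_MASS table (rounded literature values) and the composition_mass helper are assumptions introduced for illustration, not pyteomics code.

```python
from collections import Counter

# Monoisotopic masses in unified atomic mass units (assumed, rounded values).
MONO_MASS = {'H': 1.00782503207, 'C': 12.0, 'N': 14.0030740048,
             'O': 15.99491461956, 'P': 30.97376163}

threonine_residue = Counter({'C': 4, 'H': 7, 'N': 1, 'O': 2})
phospho = Counter({'H': 1, 'P': 1, 'O': 3})  # the 'p' modification, HPO3
water = Counter({'H': 2, 'O': 1})            # terminal H and OH

def composition_mass(comp):
    """Monoisotopic mass of an element -> count mapping."""
    return sum(MONO_MASS[el] * n for el, n in comp.items())

# Compositions add element-wise, so the masses add as well.
modified = phospho + threonine_residue + water
total_from_parts = sum(composition_mass(c)
                       for c in (phospho, threonine_residue, water))
print(dict(modified))
print(round(composition_mass(modified), 4))  # about 199.0246
```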

You can create a specific entry for a modified amino acid to override the modification on a specific residue:

>>> aa_comp['pT'] = mass.Composition({'N': 2})
>>> mass.Composition('pT', aa_comp=aa_comp)
{'H': 2, 'O': 1, 'N': 2}
>>> mass.Composition('pS', aa_comp=aa_comp)
{'H': 8, 'C': 3, 'N': 1, 'O': 6, 'P': 1}

The Unimod database is an excellent resource for information on the chemical compositions of known protein modifications. Version 2.0.3 introduces the pyteomics.mass.Unimod class, which can serve as a Python interface to Unimod:

>>> db = mass.Unimod()
>>> aa_comp = dict(mass.std_aa_comp)
>>> aa_comp['p'] = db.by_title('Phospho')['composition']
>>> mass.calculate_mass('PEpTIDE', aa_comp=aa_comp)
782.2735307010443

Chemical compositions

Some problems in organic mass spectrometry deal with molecules made by addition or subtraction of standard chemical ‘building blocks’. In pyteomics.mass there are two ways to approach these problems.

  • There is a pyteomics.mass.Composition class intended to store chemical formulas. pyteomics.mass.Composition objects are dicts that can be added or subtracted from one another or multiplied by integers.

    >>> from pyteomics import mass
    >>> p = mass.Composition(formula='HO3P') # Phosphate group
    >>> p
    Composition({'H': 1, 'O': 3, 'P': 1})
    >>> mass.std_aa_comp['T']
    Composition({'C': 4, 'H': 7, 'N': 1, 'O': 2})
    >>> p + mass.std_aa_comp['T']
    Composition({'C': 4, 'H': 8, 'N': 1, 'O': 5, 'P': 1})
    

    The values of pyteomics.mass.std_aa_comp are pyteomics.mass.Composition objects.

  • All functions that accept a formula keyword argument sum the numbers that follow repeated occurrences of the same atom in the formula; a negative number subtracts atoms:

    >>> from pyteomics import mass
    >>> mass.calculate_mass(formula='C2H6') # Ethane
    30.046950192426
    >>> mass.calculate_mass(formula='C2H6H-2') # Ethylene
    28.031300128284002
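
The summing of repeated atoms can be sketched with a small parser. This is an illustrative approximation, not the parser used by pyteomics: the function name parse_formula and the regular expression are assumptions, and isotope labels such as H[2] are deliberately not handled.

```python
import re

# An element symbol followed by an optional, possibly negative, integer.
_TOKEN = re.compile(r'([A-Z][a-z]*)(-?\d+)?')

def parse_formula(formula):
    """Return element counts, summing repeated occurrences of an element."""
    counts = {}
    for element, number in _TOKEN.findall(formula):
        # A missing number means a single atom.
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    # Drop elements whose counts cancel out completely.
    return {el: n for el, n in counts.items() if n != 0}

print(parse_formula('C2H6H-2'))  # {'C': 2, 'H': 4}
print(parse_formula('C2H5OH'))   # {'C': 2, 'H': 6, 'O': 1}
```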
    

Faster mass calculations

While pyteomics.mass.calculate_mass() has a flexible and convenient interface, it may be too slow for large-scale calculations. There is an optimized and simplified version of this function named pyteomics.mass.fast_mass(). It works only with unmodified sequences in standard one-letter IUPAC notation. Like pyteomics.mass.calculate_mass(), pyteomics.mass.fast_mass() can calculate m/z when provided with ion type and charge.

>>> from pyteomics import mass
>>> mass.fast_mass('PEPTIDE')
799.3599446837036

Isotopes

If not specified, pyteomics.mass assumes that the substances are in the pure isotopic state. However, you may specify a particular isotopic state in brackets (e.g. O[18], N[15]) in a chemical formula. An element with an unspecified isotopic state is assumed to have the mass of its most abundant isotope and an abundance of 100%.

>>> mass.calculate_mass(formula='H[2]2O') # Heavy water
20.0231181752416
>>> mass.calculate_mass(formula='H[2]HO') # Semiheavy water
19.0168414294726
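
The lookup behind these numbers can be sketched as follows. The table mimics the layout of pyteomics.mass.nist_mass (element, then isotope number, then a (mass, abundance) tuple, with key 0 for the unspecified state), but the ISOTOPE_TABLE values are typed in by hand for illustration and the labelled_mass helper is hypothetical.

```python
import re

# element -> isotope number -> (mass, relative abundance); key 0 is the
# unspecified state. Values are illustrative assumptions of this sketch.
ISOTOPE_TABLE = {
    'H': {0: (1.00782503207, 1.0),
          1: (1.00782503207, 0.999885),
          2: (2.0141017778, 0.000115)},
    'O': {0: (15.99491461956, 1.0),
          16: (15.99491461956, 0.99757)},
}

# Keys look like 'H' (unspecified isotope) or 'H[2]' (specified isotope).
_KEY = re.compile(r'^([A-Z][a-z]*)(?:\[(\d+)\])?$')

def labelled_mass(composition):
    """Mass of a composition whose keys may carry isotope labels."""
    total = 0.0
    for key, count in composition.items():
        element, isotope = _KEY.match(key).groups()
        mass, _abundance = ISOTOPE_TABLE[element][int(isotope) if isotope else 0]
        total += mass * count
    return total

print(round(labelled_mass({'H[2]': 2, 'O': 1}), 4))          # heavy water
print(round(labelled_mass({'H[2]': 1, 'H': 1, 'O': 1}), 4))  # semiheavy water
```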

The pyteomics.mass.isotopic_composition_abundance() function calculates the relative abundance of a given isotopic state of a molecule. The input can be provided as a formula or as a Composition/dict.

>>> from pyteomics import mass
>>> mass.isotopic_composition_abundance(formula='H2O') # Water with an unspecified isotopic state
1.0
>>> mass.isotopic_composition_abundance(formula='H[2]2O') # Heavy water
1.3386489999999999e-08
>>> mass.isotopic_composition_abundance(formula='H[2]H[1]O') # Semiheavy water
0.0002313727050147582
>>> mass.isotopic_composition_abundance(composition={'H[2]': 1, 'H[1]': 1, 'O': 1}) # Semiheavy water
0.0002313727050147582
>>> mass.isotopic_composition_abundance(formula='H[2]2O[18]') # Heavy-hydrogen heavy-oxygen water
2.7461045585999998e-11
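
For a single element, the abundance of an isotopic state follows a multinomial distribution over its isotopes. The sketch below shows that calculation under stated assumptions: the ABUNDANCE values are rounded, illustrative figures rather than the table shipped with pyteomics, so the results will not match the outputs above digit for digit, and element_state_abundance is a hypothetical helper.

```python
from math import factorial

# isotope number -> relative abundance (illustrative, rounded values).
ABUNDANCE = {'H': {1: 0.999885, 2: 0.000115}}

def element_state_abundance(element, isotope_counts):
    """Multinomial probability of the given isotope counts for one element."""
    n = sum(isotope_counts.values())
    coeff = factorial(n)
    prob = 1.0
    for isotope, k in isotope_counts.items():
        coeff //= factorial(k)  # builds n! / (k1! * k2! * ...)
        prob *= ABUNDANCE[element][isotope] ** k
    return coeff * prob

print(element_state_abundance('H', {2: 2}))        # both hydrogens are H[2]
print(element_state_abundance('H', {1: 1, 2: 1}))  # one H[1] and one H[2]
```

The factor of two in the second case counts the two ways of choosing which hydrogen is the heavy one. The abundance of a whole molecule is the product of such terms over its elements.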

Warning

You cannot mix specified and unspecified states of the same element in one formula in pyteomics.mass.isotopic_composition_abundance() due to ambiguity.

>>> mass.isotopic_composition_abundance(formula='H[2]HO')
...
PyteomicsError: Pyteomics error, message: 'Please specify the isotopic states of all atoms of H or do not specify them at all.'

Finally, you can find the most probable isotopic composition of a substance with the pyteomics.mass.most_probable_isotopic_composition() function. The substance is specified as a formula, a pyteomics.mass.Composition object, or a modX sequence string.

>>> from pyteomics import mass
>>> mass.most_probable_isotopic_composition(formula='H2SO4')
Composition({'H[1]': 2.0,  'H[2]': 0.0,  'O[16]': 4.0,  'O[17]': 0.0,  'S[32]': 1.0,  'S[33]': 0.0})
>>> mass.most_probable_isotopic_composition(formula='C300H602')
Composition({'C[12]': 297.0, 'C[13]': 3.0, 'H[1]': 602.0, 'H[2]': 0.0})
>>> mass.most_probable_isotopic_composition(sequence='PEPTIDE'*100)
Composition({'C[12]': 3364.0, 'C[13]': 36.0, 'H[1]': 5102.0, 'H[2]': 0.0, 'N[14]': 698.0, 'N[15]': 2.0, 'O[16]': 1398.0, 'O[17]': 3.0})
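
For an element with two isotopes, the most probable number of heavy atoms is the mode of a binomial distribution, floor((n + 1) * p). The sketch below applies this textbook formula; it is not claimed to be the algorithm used by pyteomics, and both the function name and the carbon-13 abundance of 0.0107 are assumptions made for illustration.

```python
from math import floor

def most_probable_heavy_count(n_atoms, heavy_abundance):
    """Mode of Binomial(n_atoms, heavy_abundance): floor((n + 1) * p)."""
    return int(floor((n_atoms + 1) * heavy_abundance))

C13_ABUNDANCE = 0.0107  # assumed relative abundance of carbon-13

print(most_probable_heavy_count(300, C13_ABUNDANCE))   # 3
print(most_probable_heavy_count(3400, C13_ABUNDANCE))  # 36
```

Both values agree with the C[13] counts in the examples above, for 300 and 3400 carbon atoms respectively.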

The information about chemical elements, their isotopes and relative abundances is stored in the pyteomics.mass.nist_mass dictionary.

>>> from pyteomics import mass
>>> print(mass.nist_mass['C'])
{0: (12.0, 1.0), 12: (12.0, 0.98938), 13: (13.0033548378, 0.01078), 14: (14.0032419894, 0.0)}

The zero key stands for the unspecified isotopic state. The data about isotopes are stored as tuples (accurate mass, relative abundance).
