pyteomics.biolccc 1.5.0 documentation

Tutorial


Before we begin

The following help applies both to pyteomics.biolccc and to the underlying libBioLCCC library, which is written in C++. The two packages differ only in the syntax of their commands, which is why we supply code snippets in both C++ and Python. Here is an example:

Example of a code snippet (C++, then Python)
#include <iostream>

int main() {
    int a = 1;
    int b = 2;

    std::cout << a+b << std::endl;

    return 0;
}
a = 1
b = 2

print a+b

The Python examples are specific to Python 2.x, since our project doesn’t support Python 3.x.

Basic concepts

There are a few simple concepts which are widely used in pyteomics.biolccc/libBioLCCC. Most of them are represented by corresponding classes. Here they are:

Polymer model - a set of assumptions used to describe a polymer molecule. This version of BioLCCC includes two polymer models:

ROD - this model represents a peptide as an absolutely rigid rod. Amino acids are modelled as regularly spaced beads threaded on this rod. This model describes peptides better than long protein molecules.

The equations for the ROD model are to be published in an upcoming paper.

CHAIN - in this model a protein molecule is described as a freely jointed chain of rods. The conformations of this molecule in a pore can be modelled as a random walk in the field of adsorbing walls. This assumption should work better for long protein molecules.

The CHAIN model was described in “Liquid Chromatography at Critical Conditions: Comprehensive Approach to Sequence-Dependent Retention Time Prediction”, Alexander V. Gorshkov et al., Analytical Chemistry, 2006, 78 (22), 7770-7777.

Chemical group - in pyteomics.biolccc/libBioLCCC, either an amino acid residue or a terminal group of a peptide chain. Examples are a histidine residue, a phosphoserine residue and the N-terminal hydrogen that closes a peptide chain. The properties of a chemical group are stored in the ChemicalGroup class.

Chemical basis - a set of all physicochemical constants involved in the BioLCCC equations. This set contains:

  • the list of all chemical groups, i.e. amino acids and terminal groups. Any peptide can be represented as a sequence of these groups, which is why the set is called a basis, by analogy with a basis in mathematics;
  • which terminal groups are set by default (cannot be changed);
  • the chemical properties of solvents: densities, molar masses and adsorption energies (the adsorption energy of the first solvent always equals zero);
  • the polymer model used in the calculations and the approximations used in the equations;
  • peptide geometry: the length of an amino acid residue and the Kuhn length;
  • the range of an interaction between an amino acid and the surface of the solid phase (a.k.a. the width of the adsorbing layer).

The properties of a chemical basis are stored in the ChemicalBasis class.

A chemical basis is specific to the type of retention chemistry, the solvents and the ion pairing agent used in the experiment. In addition, it must be used only with the polymer model that was used to calibrate it.

Predefined chemical basis - a chemical basis calculated (or, more precisely, calibrated) for a specific retention chemistry and polymer model. The current version of pyteomics.biolccc/libBioLCCC contains two predefined chemical bases:

rpAcnFaRod - a ChemicalBasis calibrated for reversed-phase chromatography with ACN as the second solvent, 0.1% FA in both solvents and the ROD polymer model. The data were obtained in joint research by Harvard University and the Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences.

rpAcnTfaChain - a chemical basis calibrated for reversed-phase chromatography with ACN as the second solvent, 0.1% TFA in both solvents and the CHAIN model. The initial data were taken from Guo et al., Journal of Chromatography, 359 (1986) 449-517.

Chromatographic conditions - a description of the chromatographic equipment and its settings. It contains:

  • the geometry of the column;
  • the properties of the adsorbent: average size of the pores, porosity (i.e. percentage of volume not filled with the solid phase), (volume of pores)/(total volume of column) ratio, relative adsorption strength;
  • elution parameters: the shape of the gradient, the composition of components, flow rate, delay time;
  • the step of integration over volume;
  • the column temperature (EXPERIMENTAL).

The default values were set rather arbitrarily.
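The column geometry, porosity and flow rate are related by simple arithmetic. As a rough illustration only (not the formulas used inside libBioLCCC), the sketch below computes the volume of an empty cylindrical column from its length and internal diameter, the volume available to the mobile phase for a given porosity, and the resulting hold-up time at a given flow rate. The function names are hypothetical, porosity is expressed as a fraction rather than a percentage, and the porosity of 0.9 in the example is an assumed value, not a library default.

```python
import math


def column_volume_ml(length_mm, diameter_mm):
    """Volume of an empty cylindrical column in ml (1 ml = 1000 mm^3)."""
    radius_mm = diameter_mm / 2.0
    return math.pi * radius_mm ** 2 * length_mm / 1000.0


def holdup_time_min(length_mm, diameter_mm, porosity, flow_rate_ml_min):
    """Time (min) for the mobile phase to pass through the column.

    porosity is the fraction (0..1) of the column volume that is not
    occupied by the solid phase.
    """
    liquid_volume_ml = porosity * column_volume_ml(length_mm, diameter_mm)
    return liquid_volume_ml / flow_rate_ml_min


# A 100 mm x 0.1 mm column at 0.0005 ml/min; porosity 0.9 is an assumption.
print('Column volume: %.6f ml' % column_volume_ml(100.0, 0.1))
print('Hold-up time: %.2f min' % holdup_time_min(100.0, 0.1, 0.9, 0.0005))
```

For a 100 mm long column with a 0.1 mm internal diameter, the empty column holds about 0.000785 ml, so at 0.0005 ml/min and an assumed porosity of 0.9 the mobile phase needs roughly 1.4 minutes to pass through.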

Peptide sequence notation

pyteomics.biolccc/libBioLCCC uses an extended peptide notation. It is based on the one-letter IUPAC notation but borrows only the letters for the 20 standard amino acids (i.e. no B, Z or X). We extended it in the following way:

  • Modified amino acids are denoted as xyzX, i.e. their labels start with an arbitrary number of lower-case letters and terminate with a single upper-case letter. The upper-case letter shows the base amino acid, while the lower-case letters describe the type of modification. The examples are:

    • oxM for oxidized methionine
    • pS for phosphorylated serine
    • pT for phosphorylated threonine
    • camC for carboxyamidomethylated cysteine
  • The non-standard peptide terminal groups are denoted as XxXx- and -XxXx for N-terminal and C-terminal groups, respectively. A label can contain an arbitrary number of mixed lower-case and upper-case letters and digits, but it should not be a valid peptide sequence. If a terminal group is not specified, the standard one is assumed (i.e. an N-terminal hydrogen atom or a C-terminal carboxyl group). Examples:

    • Ac- for N-terminal acetylation
    • H- for N-terminal hydrogen
    • -NH2 for C-terminal amidation
    • -OH for C-terminal carboxyl group
  • If a sequence contains two dots, then only the substring between them is parsed. This notation is used in several MS/MS search engines to show the adjacent amino acid residues for a peptide cleaved out of a protein. The examples are:

    • K.APGFGDNR.K
    • K.VGEVIVTK.D
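The rules above are simple enough to express with a regular expression. The following is a minimal, self-contained sketch of how such a parser might work; it is not the parser used by libBioLCCC. The function parse_sequence and the terminal-group tables are hypothetical, and only the terminal groups listed above are recognized (in the library, the known groups come from the chemical basis).

```python
import re

# Hypothetical tables of known terminal groups (only the examples above).
N_TERMINAL_GROUPS = ('Ac-', 'H-')
C_TERMINAL_GROUPS = ('-NH2', '-OH')

# A residue label: any number of lower-case letters, then one upper-case letter.
RESIDUE = re.compile(r'[a-z]*[A-Z]')


def parse_sequence(sequence):
    """Split a sequence into (N-terminal group, residue labels, C-terminal group)."""
    # With two dots, e.g. 'K.APGFGDNR.K', only the middle part is parsed.
    if sequence.count('.') == 2:
        sequence = sequence.split('.')[1]

    # The standard terminal groups are assumed unless others are given.
    n_term, c_term = 'H-', '-OH'
    for group in N_TERMINAL_GROUPS:
        if sequence.startswith(group):
            n_term, sequence = group, sequence[len(group):]
            break
    for group in C_TERMINAL_GROUPS:
        if sequence.endswith(group):
            c_term, sequence = group, sequence[:-len(group)]
            break

    residues = RESIDUE.findall(sequence)
    # Reject anything that is not fully covered by residue labels.
    if not residues or ''.join(residues) != sequence:
        raise ValueError('cannot parse sequence: %r' % sequence)
    return n_term, residues, c_term


print(parse_sequence('Ac-PEPTIDE-NH2'))
# ('Ac-', ['P', 'E', 'P', 'T', 'I', 'D', 'E'], '-NH2')
print(parse_sequence('K.oxMVGEpSK.D'))
# ('H-', ['oxM', 'V', 'G', 'E', 'pS', 'K'], '-OH')
```

Matching terminal groups against a known list, rather than splitting on every hyphen, avoids ambiguity for labels such as H-, which is also a valid one-residue sequence.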

Calculating retention time

calculateRT is the first pyteomics.biolccc function you may need. It requires three arguments: a peptide sequence, a chemical basis, and a description of chromatographic conditions. Supplied with these data, it calculates the retention time of the peptide.

Calculating the retention time of a peptide (C++, then Python)
#include <iostream>
#include <string>
#include <biolccc.h>

int main() 
{
    std::string peptide("Ac-PEPTIDE-NH2");
    double RT = BioLCCC::calculateRT(peptide,
        BioLCCC::rpAcnFaRod,
        BioLCCC::standardChromoConditions);
    std::cout << "The retention time of " 
            << peptide << " is " << RT << std::endl;
    return 0;
}
from pyteomics import biolccc

peptide = 'Ac-PEPTIDE-NH2'
RT = biolccc.calculateRT(peptide,
    biolccc.rpAcnFaRod,
    biolccc.standardChromoConditions)
print 'The retention time of', peptide, 'is', RT

Please consult the libBioLCCC C++ API documentation for details of the calculateRT function.

Specifying chromatographic conditions

The next thing you may need to learn is how to specify the chromatographic conditions. In order to do that, create a new instance of ChromoConditions and replace the default parameters with your own.

Specifying chromatographic conditions (C++, then Python)
#include <iostream>
#include <string>
#include <biolccc.h>

int main() 
{
    BioLCCC::ChromoConditions myChromoConditions;

    // The column length in mm.
    myChromoConditions.setColumnLength(100.0);

    // The internal column diameter in mm.
    myChromoConditions.setColumnDiameter(0.1);

    // The average pore size in angstroms.
    myChromoConditions.setColumnPoreSize(300.0);

    // The concentration of the eluting solvent (ACN for the reversed
    // phase) in component A in %.
    myChromoConditions.setSecondSolventConcentrationA(5.0);

    // The concentration of the eluting solvent (ACN for the reversed
    // phase) in component B in %.
    myChromoConditions.setSecondSolventConcentrationB(80.0);

    // The shape of the gradient. The example is a linear gradient
    // from 0% to 90% of component B over 60 minutes.
    myChromoConditions.setGradient(
        BioLCCC::Gradient(0.0, 90.0, 60.0));

    // The flow rate in ml/min. 
    myChromoConditions.setFlowRate(0.0005);

    std::string peptide("Ac-PEPTIDE-NH2");
    double RT = BioLCCC::calculateRT(peptide,
        BioLCCC::rpAcnFaRod,
        myChromoConditions);
    std::cout << "The retention time of " 
        << peptide << " is " << RT << std::endl;
    return 0;
}
from pyteomics import biolccc

myChromoConditions = biolccc.ChromoConditions()

# The column length in mm.
myChromoConditions.setColumnLength(100.0)

# The internal column diameter in mm.
myChromoConditions.setColumnDiameter(0.1)

# The average pore size in A.
myChromoConditions.setColumnPoreSize(300.0)

# The concentration of the eluting solvent (ACN for the reversed
# phase) in component A in %.
myChromoConditions.setSecondSolventConcentrationA(5.0)

# The concentration of the eluting solvent (ACN for the reversed
# phase) in component B in %.
myChromoConditions.setSecondSolventConcentrationB(80.0)

# The shape of the gradient. The example is a linear gradient
# from 0% to 90% of component B over 60 minutes.
myChromoConditions.setGradient(biolccc.Gradient(0.0, 90.0, 60.0))

# The flow rate in ml/min. 
myChromoConditions.setFlowRate(0.0005)

peptide = 'Ac-PEPTIDE-NH2'
RT = biolccc.calculateRT(peptide,
    biolccc.rpAcnFaRod,
    myChromoConditions)
print 'The retention time of', peptide, 'is', RT
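The second-solvent concentrations in components A and B and the gradient together determine the ACN concentration in the eluent at any moment. Below is a minimal sketch, assuming that the gradient is linear, that the eluent is a simple volumetric mixture of A and B, and that the final composition is held after the gradient ends. acn_concentration is a hypothetical helper, not part of pyteomics.biolccc.

```python
def acn_concentration(t_min, acn_a=5.0, acn_b=80.0,
                      b_start=0.0, b_end=90.0, duration_min=60.0):
    """ACN concentration (%) in the eluent t_min minutes after gradient start.

    The share of component B (%) rises linearly from b_start to b_end over
    duration_min minutes and is assumed to stay at b_end afterwards.
    """
    fraction = min(max(t_min / duration_min, 0.0), 1.0)
    b_percent = b_start + (b_end - b_start) * fraction
    # Component A makes up the rest of the flow.
    return acn_a + (acn_b - acn_a) * b_percent / 100.0


for t in (0.0, 30.0, 60.0):
    print('t = %4.1f min: %.2f%% ACN' % (t, acn_concentration(t)))
```

With the values above (5% and 80% ACN in A and B, 0% to 90% B over 60 minutes), the eluent contains 5% ACN at the start, 38.75% after 30 minutes and 72.5% at the end of the gradient, so it never reaches the 80% ACN of pure component B.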

pyteomics.biolccc adds another way to interact with ChromoConditions. You can use its instances as Python dictionaries:

Dict-like syntax of ChromoConditions (Python only)
from pyteomics import biolccc

myChromoConditions = biolccc.ChromoConditions()
# Print the names of all parameters that can be accessed as keys.
print myChromoConditions.keys()

myChromoConditions['columnLength'] = 100.0
myChromoConditions['columnDiameter'] = 0.1
myChromoConditions['columnPoreSize'] = 300.0
myChromoConditions['secondSolventConcentrationA'] = 5.0
myChromoConditions['secondSolventConcentrationB'] = 80.0
myChromoConditions['gradient'] = biolccc.Gradient(0.0, 90.0, 60.0)
myChromoConditions['flowRate'] = 0.0005

peptide = 'Ac-PEPTIDE-NH2'
RT = biolccc.calculateRT(peptide,
    biolccc.rpAcnFaRod,
    myChromoConditions)
print 'The retention time of', peptide, 'is', RT

Besides being more convenient and compact, this dict-like interface also allows ChromoConditions instances to be pickled.

For the full list of parameters stored in a ChromoConditions instance, see the class description in the libBioLCCC C++ API documentation.

Calculating mass

pyteomics.biolccc contains functions to calculate the monoisotopic and average masses of a peptide. Besides the peptide sequence, these functions require a ChemicalBasis instance, which contains the masses of the amino acids and terminal groups.

Calculating the mass of a peptide (C++, then Python)
#include <iostream>
#include <string>
#include <biolccc.h>

int main()
{
    std::string peptide("Ac-PEPTIDE-NH2");

    double averageMass = BioLCCC::calculateAverageMass(
        peptide, BioLCCC::rpAcnFaRod);
    double monoisotopicMass = BioLCCC::calculateMonoisotopicMass(
        peptide, BioLCCC::rpAcnFaRod);

    std::cout << "Average mass of " << peptide << " is " 
        << averageMass << " Da" << std::endl;
    std::cout << "Monoisotopic mass of " << peptide << " is " 
        << monoisotopicMass << " Da" << std::endl;

    return 0;
}
from pyteomics import biolccc

peptide = 'Ac-PEPTIDE-NH2'

averageMass = biolccc.calculateAverageMass(
    peptide, biolccc.rpAcnFaRod)
monoisotopicMass = biolccc.calculateMonoisotopicMass(
    peptide, biolccc.rpAcnFaRod)

print 'The average mass of', peptide, 'is', averageMass, 'Da'
print 'The monoisotopic mass of', peptide, 'is', monoisotopicMass, 'Da'
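Conceptually, the mass of a peptide is the sum of the masses of its chemical groups, which is presumably why these functions take a chemical basis. The following minimal sketch performs that summation for unmodified peptides. It uses standard monoisotopic masses rather than the values stored in rpAcnFaRod, which may differ in the last digits. RESIDUE_MASSES, TERMINAL_MASSES and monoisotopic_mass are hypothetical names, not part of pyteomics.biolccc.

```python
# Standard monoisotopic residue masses (Da) of the 20 unmodified amino acids.
RESIDUE_MASSES = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}

# Monoisotopic masses (Da) of the terminal groups used in this tutorial.
TERMINAL_MASSES = {
    'H-': 1.00783,     # N-terminal hydrogen
    'Ac-': 43.01839,   # N-terminal acetyl, CH3CO
    '-OH': 17.00274,   # completes the C-terminal carboxyl group
    '-NH2': 16.01872,  # C-terminal amide
}


def monoisotopic_mass(residues, n_term='H-', c_term='-OH'):
    """Sum the masses of all chemical groups of an unmodified peptide."""
    return (TERMINAL_MASSES[n_term]
            + sum(RESIDUE_MASSES[aa] for aa in residues)
            + TERMINAL_MASSES[c_term])


print('%.4f' % monoisotopic_mass('PEPTIDE'))
print('%.4f' % monoisotopic_mass('PEPTIDE', 'Ac-', '-NH2'))
```

This gives about 799.36 Da for PEPTIDE and about 840.39 Da for Ac-PEPTIDE-NH2. An average-mass version would have the same structure, with average masses in the tables.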

Getting the list of predefined chemical groups

Before you begin to work with pyteomics.biolccc, it is useful to know which amino acids and terminal groups are predefined in this version of the library. To get this information, iterate through the chemicalGroups() map of a predefined chemical basis (in Python, through its 'chemicalGroups' key).

Examining a predefined chemical basis (C++, then Python)
#include <iostream>
#include <map>
#include <string>
#include <biolccc.h>

int main()
{
    for (std::map<std::string, BioLCCC::ChemicalGroup>::const_iterator
           chemicalGroupIt = BioLCCC::rpAcnFaRod.chemicalGroups().begin();
       chemicalGroupIt != BioLCCC::rpAcnFaRod.chemicalGroups().end();
       ++chemicalGroupIt)
    {
        std::cout << "Name: " << chemicalGroupIt->second.name() 
            << std::endl;
        std::cout << "Label: " << chemicalGroupIt->second.label()
            << std::endl;
        std::cout << "Bind energy: " << 
            chemicalGroupIt->second.bindEnergy() << std::endl;
        std::cout << "Average mass: "
            << chemicalGroupIt->second.averageMass() << std::endl;
        std::cout << "Monoisotopic mass: "
            << chemicalGroupIt->second.monoisotopicMass()
            << std::endl;
        std::cout << std::endl;
    }

    return 0;
}
from pyteomics import biolccc 

for label, chemicalGroup in biolccc.rpAcnFaRod['chemicalGroups'].items():
    print 'Name', chemicalGroup['name']
    print 'Label', chemicalGroup['label']
    print 'Bind energy', chemicalGroup['bindEnergy']
    print 'Average mass', chemicalGroup['averageMass']
    print 'Monoisotopic mass', chemicalGroup['monoisotopicMass']
    print ''

print 'Simpler syntax:'
print biolccc.rpAcnFaRod
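The labels printed by the loop above should follow the notation described in the Peptide sequence notation section. Assuming the keys of chemicalGroups are exactly such labels (e.g. 'Ac-', '-NH2', 'pS'), a small hypothetical helper can sort them into terminal groups, standard amino acids and modified amino acids. classify_label is an illustration and is not part of pyteomics.biolccc.

```python
import re


def classify_label(label):
    """Classify a chemical-group label written in the extended notation."""
    if label.endswith('-'):
        return 'N-terminal group'
    if label.startswith('-'):
        return 'C-terminal group'
    if re.match(r'^[a-z]+[A-Z]$', label):
        return 'modified amino acid'
    if re.match(r'^[A-Z]$', label):
        return 'standard amino acid'
    raise ValueError('unknown label: %r' % label)


for label in ('Ac-', '-NH2', 'pS', 'K'):
    print('%s: %s' % (label, classify_label(label)))
```

The same function could be applied to the labels in biolccc.rpAcnFaRod['chemicalGroups'] to list, for example, only the modified residues in a chemical basis.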
