# Example 2: Fragmentation¶

In this example, we are going to retrieve MS/MS data from an MGF file and compare it to identification info we read from a pepXML file. We are going to compare the MS/MS spectrum in the file with the theoretical spectrum of a peptide assigned to this spectrum by the search engine.

The script source can be downloaded here. We will also need the example MGF file and the example pepXML file, but the script will download them for you.

The MGF file has a single MS/MS spectrum in it. This spectrum is taken from the SwedCAD database of annotated MS/MS spectra. The pepXML file was obtained by running X!Tandem against the MGF file and converting the results to pepXML with the Tandem2XML tool from TPP.

Let’s start with importing the modules.

# This is written in Python 2 for simplicity
# Can be done forward-compatible easily, though
import matplotlib.pyplot as plt
from pyteomics import mgf, pepxml, mass
import os
from urllib import urlretrieve
import pylab


Then we’ll download the files, if needed:

for fname in ('mgf', 'pep.xml'):
if not os.path.isfile('example.' + fname):
urlretrieve('http://packages.python.org/pyteomics/_downloads/example.'
+ fname, 'example.' + fname)


Now it’s time to define the function that will give us m/z of theoretical fragments for a given sequence. We will use pyteomics.mass.fast_mass() to calculate the values. All we need to do is split the sequence at every bond and iterate over possible charges and ion types:

def fragments(peptide, types=('b', 'y'), maxcharge=1):
"""
The function generates all possible m/z for fragments of types
types and of charges from 1 to maxharge.
"""
for i in xrange(1, len(peptide)-1):
for ion_type in types:
for charge in xrange(1, maxcharge+1):
if ion_type[0] in 'abc':
yield mass.fast_mass(
peptide[:i], ion_type=ion_type, charge=charge)
else:
yield mass.fast_mass(
peptide[i:], ion_type=ion_type, charge=charge)


So, the outer loop is over “fragmentation sites”, the next one is over ion types, then over charges, and lastly over two parts of the sequence (C- and N-terminal).

All right, now it’s time to extract the info from the files. We are going to use the with statement syntax, which is not required, but recommended.

with mgf.read('example.mgf') as spectra, pepxml.read('example.pep.xml') as psms:
spectrum = next(spectra)
psm = next(psms)


Now prepare the figure...

pylab.figure()
pylab.title('Theoretical and experimental spectra for '
+ psm['search_hit'][0]['peptide'])
pylab.xlabel('m/z, Th')
pylab.ylabel('Intensity, rel. units')


... plot the real spectrum:

pylab.bar(spectrum['m/z array'], spectrum['intensity array'], width=0.1, linewidth=2,
edgecolor='black')


... calculate and plot the theoretical spectrum, and show everything:

theor_spectrum = list(fragments(psm['search_hit'][0]['peptide'],
maxcharge=psm['assumed_charge']))
pylab.bar(theor_spectrum,
[spectrum['intensity array'].max()]*len(theor_spectrum),
width=0.1, edgecolor='red', alpha=0.7)
pylab.show()


You will see something like this.

That’s it, as you can see, the most intensive peaks in the spectrum are indeed matched by the theoretical spectrum.