glypy Quickstart
----------------

The basic building block of a glycan is a monosaccharide.
Monosaccharides are an extremely diverse family of molecules, but their
features are relatively consistent across these variations.

The object-model in ``glypy`` is designed to let you specify aspects of
a monosaccharide without placing each carbon by hand, but still allowing
you full control over the gross structure of the object.

.. code:: python

    import glypy
    monosaccharide = glypy.Monosaccharide(anomer='beta', superclass='hex', configuration='d', stem='glc', ring_start=1, ring_end=5)
    print(monosaccharide)
    monosaccharide.mass()

.. parsed-literal::

    RES 1b:b-dglc-HEX-1:5
    



.. parsed-literal::

    180.0633881022



Behold,
`Glucose <http://www.monosaccharidedb.org/display_monosaccharide.action?id=4>`__.
If you follow the link to MonosaccharideDB, you'll see that the mass
reported is the monoisotopic mass of the monosaccharide.

By default, all sugars are printed out in *condensed GlycoCT* format.

Here, we've made a monosaccharide whose anomeric carbon is in the
:math:`\beta` configuration, whose backbone contains six carbons,
forming a right-handed stereo-structure, with the ring topology of
Glucose, starting at carbon 1 and closing at carbon 5.

That was a lot to write for just a molecule of glucose. For convenience,
you can also create new common monosaccharides by:

.. code:: python

    glypy.monosaccharides['Glucose']



.. parsed-literal::

    RES 1b:b-dglc-HEX-1:5



There is a long list of available names available, some of which are
synonyms.

.. code:: python

    print(glypy.monosaccharides["Glc"])
    print('--')
    print(glypy.monosaccharides["Glucose"])
    print('--')
    print(glypy.monosaccharides["bdGlc"])

.. parsed-literal::

    RES 1b:b-dglc-HEX-1:5
    --
    RES 1b:b-dglc-HEX-1:5
    --
    RES 1b:b-dglc-HEX-1:5
    

Because there are so many names for the same thing, it can sometimes be
useful to consult a synonym index or an identity test if you happen upon
a name you do not recognize

.. code:: python

    from glypy.io.nomenclature import synonyms, identity
    
    print(synonyms.monosaccharides['Glc'])
    identity.is_a(monosaccharide, "Glucose")

.. parsed-literal::

    [u'Glucose', u'bdGlc', u'LinearCode:G']
    



.. parsed-literal::

    True



Putting Pieces Together
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    monosaccharides = glypy.monosaccharides
    
    glcnac1 = monosaccharides["GlcNAc"]
    glcnac2 = monosaccharides["GlcNAc"]
    
    glcnac1.add_monosaccharide(glcnac2, position=4)
    bdman = monosaccharides["bdMan"]
    glcnac2.add_monosaccharide(bdman, position=4)
    adman1 = monosaccharides["adMan"]
    bdman.add_monosaccharide(adman1, position=3)
    adman2 = monosaccharides["adMan"]
    bdman.add_monosaccharide(adman2, position=6)
    
    n_linked_core = glypy.Glycan(root=glcnac1)
    print(n_linked_core)
    print(n_linked_core.mass())
    print(n_linked_core.total_composition())

.. parsed-literal::

    RES
    1b:b-dglc-HEX-1:5
    2s:n-acetyl
    3b:b-dglc-HEX-1:5
    4s:n-acetyl
    5b:b-dman-HEX-1:5
    6b:a-dman-HEX-1:5
    7b:a-dman-HEX-1:5
    LIN
    1:1d(2+1)2n
    2:1o(4+-1)3d
    3:3d(2+1)4n
    4:3o(4+-1)5d
    5:5o(3+-1)7d
    6:5o(6+-1)6d
    
    910.327779978
    Composition({'H': 58, 'C': 34, 'O': 26, 'N': 2})
    

What we did here created each monosaccharide unit that composes the core
of an N-linked glycan, and connected them together to form a tree or
graph structure. Each bond links a parent residue to one of its children
in the language of trees, where the first parent is the residue at the
reducing end.

At the end, in order to create something aware of the entire tree
structure and not just an individual node in it, we created an instance
of the ``Glycan`` class from the first residue, the root of the tree.

To see what we made here, we can look at both a textual representation
and an image of the tree

.. code:: python

    from glypy.plot import plot
    %matplotlib inline
    
    plot(n_linked_core)
    
    print(n_linked_core)

.. parsed-literal::

    RES
    1b:b-dglc-HEX-1:5
    2s:n-acetyl
    3b:b-dglc-HEX-1:5
    4s:n-acetyl
    5b:b-dman-HEX-1:5
    6b:a-dman-HEX-1:5
    7b:a-dman-HEX-1:5
    LIN
    1:1d(2+1)2n
    2:1o(4+-1)3d
    3:3d(2+1)4n
    4:3o(4+-1)5d
    5:5o(3+-1)7d
    6:5o(6+-1)6d
    
    


.. image:: output_11_1.png


The process of putting together all of those building blocks can be
repetative as well, so the N-glycan core structure is made available in
``glypy.glycans`` just as ``glypy.monosaccharides`` provided a
shortcut for creating individual monosaccharides

.. code:: python

    premade_core = glypy.glycans['N-Linked Core']
    premade_core == n_linked_core



.. parsed-literal::

    True



Operating on Glycans
~~~~~~~~~~~~~~~~~~~~

The real use for this library comes from manipulating glycan structure.
We can add and remove monosaccharides, substituents, and modifications
per base, derivatize the whole glycan structure, and generate fragments.

Basic operations such as iteration traverse the glycan structure using a
depth-first traversal that selects branches by bond order.

.. code:: python

    accumulated_total_mass = 0
    for monosaccharide in n_linked_core:
        print(monosaccharide)
        print(monosaccharide.mass())
        accumulated_total_mass += monosaccharide.mass()
        print('-------------')
    
    print(accumulated_total_mass, n_linked_core.mass())

.. parsed-literal::

    RES 1b:b-dglc-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n
    220.082112171
    -------------
    RES 1b:b-dglc-HEX-1:5 2s:n-acetyl LIN 1:1d(2+1)2n
    203.07937252
    -------------
    RES 1b:b-dman-HEX-1:5
    161.044998386
    -------------
    RES 1b:a-dman-HEX-1:5
    163.060648451
    -------------
    RES 1b:a-dman-HEX-1:5
    163.060648451
    -------------
    (910.32777997822, 910.32777997822)
    

This shows each distinct monosaccharide unit composing the total
structure and their discrete masses after composition loss from forming
glycosidic bonds. This total mass is the same as mass of the glycan.

We can traverse the structure in many other ways using the ``iternodes``
function, which is called by the `Python iterator
protocol <https://docs.python.org/2/library/stdtypes.html#iterator-types>`__,
``__iter__``. A feature worth pointing out is that the ``iternodes``
function allows you to pass a callback to influence how it emits
residues. For instance, ``Glycan.leaves()`` is implemented similar to
this

.. code:: python

    import itertools
    
    def leaves(self):
        def is_leaf(obj):
            if len(list(obj.children())) == 0:
                yield obj
    
        return itertools.chain.from_iterable(self.iternodes(apply_fn=is_leaf))
        
    print("The leaf nodes, or `terminal nodes` of the N-linked core")
    for leaf in leaves(n_linked_core):
        print(leaf)
        print('---')

.. parsed-literal::

    The leaf nodes, or `terminal nodes` of the N-linked core
    RES 1b:a-dman-HEX-1:5
    ---
    RES 1b:a-dman-HEX-1:5
    ---
    

As you can see, if you can create a sequence of only nodes that satisfy
some constraint, such as number of child nodes, it is easy to build an
iterator pipeline to perform more complex graph transformations. You can
also iterate over each link in the glycan using ``Glycan.iterlinks()``.
This is useful if you want to perform an operation on each ``Link``
object.

Like breaking them.

Glycan Fragmentation
^^^^^^^^^^^^^^^^^^^^

A core goal of ``glypy`` is supporting flexible glycan fragmentation.
An example of this operation with the N-linked core:

.. code:: python

    for fragment_ion in n_linked_core.fragments():
        print(fragment_ion)

.. parsed-literal::

    <Fragment kind=Y link_ids=[1] included_nodes=[1] mass=221.089937203>
    <Fragment kind=B link_ids=[1] included_nodes=[2, 3, 4, 5] mass=689.237842775>
    <Fragment kind=Y link_ids=[2] included_nodes=[2, 1] mass=424.169309723>
    <Fragment kind=B link_ids=[2] included_nodes=[3, 4, 5] mass=486.158470255>
    <Fragment kind=Y link_ids=[3] included_nodes=[3, 2, 1, 4] mass=748.27495656>
    <Fragment kind=B link_ids=[3] included_nodes=[5] mass=162.052823418>
    <Fragment kind=Y link_ids=[4] included_nodes=[3, 2, 1, 5] mass=748.27495656>
    <Fragment kind=B link_ids=[4] included_nodes=[4] mass=162.052823418>
    

To connect a fragment to its place in the original structure, you can
retrieve the disjoint sub-trees from a fragmentation by using the
function ``fragment_to_substructure`` in the ``glycan`` module.

.. code:: python

    from matplotlib import pyplot as plt
    from glypy.structure.glycan import fragment_to_substructure
    
    for links, frags in itertools.groupby(n_linked_core.clone().fragments(), lambda f: f.link_ids):
        
        y_ion, b_ion = frags
        y_tree, b_tree = [fragment_to_substructure(frag, n_linked_core) for frag in (y_ion, b_ion)]
        
        fig, axes = plt.subplots(1,2)
        b_ax, y_ax = axes
        fig.set_size_inches(16, 4)
        plot(y_tree, ax=y_ax, center=True)
        plot(b_tree, ax=b_ax, center=True)
        y_ax.set_title(n_linked_core.name_fragment(y_ion))
        b_ax.set_title(n_linked_core.name_fragment(b_ion))
        fig.suptitle("Fragments from breaking bond {}".format(links))
    
        


.. image:: output_21_0.png



.. image:: output_21_1.png



.. image:: output_21_2.png



.. image:: output_21_3.png


The ``Glycan.fragments()`` algorithm mutates the object, iteratively as
removing and restoring links between the constituent ``Monosaccharide``
objects. Because we want to use the object's structure for
``fragment_to_substructure``, it is important to copy the object before
fragmenting it. All of ``glypy``'s common structures,
``Glycan, Monosaccharide, Link, Substituent,`` and ``ReducedEnd`` have a
``clone`` method.

This is all useful, but what we probably want something more complex
than just the core structure to work with. GlycomeDB is the most
up-to-date (at the time of this writing) functional Glycomics database.
Taking `Entry
183 <http://www.glycome-db.org/database/showStructure.action?glycomeId=183>`__,
we can get something a little more complicated.

It would be a nuisance to have to manually create each residue, and as
we've already seen, ``glypy`` knows about ``GlycoCT``

.. code:: python

    from glypy.io import glycoct
    
    glycomedb183_glycoct = '''
    RES
    1b:b-dglc-HEX-1:5
    2s:n-acetyl
    3b:b-dglc-HEX-1:5
    4s:n-acetyl
    5b:b-dman-HEX-1:5
    6b:a-dman-HEX-1:5
    7b:b-dglc-HEX-1:5
    8s:n-acetyl
    9b:b-dgal-HEX-1:5
    10b:b-dglc-HEX-1:5
    11s:n-acetyl
    12b:b-dgal-HEX-1:5
    13b:a-dman-HEX-1:5
    14b:b-dglc-HEX-1:5
    15s:n-acetyl
    16b:b-dgal-HEX-1:5
    LIN
    1:1d(2+1)2n
    2:1o(4+1)3d
    3:3d(2+1)4n
    4:3o(4+1)5d
    5:5o(3+1)6d
    6:6o(2+1)7d
    7:7d(2+1)8n
    8:7o(4+1)9d
    9:6o(4+1)10d
    10:10d(2+1)11n
    11:10o(4+1)12d
    12:5o(6+1)13d
    13:13o(2+1)14d
    14:14d(2+1)15n
    15:14o(4+1)16d
    '''
    
    glycomedb183 = glycoct.loads(glycomedb183_glycoct)
    plot(glycomedb183)
    glycomedb183



.. parsed-literal::

    RES
    1b:b-dglc-HEX-1:5
    2s:n-acetyl
    3b:b-dglc-HEX-1:5
    4s:n-acetyl
    5b:b-dman-HEX-1:5
    6b:a-dman-HEX-1:5
    7b:b-dglc-HEX-1:5
    8s:n-acetyl
    9b:b-dgal-HEX-1:5
    10b:b-dglc-HEX-1:5
    11s:n-acetyl
    12b:b-dgal-HEX-1:5
    13b:a-dman-HEX-1:5
    14b:b-dglc-HEX-1:5
    15s:n-acetyl
    16b:b-dgal-HEX-1:5
    LIN
    1:1d(2+1)2n
    2:1o(4+1)3d
    3:3d(2+1)4n
    4:3o(4+1)5d
    5:5o(3+1)6d
    6:5o(6+1)13d
    7:6o(2+1)10d
    8:6o(4+1)7d
    9:7d(2+1)8n
    10:7o(4+1)9d
    11:10d(2+1)11n
    12:10o(4+1)12d
    13:13o(2+1)14d
    14:14d(2+1)15n
    15:14o(4+1)16d




.. image:: output_24_1.png


.. code:: python

    for links, frags in itertools.groupby(glycomedb183.clone().fragments(), lambda f: f.link_ids):
        
        y_ion, b_ion = frags
        y_tree, b_tree = [fragment_to_substructure(frag, glycomedb183) for frag in (y_ion, b_ion)]
        
        fig, axes = plt.subplots(1,2)
        b_ax, y_ax = axes
        fig.set_size_inches(16, 4)
        plot(y_tree, ax=y_ax, center=True)
        plot(b_tree, ax=b_ax, center=True)
        y_ax.set_title(glycomedb183.name_fragment(y_ion))
        b_ax.set_title(glycomedb183.name_fragment(b_ion))
        fig.suptitle("Fragments from breaking bond {}".format(links))
    
        


.. image:: output_25_0.png



.. image:: output_25_1.png



.. image:: output_25_2.png



.. image:: output_25_3.png



.. image:: output_25_4.png



.. image:: output_25_5.png



.. image:: output_25_6.png



.. image:: output_25_7.png



.. image:: output_25_8.png



.. image:: output_25_9.png


Derivatization
~~~~~~~~~~~~~~

So this structure is fancier than our last case, but we're probably not
studying it in its native state. Permethylation is a popular technique
for glycan analysis in glycan mass spectrometry.

.. code:: python

    from glypy.composition.composition_transform import derivatize
    
    case = glycomedb183.clone()
    print("Clone is identical? {}".format(case == glycomedb183))
    
    derivatize(case, 'methyl')
    print("Clone is identical? {}".format(case == glycomedb183))
    
    print("Derivatized Mass: {}, Native Mass: {}, Mass Delta: {}".format(
            case.mass(), glycomedb183.mass(), case.mass() - glycomedb183.mass()))
    

.. parsed-literal::

    Clone is identical? True
    Clone is identical? False
    Derivatized Mass: 2496.27212004, Native Mass: 2005.72436779, Mass Delta: 490.547752245