Embedded Glycan Database

We often want to work with large amounts of data, sometimes more than is convenient to store in main memory. Python ships with built-in support for SQLite through the sqlite3 module in the standard library. To make these features available to all users, this module provides a simple Object Relational Mapping (ORM) for the sqlite3 module. The ORM is built on top of a single class, GlycanRecord, which represents a collection of data that is partially exposed to sqlite3’s search engine through mapping functions.

For a more sophisticated ORM, please see http://www.sqlalchemy.org/
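The general idea can be sketched with the standard library alone: the whole record is pickled into one column, while a mapping function exposes a derived value as a searchable column. The `toy_mass` function, the `ToyRecord` table, and the dict-based records below are hypothetical stand-ins chosen for illustration; they are not glypy's actual schema or API.

```python
import pickle
import sqlite3


def toy_mass(record):
    # Mapping function: derives a searchable value from the full record.
    return sum(record["parts"])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ToyRecord (id INTEGER PRIMARY KEY, mass REAL, structure BLOB)")

records = [{"name": "a", "parts": [1.0, 2.5]},
           {"name": "b", "parts": [10.0, 20.0]}]
for i, rec in enumerate(records):
    # The derived value goes into its own column; the record itself is pickled.
    conn.execute(
        "INSERT INTO ToyRecord (id, mass, structure) VALUES (?, ?, ?)",
        (i, toy_mass(rec), pickle.dumps(rec)))
conn.commit()

# Search on the exposed column, then rebuild the full object from the blob.
row = conn.execute(
    "SELECT structure FROM ToyRecord WHERE mass > ?", (5.0,)).fetchone()
found = pickle.loads(row[0])
```

Only the columns produced by mapping functions are visible to SQL; everything else about the record stays opaque until it is unpickled.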

class glypy.algorithms.database.GlycanRecord(structure, motifs=None, dbxref=None, aglycones=None, taxa=None, **kwargs)[source]

Bases: glypy.algorithms.database.GlycanRecordBase

An extension of GlycanRecordBase to add additional features and better support for extension by both metaprogramming and inheritance.

__eq__(other)

Equality is tested by comparing the structure of each record.

add_column_data(name, dtype, transform)

Function-based approach to modifying the class-specific __column_data_map attribute. The attribute must be accessed with getattr() to avoid class-private name mangling.

Parameters:

name: str

Name of the new column_data field

dtype: str

The SQL data type to encode the column as

transform: function

The function to extract the value of the column_data from a record
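The name-mangling remark can be illustrated with a small sketch. Inside a class body, a name such as `__column_data_map` is compiled to `_ClassName__column_data_map`, whereas a plain string passed to getattr() is never mangled. The `ToyRecordBase` and `ToyRecord` classes are hypothetical, and this is only one way such a method could be written, not glypy's implementation.

```python
class ToyRecordBase(object):
    # Compiled as _ToyRecordBase__column_data_map (class-private mangling).
    __column_data_map = {}

    @classmethod
    def add_column_data(cls, name, dtype, transform):
        # Build the mangled name as a string so it refers to whichever
        # class cls is, rather than to the class this code was written in.
        attr = "_%s__column_data_map" % cls.__name__
        mapping = getattr(cls, attr, None)
        if mapping is None:
            mapping = {}
            setattr(cls, attr, mapping)
        mapping[name] = (dtype, transform)


class ToyRecord(ToyRecordBase):
    pass


ToyRecord.add_column_data("n_parts", "INTEGER", len)
```

Because each class gets its own mangled attribute, columns added to `ToyRecord` do not leak into `ToyRecordBase`.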

classmethod add_index(*args, **kwargs)[source]

Generate the base table’s indices for fast search

Yields:

str:

The SQL script block describing the mass_index of the GlycanRecord table

from_sql(row, *args, **kwargs)

Translate a Row object from sqlite3 into a GlycanRecord object

Parameters:

row: sqlite3.Row

A dict-like object containing the pickled value of the record in the structure field

Returns:

GlycanRecord:

The unpickled GlycanRecord object. Sub-classes may perform more complex operations like decompressing or joining other tables in the database.

is_n_glycan

Returns True if structure has the N-linked Glycan core motif.

Note

This property is mapped to the database column is_n_glycan by is_n_glycan().

mass(average=False, charge=0, mass_data=None, override=None)

Calculates the mass of structure. If override is not None, return this instead.

monosaccharides

Returns a mapping of the counts of monosaccharides found in structure. Generic names are found using naive_name_monosaccharide().

Note

This property is mapped to the database column composition by extract_composition().

See also

extract_composition(), naive_name_monosaccharide()

class glypy.algorithms.database.RecordDatabase(connection_string=':memory:', record_type=<class 'glypy.algorithms.database.GlycanRecord'>, records=None, flag='c')[source]

Bases: object

A wrapper around an Sqlite3 database for storing and searching GlycanRecord objects.

This class defines a handful of general data access methods as well as the ability to write SQL queries directly against the database.

Calls apply_schema(). If records is not None, calls apply_indices().

record_type, the stored class type, is used for inferring the table schema, the table name, and the GlycanRecord.from_sql() function.

If records is not provided, no records are added. If records are provided, they are inserted with load_data() and afterwards apply_indices() is called.

Parameters:

connection_string: str

The path to the Sqlite database file, or the ":memory:" special keyword defining the database to be held directly in memory.

record_type: type

The class type of the records assumed to be stored in this database. Defaults to GlycanRecord

records: list

A list of record_type records to insert immediately on table creation.

Attributes

connection_string: str

The path to the Sqlite database file, or the ":memory:" special keyword defining the database to be held directly in memory.

record_type: type

The class type of the records assumed to be stored in this database. The stored class type is used for inferring the table schema, table name, and GlycanRecord.from_sql() function.

__getitem__(keys)[source]

Look up records in the database by primary key. Also accepts slice objects.

Returns:

record_type or list of record_type

__iter__()[source]

Iterate sequentially over each entry in the database.

__len__()[source]

The number of records in the database

Returns:

int

_patch_querymethods()[source]

Patch on additional methods defined on record_type

apply_indices()[source]

Executes each SQL block yielded by record_type’s add_index() class method. Commits all pending changes.

May be called during initialization if data was added.

apply_schema()[source]

Executes each SQL block yielded by record_type’s sql_schema() class method. Commits all pending changes.

Called during initialization if the database is newly created.

Danger

The SQL table definition statements generated may drop existing tables. Calling this function on an already populated table can cause data loss.
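The pattern of class methods yielding SQL blocks that the database then executes can be sketched as follows. The `ToyRecord` class, its `table_name` attribute, and the exact SQL are assumptions made for illustration; the `DROP TABLE` statement shows why re-applying a schema to a populated table is destructive.

```python
import sqlite3


class ToyRecord(object):
    """Hypothetical record type; only the class methods matter here."""

    table_name = "ToyRecord"

    @classmethod
    def sql_schema(cls):
        # Dropping first is what makes re-applying the schema lose data.
        yield "DROP TABLE IF EXISTS {0};".format(cls.table_name)
        yield ("CREATE TABLE {0} (id INTEGER PRIMARY KEY, mass REAL);"
               .format(cls.table_name))

    @classmethod
    def add_index(cls):
        yield ("CREATE INDEX IF NOT EXISTS mass_index ON {0}(mass);"
               .format(cls.table_name))


conn = sqlite3.connect(":memory:")
for block in ToyRecord.sql_schema():
    conn.executescript(block)
for block in ToyRecord.add_index():
    conn.executescript(block)
conn.commit()

# sqlite_master lists the schema objects that now exist.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
```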

commit()[source]

A wrapper around sqlite3.Connection.commit(). Writes pending changes to the database.

create(structure, *args, **kwargs)[source]

A convenience function for creating a database entry for Glycan structure. Passes along all arguments to record_type initialization methods.

Parameters:

structure: Glycan

commit: bool

If True, commit changes after adding this record. Defaults to True.

execute(query, *args, **kwargs)[source]

A wrapper around sqlite3.Connection.execute(). Will format the query string to substitute in the main table name if the {table_name} token is present.

executemany(query, param_iter, *args, **kwargs)[source]

A wrapper around sqlite3.Connection.executemany(). Will format the query string to substitute in the main table name if the {table_name} token is present.
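A minimal sketch of the token substitution, assuming it is done with `str.format`. The standalone `sub_table_name` function here takes the table name as an explicit argument, which is a simplification for illustration and differs from the method of the same name documented on this page.

```python
import sqlite3


def sub_table_name(query, table_name, key="table_name"):
    # str.format replaces the {table_name} token. Any other literal braces
    # in the query would need escaping, which this sketch does not handle.
    return query.format(**{key: table_name})


conn = sqlite3.connect(":memory:")
conn.execute(sub_table_name(
    "CREATE TABLE {table_name} (id INTEGER, mass REAL)", "ToyRecord"))
conn.executemany(
    sub_table_name("INSERT INTO {table_name} VALUES (?, ?)", "ToyRecord"),
    [(1, 100.0), (2, 200.0)])
count = conn.execute(sub_table_name(
    "SELECT COUNT(*) FROM {table_name}", "ToyRecord")).fetchone()[0]
```

Values are still passed through `?` placeholders; only the table name, which SQLite cannot parameterize, is substituted into the query text.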

executescript(script, *args, **kwargs)[source]

A wrapper around sqlite3.Connection.executescript().

from_sql(rows, from_sql_fn=None)[source]

Convenience function to convert rows into objects through from_sql_fn, by default, self.record_type.from_sql()

Parameters:

rows : sqlite3.Row or an iterable of sqlite3.Row

Collection of objects to convert

from_sql_fn : function, optional

Function to perform the conversion. Defaults to self.record_type.from_sql()

Yields:

Type returned by from_sql_fn

get_metadata(key=None)[source]

Retrieve a value from the key-value store in the database’s metadata table.

If key is None all of the keys will be retrieved and returned as a dict.

Parameters:

key : str, optional

The key value to retrieve.

Returns:

any or dict

load_data(record_list, commit=True, set_id=True, cast=True, **kwargs)[source]

Given an iterable of record_type objects, assign each a primary key value and insert them into the database.

Forwards all **kwargs to to_sql() calls.

Parameters:

record_list: GlycanRecord or iterable of GlycanRecords

commit: bool

Whether or not to commit all changes to the database

set_id: bool

cast: bool

Rapidly search the database for entries with a recorded mass within tolerance parts per million mass error of mass.

\([\text{mass} - (\text{tolerance} \times \text{mass}),\ \text{mass} + (\text{tolerance} \times \text{mass})]\)
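As a worked example of the interval above, the sketch below computes the window for a mass of 1000.0 and a tolerance of 1e-5 (10 parts per million, assuming the tolerance is given as a fraction, as the formula suggests) and uses it in a range query against an indexed mass column. The `ppm_window` helper and the `ToyRecord` table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ToyRecord (id INTEGER PRIMARY KEY, mass REAL)")
# An index on mass lets SQLite answer the range query without a full scan.
conn.execute("CREATE INDEX mass_index ON ToyRecord(mass)")
conn.executemany(
    "INSERT INTO ToyRecord VALUES (?, ?)",
    [(1, 999.98), (2, 999.995), (3, 1000.008), (4, 1000.02)])


def ppm_window(mass, tolerance):
    # [mass - tolerance * mass, mass + tolerance * mass]
    width = tolerance * mass
    return mass - width, mass + width


lower, upper = ppm_window(1000.0, 1e-5)  # roughly 999.99 to 1000.01
matches = [r[0] for r in conn.execute(
    "SELECT id FROM ToyRecord WHERE mass BETWEEN ? AND ? ORDER BY id",
    (lower, upper))]
```

Records 2 and 3 fall inside the window; records 1 and 4 are 20 ppm away and are excluded.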

rollback()[source]

A wrapper around sqlite3.Connection.rollback(). Reverses the last set of changes to the database.
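The interaction between commit and rollback can be seen directly on a plain sqlite3 connection, which is what these wrappers delegate to. The table below is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ToyRecord (id INTEGER PRIMARY KEY)")
conn.commit()

conn.execute("INSERT INTO ToyRecord VALUES (1)")
conn.rollback()  # discards the uncommitted insert
after_rollback = conn.execute(
    "SELECT COUNT(*) FROM ToyRecord").fetchone()[0]

conn.execute("INSERT INTO ToyRecord VALUES (2)")
conn.commit()    # makes the insert permanent
conn.rollback()  # nothing is pending, so nothing is undone
after_commit = conn.execute(
    "SELECT COUNT(*) FROM ToyRecord").fetchone()[0]
```

Only changes made since the last commit are reversed, which is why methods that accept a commit flag determine how much work a later rollback can undo.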

set_metadata(key, value)[source]

Set a key-value pair in the database’s metadata table

Parameters:

key : str

Key naming value

value : any

Value to store in the metadata table
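A key-value metadata table of this kind could look like the sketch below. The table layout, the column names, and the use of JSON to serialize values are all assumptions made for illustration; the source does not state how glypy stores metadata values.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (name TEXT PRIMARY KEY, content TEXT)")


def set_metadata(key, value):
    # INSERT OR REPLACE overwrites an existing entry with the same key.
    conn.execute(
        "INSERT OR REPLACE INTO metadata (name, content) VALUES (?, ?)",
        (key, json.dumps(value)))


def get_metadata(key=None):
    if key is None:
        # No key given: return every entry as a dict.
        rows = conn.execute("SELECT name, content FROM metadata")
        return dict((name, json.loads(content)) for name, content in rows)
    row = conn.execute(
        "SELECT content FROM metadata WHERE name = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return json.loads(row[0])


set_metadata("source", "toy")
set_metadata("version", 1)
set_metadata("version", 2)  # replaces the earlier value
```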

stn(string, key='table_name')

A convenience function called to substitute in the primary table name into raw SQL queries. By default it looks for the token {table_name}.

sub_table_name(string, key='table_name')[source]

A convenience function called to substitute in the primary table name into raw SQL queries. By default it looks for the token {table_name}.

glypy.algorithms.database._resolve_column_data_mro(cls)[source]

Given a class with name-mangled __column_data_map attributes throughout its class hierarchy, extract each dict in order from the most distant ancestor down to cls itself, so that settings from more recently derived classes overwrite older ones.

Parameters:

cls: type

The type to attempt to extract column_data mappings for along the MRO

Returns:

dict:

The column_data mapping describing the entire class hierarchy along cls’s MRO.
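The resolution strategy can be sketched as a walk over the reversed MRO, merging each class's own mangled dict so that derived classes win. The `Base` and `Child` classes and their column values are hypothetical, and this is an illustrative reimplementation rather than the library's code.

```python
def resolve_column_data_mro(cls):
    merged = {}
    # reversed(__mro__) runs from object down to cls, so more derived
    # classes are applied last and overwrite earlier settings.
    for klass in reversed(cls.__mro__):
        attr = "_%s__column_data_map" % klass.__name__
        # Use __dict__ so only the dict defined on klass itself is read.
        merged.update(klass.__dict__.get(attr, {}))
    return merged


class Base(object):
    __column_data_map = {"mass": "REAL", "name": "TEXT"}


class Child(Base):
    __column_data_map = {"name": "VARCHAR(40)", "is_n_glycan": "BOOLEAN"}


resolved = resolve_column_data_mro(Child)
```

`Child` keeps the inherited `mass` column, overrides `name`, and adds `is_n_glycan`.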

glypy.algorithms.database.column_data(name, dtype, transform)[source]

Decorator for adding a new column to the SQL table mapped to a record class

Parameters:

name: str

Name of the new column

dtype: str

The SQL data type to encode the column as

transform: function

The function to extract the value of the column from a record

Returns:

function:

Decorator that will call add_column_data() with name, dtype and transform on the decorated class once the class has been created
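A decorator factory with this shape might be written as in the sketch below. The `ToyBase` and `ToyRecord` classes and the `columns` attribute are hypothetical simplifications; in particular, the real class stores its mapping in a name-mangled attribute rather than a public one.

```python
def column_data(name, dtype, transform):
    def decorator(cls):
        # Runs once, right after the class object has been built.
        cls.add_column_data(name, dtype, transform)
        return cls
    return decorator


class ToyBase(object):
    columns = {}

    @classmethod
    def add_column_data(cls, name, dtype, transform):
        # Copy first so a subclass never mutates its parent's mapping.
        cls.columns = dict(cls.columns)
        cls.columns[name] = (dtype, transform)


@column_data("n_parts", "INTEGER", lambda record: len(record.parts))
class ToyRecord(ToyBase):
    def __init__(self, parts):
        self.parts = parts


dtype, transform = ToyRecord.columns["n_parts"]
value = transform(ToyRecord(["a", "b", "c"]))
```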

glypy.algorithms.database.dbopen

Open a database

alias of RecordDatabase

glypy.algorithms.database.extract_composition(record, max_size=120)[source]

Given a GlycanRecord, translate its monosaccharides property into a string suitable for denormalized querying.

Transforms the resulting mapping, e.g. Counter({u'GlcNA': 6, u'Gal': 4, u'aMan': 2, u'Fuc': 1, u'Man': 1}), into the string "Gal:4 aMan:2 Fuc:1 GlcNA:6 Man:1", which can be partially matched in queries using SQL’s LIKE operator.

Parameters:

record: GlycanRecord

The record to serialize monosaccharides for

max_size: int

The maximum size of the resulting string allowed under the target SQL schema

Returns:

str:

The string representation of record.monosaccharides.
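The serialization and the LIKE-based lookup can be sketched as follows. The `toy_extract_composition` function takes the counts directly instead of a record, sorts the names for a stable order, and raises when the string exceeds `max_size`; all three choices are assumptions for illustration and may differ from the real function's behavior.

```python
from collections import Counter
import sqlite3


def toy_extract_composition(counts, max_size=120):
    # Serialize as space-separated "name:count" pairs.
    text = " ".join(
        "%s:%d" % (name, n) for name, n in sorted(counts.items()))
    if len(text) > max_size:
        raise ValueError(
            "composition string exceeds %d characters" % max_size)
    return text


counts = Counter({"GlcNA": 6, "Gal": 4, "aMan": 2, "Fuc": 1, "Man": 1})
composition = toy_extract_composition(counts)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ToyRecord (id INTEGER PRIMARY KEY, composition VARCHAR(120))")
conn.execute("INSERT INTO ToyRecord VALUES (1, ?)", (composition,))
hits = conn.execute(
    "SELECT id FROM ToyRecord WHERE composition LIKE ?",
    ("%Gal:4%",)).fetchall()
```

Substring matching has limits: a pattern such as `%Man:1%` would also match a record containing `aMan:1`, so patterns need to be chosen with the naming scheme in mind.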

glypy.algorithms.database.is_n_glycan(record)[source]

A predicate testing if the N-linked Glycan core motif is present in record.structure.

Returns:

int:

0 if False, 1 if True. Sqlite doesn’t have a dedicated bool data type.
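The integer encoding can be checked with a plain sqlite3 connection. The table and values below are hypothetical; the point is that the flag is stored and queried as 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ToyRecord (id INTEGER PRIMARY KEY, is_n_glycan BOOLEAN)")
# Convert the Python bool to int explicitly before storing it.
conn.executemany(
    "INSERT INTO ToyRecord VALUES (?, ?)",
    [(1, int(True)), (2, int(False))])

stored = conn.execute(
    "SELECT is_n_glycan FROM ToyRecord ORDER BY id").fetchall()
n_glycan_ids = [r[0] for r in conn.execute(
    "SELECT id FROM ToyRecord WHERE is_n_glycan = 1")]
```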

glypy.algorithms.database.querymethod(func)[source]

Decorator for creating methods that are patched onto databases

Parameters:

func : method

Method to be bound

Returns:

QueryMethod
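One way such a decorator and the corresponding patching step could fit together is sketched below. The `QueryMethod` class body, its `bind` method, the `ToyRecord` and `ToyDatabase` classes, and the `count_heavy` query are all hypothetical; the source only states that the decorator returns a QueryMethod and that the database patches on methods defined on record_type.

```python
import types


class QueryMethod(object):
    """Marker wrapper around a function meant to be bound to a database."""

    def __init__(self, func):
        self.func = func

    def bind(self, obj):
        # Turn the plain function into a method bound to obj.
        return types.MethodType(self.func, obj)


def querymethod(func):
    return QueryMethod(func)


class ToyRecord(object):
    @querymethod
    def count_heavy(self, threshold):
        # Once patched, self is the database, not a record.
        return len([m for m in self.masses if m > threshold])


class ToyDatabase(object):
    def __init__(self, record_type, masses):
        self.record_type = record_type
        self.masses = masses
        self._patch_querymethods()

    def _patch_querymethods(self):
        # Copy every marked function from the record type onto this instance.
        for name, value in vars(self.record_type).items():
            if isinstance(value, QueryMethod):
                setattr(self, name, value.bind(self))


db = ToyDatabase(ToyRecord, [100.0, 250.0, 900.0])
heavy = db.count_heavy(200.0)
```

Keeping the query next to the record type means a custom record class can carry its own searches, and any database opened with that record type gains them automatically.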