Embedded Glycan Database
We often want to work with large amounts of data, sometimes more than is convenient to store in main memory. Python ships with built-in support for SQLite through the sqlite3 module in the standard library. To ensure these features are available to all users, this module provides a simple Object Relational Mapping (ORM) for the sqlite3 module. The ORM system is built on top of a single class, GlycanRecord, which can be used to represent a collection of data that is partially exposed to sqlite3's search engine through mapping functions.
For a more sophisticated ORM, please see http://www.sqlalchemy.org/
-
class glypy.algorithms.database.GlycanRecord(structure, motifs=None, dbxref=None, aglycones=None, taxa=None, **kwargs)
Bases: glypy.algorithms.database.GlycanRecordBase
An extension of GlycanRecordBase that adds features and better supports extension through both metaprogramming and inheritance.
-
__eq__(other)
Equality is tested by comparing the structure attributes of the two records.
-
add_column_data(name, dtype, transform)
Function-based approach to modifying the class-specific __column_data_map attribute. The attribute must be accessed with getattr() to avoid class-private name mangling.
Parameters: name: str
Name of the new column_data field
dtype: str
The SQL data type to encode the column as
transform: function
The function to extract the value of the column_data from a record
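The role of getattr() here can be illustrated with a small standalone sketch. It does not use glypy: the classes Record and SubRecord and the helper add_column are hypothetical stand-ins, and building the mangled name by hand is an assumption about how such a helper could work, not a description of the library's code.

```python
class Record(object):
    # Inside a class body, __column_data_map is stored under the mangled
    # name _Record__column_data_map.
    __column_data_map = {}


class SubRecord(Record):
    __column_data_map = {}  # stored as _SubRecord__column_data_map


def add_column(cls, name, dtype, transform):
    """Hypothetical helper: register a column on cls's own private map."""
    # No mangling happens outside a class body, so the mangled name is
    # built explicitly and looked up with getattr().
    mangled = "_%s__column_data_map" % cls.__name__
    getattr(cls, mangled)[name] = (dtype, transform)


add_column(SubRecord, "mass", "float", lambda record: record.mass())

print(sorted(SubRecord._SubRecord__column_data_map))
print(Record._Record__column_data_map)
```

Because each class keeps its own mangled attribute, registering a column on SubRecord leaves the map on Record untouched.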
-
classmethod add_index(*args, **kwargs)
Generate the base table's indices for fast search.
Yields: str:
The SQL script block describing the mass_index of the GlycanRecord table
-
from_sql(row, *args, **kwargs)
Translate a Row object from sqlite3 into a GlycanRecord object.
Parameters: row: sqlite3.Row
A dict-like object containing the pickled value of the record in the structure field
Returns: GlycanRecord:
The unpickled GlycanRecord object. Subclasses may perform more complex operations, such as decompressing data or joining other tables in the database.
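A minimal standalone sketch of this row-to-object step, using only the standard library, might look as follows. The table name records, its columns, the mass value, and the module-level from_sql helper are all assumptions made for illustration, and a plain dict stands in for a real record object.

```python
import pickle
import sqlite3


def from_sql(row):
    """Hypothetical converter: unpickle the payload held in the structure field."""
    return pickle.loads(row["structure"])


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support access by column name
conn.execute(
    "CREATE TABLE records (glycan_id INTEGER PRIMARY KEY, mass FLOAT, structure BLOB)"
)

# A plain dict stands in for a real record object; the mass is arbitrary.
payload = {"name": "toy-record", "mass": 910.33}
conn.execute(
    "INSERT INTO records (glycan_id, mass, structure) VALUES (?, ?, ?)",
    (1, payload["mass"], pickle.dumps(payload)),
)

# The search uses the plain mass column; the full object comes from the blob.
row = conn.execute("SELECT * FROM records WHERE mass > 900").fetchone()
restored = from_sql(row)
print(restored["name"])
```

The query filters on an ordinary column, while the complete object is recovered from the pickled field only after the row has been selected.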
-
is_n_glycan
Returns True if structure has the N-linked glycan core motif.
Note
This property is mapped to the database column is_n_glycan by is_n_glycan().
-
mass(average=False, charge=0, mass_data=None, override=None)
Calculates the mass of structure. If override is not None, returns override instead.
-
monosaccharides
Returns a mapping of the counts of monosaccharides found in structure. Generic names are found using naive_name_monosaccharide().
Note
This property is mapped to the database column composition by extract_composition().
See also
extract_composition(), naive_name_monosaccharide()
-
-
class glypy.algorithms.database.RecordDatabase(connection_string=':memory:', record_type=<class 'glypy.algorithms.database.GlycanRecord'>, records=None, flag='c')
Bases: object
A wrapper around a SQLite database for storing and searching GlycanRecord objects.
This class defines a handful of general data access methods as well as the ability to write SQL queries directly against the database.
Calls apply_schema(). If records is not None, calls apply_indices(). record_type, the stored class type, is used for inferring the table schema, table name, and GlycanRecord.from_sql() function.
If records is not provided, no records are added. If records are provided, they are inserted with load_data() and afterwards apply_indices() is called.
Parameters: connection_string: str
The path to the SQLite database file, or the special keyword ":memory:", which holds the database directly in memory.
record_type: type
The class type of the records assumed to be stored in this database. Defaults to GlycanRecord
records: list
A list of record_type records to insert immediately on table creation.
Attributes
connection_string: str
The path to the SQLite database file, or the special keyword ":memory:", which holds the database directly in memory.
record_type: type
The class type of the records assumed to be stored in this database. The stored class type is used for inferring the table schema, table name, and GlycanRecord.from_sql() function.
-
__getitem__(keys)
Look up records in the database by primary key. Also accepts slice objects.
Returns: record_type or list of record_type
-
apply_indices()
Executes each SQL block yielded by record_type's add_index() class method. Commits all pending changes.
May be called during initialization if data was added.
-
apply_schema()
Executes each SQL block yielded by record_type's sql_schema() class method. Commits all pending changes.
Called during initialization if the database is newly created.
Danger
The SQL table definition statements generated may drop existing tables. Calling this function on an already populated table can cause data loss.
-
commit()
A wrapper around sqlite3.Connection.commit(). Writes pending changes to the database.
-
create(structure, *args, **kwargs)
A convenience function for creating a database entry for Glycan structure. Passes along all arguments to record_type initialization methods.
Parameters: structure: Glycan
commit: bool
If True, commit changes after adding this record. Defaults to True.
-
execute(query, *args, **kwargs)
A wrapper around sqlite3.Connection.execute(). Formats the query string to substitute in the main table name if the {table_name} token is present.
-
executemany(query, param_iter, *args, **kwargs)
A wrapper around sqlite3.Connection.executemany(). Formats the query string to substitute in the main table name if the {table_name} token is present.
-
executescript(script, *args, **kwargs)
A wrapper around sqlite3.Connection.executescript().
-
from_sql(rows, from_sql_fn=None)
Convenience function to convert rows into objects through from_sql_fn, which defaults to self.record_type.from_sql().
Parameters: rows : sqlite3.Row or an iterable of sqlite3.Row
Collection of objects to convert
from_sql_fn : function, optional
Function to perform the conversion. Defaults to self.record_type.from_sql()
Yields: Type returned by from_sql_fn
-
get_metadata(key=None)
Retrieve a value from the key-value store in the database's metadata table.
If key is None, all of the keys will be retrieved and returned as a dict.
Parameters: key : str, optional
The key value to retrieve.
Returns: any or dict
-
load_data(record_list, commit=True, set_id=True, cast=True, **kwargs)
Given an iterable of record_type objects, assign each a primary key value and insert them into the database.
Forwards all **kwargs to to_sql() calls.
Parameters: record_list: GlycanRecord or iterable of GlycanRecords
commit: bool
Whether or not to commit all changes to the database
set_id: bool
cast: bool
-
ppm_match_tolerance_search(mass, tolerance, mass_shift=0)
Rapidly search the database for entries with a recorded mass within tolerance parts per million mass error of mass. The search interval is
[mass - (tolerance * mass), mass + (tolerance * mass)]
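The interval above can be turned into an indexed range query. The sketch below is a standalone illustration with the standard sqlite3 module, not the library's implementation: the table records, the index mass_index, and the helper ppm_window are hypothetical. It assumes that tolerance is passed as a fraction (so 1e-5 would correspond to 10 parts per million), which is how the interval formula reads, and it ignores the mass_shift argument.

```python
import sqlite3


def ppm_window(mass, tolerance):
    """Return the (lower, upper) bounds of the search interval."""
    return mass - (tolerance * mass), mass + (tolerance * mass)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (glycan_id INTEGER PRIMARY KEY, mass FLOAT)")
# An index on mass lets SQLite answer range queries without a full scan.
conn.execute("CREATE INDEX mass_index ON records (mass)")
conn.executemany(
    "INSERT INTO records (glycan_id, mass) VALUES (?, ?)",
    [(1, 1000.000), (2, 1000.005), (3, 1000.050)],
)

# Assumption: tolerance is a fraction, so 1e-5 means 10 parts per million.
lower, upper = ppm_window(1000.0, 1e-5)
hits = conn.execute(
    "SELECT glycan_id FROM records WHERE mass BETWEEN ? AND ? ORDER BY glycan_id",
    (lower, upper),
).fetchall()
print([row[0] for row in hits])
```

With these toy values the window spans roughly 999.99 to 1000.01, so the first two rows fall inside it and the third does not.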
-
rollback()
A wrapper around sqlite3.Connection.rollback(). Reverses the last set of changes to the database.
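Since commit() and rollback() wrap the corresponding sqlite3.Connection methods, their effect can be shown with the standard library alone. The table name and values in this sketch are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (glycan_id INTEGER PRIMARY KEY, mass FLOAT)")

conn.execute("INSERT INTO records (mass) VALUES (?)", (1000.0,))
conn.commit()  # the first row is now permanent

conn.execute("INSERT INTO records (mass) VALUES (?)", (2000.0,))
conn.rollback()  # discards the uncommitted second row

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

Only changes made since the last commit are reversed, so the committed row survives the rollback.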
-
set_metadata(key, value)
Set a key-value pair in the database's metadata table.
Parameters: key : str
Name of the key under which value is stored
value : any
Value to store in the metadata table
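One way a key-value metadata table of this kind could be built on sqlite3 is sketched below. The table layout, the column names name and content, the use of pickle to store arbitrary values, and the KeyError for a missing key are all assumptions made for illustration; the source does not say how the library stores values.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (name VARCHAR(120) PRIMARY KEY, content BLOB)")


def set_metadata(key, value):
    # INSERT OR REPLACE overwrites any value already stored under key.
    conn.execute(
        "INSERT OR REPLACE INTO metadata (name, content) VALUES (?, ?)",
        (key, pickle.dumps(value)),
    )
    conn.commit()


def get_metadata(key=None):
    if key is None:
        # No key given: return every stored pair as a dict.
        rows = conn.execute("SELECT name, content FROM metadata")
        return {name: pickle.loads(content) for name, content in rows}
    row = conn.execute(
        "SELECT content FROM metadata WHERE name = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(row[0])


set_metadata("source", "hand-curated")
set_metadata("version", 1)
set_metadata("version", 2)  # replaces the earlier value
print(get_metadata("version"))
print(get_metadata())
```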
-
stn(string, key='table_name')
A convenience function that substitutes the primary table name into raw SQL queries. By default it looks for the token {table_name}.
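The token substitution can be pictured with a short standalone function. This is a hypothetical stand-in, not the method itself: it takes the table name as an explicit argument, whereas the documented method presumably obtains it from the database object, and it uses str.replace as one simple way to swap the token.

```python
def substitute_table_name(string, table_name, key="table_name"):
    """Hypothetical stand-in: replace the {key} token with table_name."""
    # str.replace leaves any other braces in the query untouched.
    return string.replace("{%s}" % key, table_name)


query = substitute_table_name(
    "SELECT * FROM {table_name} WHERE mass > ?", "GlycanRecord"
)
print(query)
```

Queries written against the token stay valid when the record class, and therefore the table name, changes.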
-
-
glypy.algorithms.database._resolve_column_data_mro(cls)
Given a class with name-mangled __column_data_map attributes inherited from its class hierarchy, extract each dict in descending order, overwriting older settings as it approaches the most recently derived class.
Parameters: cls: type
The type for which to extract column_data mappings along the MRO
Returns: dict:
The column_data mapping describing the entire class hierarchy along cls's MRO.
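The merging behavior described here can be sketched in a few lines of plain Python. The classes Base and Child, their column entries, and the function below are hypothetical; this is one possible reading of the description, not the library's code.

```python
class Base(object):
    __column_data_map = {"mass": "float", "structure": "text"}


class Child(Base):
    __column_data_map = {"structure": "blob", "taxa": "text"}


def resolve_column_data_mro(cls):
    """Merge each class's private map, base classes first."""
    merged = {}
    # reversed() walks from object down to cls, so more derived classes
    # overwrite entries set by their ancestors.
    for klass in reversed(cls.__mro__):
        # Class names with leading underscores would need them stripped first.
        mangled = "_%s__column_data_map" % klass.__name__
        merged.update(getattr(klass, mangled, {}))
    return merged


print(resolve_column_data_mro(Child))
```

Child keeps the inherited mass column, overrides structure, and adds taxa, while resolving Base alone returns only its own two entries.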
-
glypy.algorithms.database.column_data(name, dtype, transform)
Decorator for adding a new column to the SQL table mapped to a record class.
Parameters: name: str
Name of the new column
dtype: str
The SQL data type to encode the column as
transform: function
The function to extract the value of the column from a record
Returns: function:
Decorator that will call add_column_data() with name, dtype and transform on the decorated class after instantiation
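The decorator pattern described here can be sketched without glypy. The version below mirrors the documented signature but is a simplified stand-in: it records columns in a public columns attribute rather than calling add_column_data(), and ToyRecord with its residues attribute is invented for the example.

```python
def column_data(name, dtype, transform):
    """Hypothetical decorator factory mirroring the documented signature."""
    def decorator(cls):
        # Copy any inherited mapping so that parent classes are not modified.
        columns = dict(getattr(cls, "columns", {}))
        columns[name] = (dtype, transform)
        cls.columns = columns
        return cls
    return decorator


@column_data("n_residues", "INTEGER", lambda record: len(record.residues))
class ToyRecord(object):
    def __init__(self, residues):
        self.residues = residues


record = ToyRecord(["Man", "Man", "GlcNAc"])
# Evaluate every registered transform to get the values for one table row.
row_values = {
    column: transform(record)
    for column, (dtype, transform) in ToyRecord.columns.items()
}
print(row_values)
```

Each registered transform is a function of the record, so the values of the extra columns can be computed at insertion time from the object itself.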
-
glypy.algorithms.database.dbopen
Open a database.
Alias of RecordDatabase.
-
glypy.algorithms.database.extract_composition(record, max_size=120)
Given a GlycanRecord, translate its monosaccharides property into a string suitable for denormalized querying.
Transforms the resulting mapping, e.g. Counter({u'GlcNA': 6, u'Gal': 4, u'aMan': 2, u'Fuc': 1, u'Man': 1}), into the string "Gal:4 aMan:2 Fuc:1 GlcNA:6 Man:1", which can be partially matched in queries using SQL's LIKE operator.
Parameters: record: GlycanRecord
The record to serialize monosaccharides for
max_size: int
The maximum size of the resulting string allowed under the target SQL schema
Returns: str:
The string representation of record.monosaccharides.
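The serialization and the LIKE-based matching can be tried out with a standalone sketch. The function serialize_composition is a hypothetical stand-in that takes the counts directly instead of a record; sorting the names and raising ValueError when the string exceeds max_size are assumptions made here, since the source does not state the ordering or the overflow behavior.

```python
from collections import Counter
import sqlite3


def serialize_composition(counts, max_size=120):
    """Hypothetical serializer: 'name:count' pairs joined by spaces."""
    # Sorting is an assumption made here to keep the output deterministic.
    text = " ".join("%s:%d" % (name, counts[name]) for name in sorted(counts))
    if len(text) > max_size:
        # Assumed behavior; the real function may handle overflow differently.
        raise ValueError("composition string exceeds %d characters" % max_size)
    return text


composition = serialize_composition(
    Counter({"GlcNA": 6, "Gal": 4, "aMan": 2, "Fuc": 1, "Man": 1})
)
print(composition)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (glycan_id INTEGER PRIMARY KEY, composition VARCHAR(120))"
)
conn.execute("INSERT INTO records (composition) VALUES (?)", (composition,))
matches = conn.execute(
    "SELECT glycan_id FROM records WHERE composition LIKE ?", ("%Gal:4%",)
).fetchall()
print(matches)
```

Substring patterns need some care with this encoding: a pattern such as %Man:1% also matches an aMan:1 entry, and %Gal:4% also matches Gal:40.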