glySpace Client

This module implements a client for communicating with remote data stores that are part of the glySpace project. Currently, it communicates with the SPARQL endpoint hosted by https://glytoucan.org/.

The module contains a pre-created instance of GlySpaceRDFClient named client whose get(), structure(), from_taxon(), and structures_with_motif() methods are available as top-level functions of the module.

Pre-created RDF namespaces

NSGlyTouCan = Namespace("http://www.glytoucan.org/glyco/owl/glytoucan#")
NSGlycan = Namespace("http://purl.jp/bio/12/glyco/glycan#")
NSGlycoinfo = Namespace("http://rdf.glycoinfo.org/glycan/")
NSGlycomeDB = Namespace("http://rdf.glycome-db.org/glycan/")
NSSKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
NSUniprotCore = Namespace("http://purl.uniprot.org/core/")
NSUniprotEntity = Namespace("http://purl.uniprot.org/uniprot/")
NSTaxonomy = Namespace("http://purl.uniprot.org/taxonomy/")

class glypy.io.glyspace.GlySpaceRDFClient

Bases: glypy.io.glyspace.RDFClientBase

An RDF Client for glySpace. The default namespace is glycoinfo, and the following namespaces are bound:

glytoucan = NSGlyTouCan
glycomedb = NSGlycomeDB
glycan = NSGlycan
glycoinfo = NSGlycoinfo
skos = NSSKOS
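To illustrate what binding these prefixes means, the sketch below expands prefixed names such as glycan:has_motif into full URIs using the same prefix-to-namespace table, with the empty prefix standing for the default glycoinfo namespace. The BINDINGS table and the expand_prefixed_name function are hypothetical stand-ins; in practice rdflib's namespace manager keeps track of these bindings.

```python
# Hypothetical prefix table mirroring the bindings listed above.
# The empty prefix stands for the default (glycoinfo) namespace.
BINDINGS = {
    "": "http://rdf.glycoinfo.org/glycan/",
    "glytoucan": "http://www.glytoucan.org/glyco/owl/glytoucan#",
    "glycomedb": "http://rdf.glycome-db.org/glycan/",
    "glycan": "http://purl.jp/bio/12/glyco/glycan#",
    "glycoinfo": "http://rdf.glycoinfo.org/glycan/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_prefixed_name(name, bindings=BINDINGS):
    """Expand 'prefix:local' (or ':local' for the default namespace) to a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {name!r}")
    try:
        return bindings[prefix] + local
    except KeyError:
        raise ValueError(f"unbound prefix: {prefix!r}") from None


print(expand_prefixed_name("glycan:has_motif"))
# http://purl.jp/bio/12/glyco/glycan#has_motif
print(expand_prefixed_name(":G00000XX"))  # placeholder accession
# http://rdf.glycoinfo.org/glycan/G00000XX
```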

Attributes

predicate_processor_map: ChainFunctionDict

A dictionary keeping track of the chain of processors registered for each predicate.

_sparql_endpoint_uri: str

The web address to use as the remote backend for SPARQL queries. Passed on to rdflib.ConjunctiveGraph and the SPARQLStore storage plugin.

get(uriref, simplify=True)

Download all related information for uriref from the remote data source.

Collects all the triples from the remote data source where uriref is the subject. If uriref is not the subject of any triples, it is re-queried as a predicate, storing the subject-object pairs.

Any objects (and subjects) which are themselves rdflib.term.URIRef instances will be converted into BoundURIRef which will silently fetch the relevant entity from the remote source.

If the predicate matches a processor rule, its object value is not stored directly; instead, the object is transformed by each rule in that predicate’s processor chain.

Parameters:

uriref: str or rdflib.term.URIRef

A subject or predicate.

simplify: bool, optional

If True, any predicate with a single value will be stored as a scalar, and any other will be stored as a list.

Returns:

ReferenceEntity

An object representing the subject whose attributes are named after predicates with their objects as values.
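As a rough illustration of the simplify flag, the sketch below groups (predicate, object) pairs by predicate and, when simplify is true, unwraps single-valued predicates into scalars. It is a stdlib-only approximation, not glypy's implementation; the helper names local_name and collect_state, and the use of the predicate's local name as the attribute key, are assumptions.

```python
from collections import defaultdict


def local_name(uri):
    # Assumption: attribute names come from the part after the last '#' or '/'.
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def collect_state(pairs, simplify=True):
    """Group (predicate, object) pairs into a dict keyed by predicate local name."""
    state = defaultdict(list)
    for predicate, obj in pairs:
        state[local_name(predicate)].append(obj)
    if simplify:
        # Single-valued predicates become scalars; multi-valued ones stay lists.
        return {k: v[0] if len(v) == 1 else v for k, v in state.items()}
    return dict(state)


pairs = [
    ("http://purl.jp/bio/12/glyco/glycan#has_motif", "motif-1"),
    ("http://purl.jp/bio/12/glyco/glycan#has_motif", "motif-2"),
    ("http://www.w3.org/2004/02/skos/core#exactMatch", "gdb-1"),
]
print(collect_state(pairs))
# {'has_motif': ['motif-1', 'motif-2'], 'exactMatch': 'gdb-1'}
print(collect_state(pairs, simplify=False))
# {'has_motif': ['motif-1', 'motif-2'], 'exactMatch': ['gdb-1']}
```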

query(query_object, processor='sparql', result='sparql', initNs=None, initBindings=None, use_store_provided=True, **kwargs)

Query this graph.

A form of ‘prepared query’ can be realised by providing initial variable bindings with initBindings.

Initial namespaces are used to resolve prefixes used in the query. If none are given, the namespaces from the graph’s namespace manager are used.

Return type: rdflib.query.QueryResult

__getitem__(item)

A graph can be “sliced” as a shortcut for the triples method. The Python slice syntax is (ab)used for specifying triples. A generator over matches is returned; the returned tuples include only the parts not given.

>>> import rdflib
>>> g = rdflib.Graph()
>>> g.add((rdflib.URIRef('urn:bob'), rdflib.RDFS.label, rdflib.Literal('Bob')))
>>> list(g[rdflib.URIRef('urn:bob')]) # all triples about bob
[(rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'), rdflib.term.Literal(u'Bob'))]
>>> list(g[:rdflib.RDFS.label]) # all label triples
[(rdflib.term.URIRef(u'urn:bob'), rdflib.term.Literal(u'Bob'))]
>>> list(g[::rdflib.Literal('Bob')]) # all triples with the literal 'Bob' as object
[(rdflib.term.URIRef(u'urn:bob'), rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'))]
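The slicing convention can be modelled without rdflib. The sketch below uses a hypothetical TinyGraph class that keeps triples in a set and interprets g[s], g[s:p], g[:p], and g[::o] by reducing each match to the positions left unspecified. In this sketch a single unbound position yields bare values and a fully bound pattern returns a membership test, which we believe matches rdflib's own handling of those two cases.

```python
class TinyGraph:
    """Hypothetical in-memory triple store illustrating rdflib-style slicing."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def __getitem__(self, item):
        # g[s] is shorthand for g[s::]; a slice carries (start, stop, step) = (s, p, o).
        if isinstance(item, slice):
            pattern = (item.start, item.stop, item.step)
        else:
            pattern = (item, None, None)
        if None not in pattern:
            return pattern in self._triples  # fully bound: membership test
        free = [i for i, term in enumerate(pattern) if term is None]
        return self._matches(pattern, free)

    def _matches(self, pattern, free):
        for triple in sorted(self._triples):
            if all(t == q for t, q in zip(triple, pattern) if q is not None):
                parts = tuple(triple[i] for i in free)
                # A single unbound position yields a bare value, not a 1-tuple.
                yield parts[0] if len(parts) == 1 else parts


g = TinyGraph()
g.add(("urn:bob", "rdfs:label", "Bob"))
g.add(("urn:bob", "foaf:knows", "urn:alice"))
g.add(("urn:alice", "rdfs:label", "Alice"))

print(list(g["urn:bob"]))                # [('foaf:knows', 'urn:alice'), ('rdfs:label', 'Bob')]
print(list(g[:"rdfs:label"]))            # [('urn:alice', 'Alice'), ('urn:bob', 'Bob')]
print(list(g["urn:bob":"rdfs:label"]))   # ['Bob']
print(g["urn:bob":"rdfs:label":"Bob"])   # True
```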

Combined with SPARQL paths, more complex queries can be written concisely:

Names of all of Bob’s friends:

g[bob : FOAF.knows/FOAF.name]

Some label for Bob:

g[bob : DC.title|FOAF.name|RDFS.label]

All friends and friends of friends of Bob:

g[bob : FOAF.knows * '+']

etc.

New in rdflib version 4.0.

triples(triple_or_quad, context=None)

Iterate over all the triples in the entire conjunctive graph.

For legacy reasons, this can take the context to query either as the fourth element of a quad or as the explicit context keyword parameter. The keyword parameter takes precedence.
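The precedence rule can be stated in a few lines. The helper below, split_pattern, is hypothetical and not rdflib's code: it unpacks a 3-tuple or 4-tuple and only falls back to the fourth element when no context keyword is supplied.

```python
def split_pattern(triple_or_quad, context=None):
    """Return ((s, p, o), context), preferring the explicit keyword context."""
    s, p, o, *rest = triple_or_quad
    if context is None and rest:
        context = rest[0]  # fall back to the quad's fourth element
    return (s, p, o), context


print(split_pattern(("s", "p", "o", "g1")))                # (('s', 'p', 'o'), 'g1')
print(split_pattern(("s", "p", "o", "g1"), context="g2"))  # (('s', 'p', 'o'), 'g2')
```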

from_taxon(taxon, limit=None)

Fetch the accession numbers of all structures from the given taxonomic identifier, up to limit records.

Equivalent to the following SPARQL:

SELECT DISTINCT ?saccharide WHERE {
    ?saccharide a glycan:saccharide .
    ?saccharide skos:exactMatch ?gdb .
    ?gdb glycan:has_reference ?ref .
    ?ref glycan:is_from_source ?source .
    ?source glycan:has_taxon ?taxon
    FILTER REGEX(str(?taxon), "http://www.uniprot.org/taxonomy/<taxonomy-accession>.rdf")
}

The REGEX filter is used because taxonomic information in Glycome-DB is currently encoded as a string rather than a URI.

Parameters:

taxon : str or int

A string or number which corresponds to the taxonomy database id for the taxon of interest.

limit : int, optional

The maximum number of results to retrieve.

Returns:

list of BoundURIRef
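The sketch below renders the query above as a string for a given taxon and optional limit; it does not contact the endpoint. The helper name build_taxon_query, the explicit PREFIX declarations, and appending a LIMIT clause for limit are assumptions about how such a query could be assembled, not a description of glypy's internals.

```python
def build_taxon_query(taxon, limit=None):
    """Render the from_taxon() SPARQL for a taxonomy id given as str or int."""
    taxon_url = f"http://www.uniprot.org/taxonomy/{taxon}.rdf"
    # Doubled braces are literal { } inside the f-string.
    query = f"""PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?saccharide WHERE {{
    ?saccharide a glycan:saccharide .
    ?saccharide skos:exactMatch ?gdb .
    ?gdb glycan:has_reference ?ref .
    ?ref glycan:is_from_source ?source .
    ?source glycan:has_taxon ?taxon
    FILTER REGEX(str(?taxon), "{taxon_url}")
}}"""
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


print(build_taxon_query(9606, limit=10))  # 9606 is the NCBI taxonomy id of Homo sapiens
```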

structures_with_motif(motif, limit=None)

Fetch the accession numbers and structures of all structures that contain the given motif accession, up to limit records.

Equivalent to the following SPARQL:

SELECT DISTINCT ?saccharide ?glycoct WHERE {
    ?saccharide a glycan:saccharide .
    ?saccharide glycan:has_glycosequence ?sequence .
    FILTER CONTAINS(str(?sequence), "glycoct") .
    ?sequence glycan:has_sequence ?glycoct .
    ?saccharide glycan:has_motif <motif-accession>
}

Parameters:

motif : str

The accession number of the motif of interest.

limit : int, optional

The maximum number of results to retrieve.

Returns:

list of ReferenceEntity
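A similar sketch for the motif query follows. It assumes, without confirmation from this page, that a bare motif accession is expanded under the glycoinfo namespace to fill the <motif-accession> slot; build_motif_query and GLYCOINFO are hypothetical names, and no request is sent.

```python
GLYCOINFO = "http://rdf.glycoinfo.org/glycan/"


def build_motif_query(motif, limit=None):
    """Render the structures_with_motif() SPARQL for a motif accession."""
    # Assumption: a bare accession is expanded under the glycoinfo namespace.
    motif_uri = motif if motif.startswith("http") else GLYCOINFO + motif
    lines = [
        "PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>",
        "SELECT DISTINCT ?saccharide ?glycoct WHERE {",
        "    ?saccharide a glycan:saccharide .",
        "    ?saccharide glycan:has_glycosequence ?sequence .",
        '    FILTER CONTAINS(str(?sequence), "glycoct") .',
        "    ?sequence glycan:has_sequence ?glycoct .",
        f"    ?saccharide glycan:has_motif <{motif_uri}>",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


print(build_motif_query("G00000XX", limit=5))  # placeholder accession
```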

class glypy.io.glyspace.RDFClientBase(sparql_endpoint, accession_ns, cache_size=100)

Bases: rdflib.graph.ConjunctiveGraph

accession_to_uriref(accession)

Utility method to translate a bare accession string into a full URI under this instance’s accession_ns.

Parameters:

accession : str

A plain string consisting only of an entity’s accession number.

Returns:

rdflib.term.URIRef
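A minimal sketch of the translation, assuming it amounts to appending the accession to the namespace URI (which is how rdflib's Namespace builds terms from local names). AccessionMapper is a hypothetical stand-in and returns a plain str rather than an rdflib.term.URIRef.

```python
class AccessionMapper:
    """Hypothetical stand-in for the accession_ns handling of RDFClientBase."""

    def __init__(self, accession_ns):
        self.accession_ns = accession_ns

    def accession_to_uriref(self, accession):
        # Assumption: the full URI is the namespace URI followed by the accession.
        return self.accession_ns + accession


mapper = AccessionMapper("http://rdf.glycoinfo.org/glycan/")
print(mapper.accession_to_uriref("G00000XX"))  # placeholder accession
# http://rdf.glycoinfo.org/glycan/G00000XX
```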

get(uriref, simplify=True)

Download all related information for uriref from the remote data source.

Collects all the triples from the remote data source where uriref is the subject. If uriref is not the subject of any triples, it is re-queried as a predicate, storing the subject-object pairs.

Any objects (and subjects) which are themselves rdflib.term.URIRef instances will be converted into BoundURIRef which will silently fetch the relevant entity from the remote source.

If the predicate matches a processor rule, its object value is not stored directly; instead, the object is transformed by each rule in that predicate’s processor chain.

Parameters:

uriref: str or rdflib.term.URIRef

A subject or predicate.

simplify: bool, optional

If True, any predicate with a single value will be stored as a scalar, and any other will be stored as a list.

Returns:

ReferenceEntity

An object representing the subject whose attributes are named after predicates with their objects as values.
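The subject-first, predicate-fallback lookup described above can be modelled with an in-memory list of triples standing in for the remote SPARQL store. The function fetch_pairs is hypothetical; it only shows the order in which the two queries are tried and which pairs are kept in each case.

```python
def fetch_pairs(triples, uriref):
    """Return (pairs, role) following the lookup order get() describes."""
    # First try uriref as a subject, keeping (predicate, object) pairs.
    as_subject = [(p, o) for s, p, o in triples if s == uriref]
    if as_subject:
        return as_subject, "subject"
    # Not a subject anywhere: re-query it as a predicate, keeping (subject, object) pairs.
    as_predicate = [(s, o) for s, p, o in triples if p == uriref]
    return as_predicate, "predicate"


TRIPLES = [
    ("urn:a", "urn:p", "1"),
    ("urn:a", "urn:q", "2"),
    ("urn:b", "urn:p", "3"),
]
print(fetch_pairs(TRIPLES, "urn:a"))  # ([('urn:p', '1'), ('urn:q', '2')], 'subject')
print(fetch_pairs(TRIPLES, "urn:p"))  # ([('urn:a', '1'), ('urn:b', '3')], 'predicate')
```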

classmethod register_predicate_processor(predicate)

Decorator to register a callable processor for a URIRef predicate with this type’s predicate_processor_map. The actual decorated callable is returned unchanged.

Parameters:

predicate : rdflib.term.URIRef or str

The predicate URI whose processor chain the decorated callable is added to.

Returns:

callable
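A sketch of the registration pattern: a classmethod returns a decorator that appends the function to a class-level registry and hands the function back unchanged. ClientBase, tag_sequence, and the (state, obj) processor signature are hypothetical, chosen only to illustrate the decorator.

```python
from collections import defaultdict

HAS_GLYCOSEQUENCE = "http://purl.jp/bio/12/glyco/glycan#has_glycosequence"


class ClientBase:
    # Hypothetical class-level registry: predicate URI string -> list of callables.
    predicate_processor_map = defaultdict(list)

    @classmethod
    def register_predicate_processor(cls, predicate):
        def decorator(func):
            cls.predicate_processor_map[str(predicate)].append(func)
            return func  # the decorated callable is returned unchanged
        return decorator


@ClientBase.register_predicate_processor(HAS_GLYCOSEQUENCE)
def tag_sequence(state, obj):
    # Example processor; its (state, obj) signature is an assumption.
    state["saw_sequence"] = True
    return obj
```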

class glypy.io.glyspace.ChainFunctionDict(*args, **kwargs)

Bases: collections.defaultdict

A wrapper around defaultdict(list) which keys on rdflib.term.URIRef strings. Added values should be callables; they are invoked in the order they were added, each receiving an entity’s state dict and each object linked to it by the key predicate.
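The sketch below models a defaultdict(list) that normalises keys with str(), plus a loop that runs one predicate's chain in order. ChainDict and apply_chain are hypothetical, and it is an assumption that each processor returns the value passed on to the next one. Normalising to plain str also means lookups do not depend on the exact key type, which matters for terms whose equality checks type() first, as described under BoundURIRef.__eq__.

```python
from collections import defaultdict


class ChainDict(defaultdict):
    """Hypothetical defaultdict(list) keyed on the plain-string form of a predicate."""

    def __init__(self, *args, **kwargs):
        super().__init__(list, *args, **kwargs)

    def __getitem__(self, key):
        return super().__getitem__(str(key))

    def __setitem__(self, key, value):
        super().__setitem__(str(key), value)


def apply_chain(chain, state, obj):
    # Assumption: each processor takes (state, obj) and returns the value
    # handed to the next processor; the final value is what gets stored.
    for processor in chain:
        obj = processor(state, obj)
    return obj


def upper(state, obj):
    return obj.upper()


def record(state, obj):
    state["last"] = obj
    return obj + "!"


chains = ChainDict()
chains["urn:p"].append(upper)
chains["urn:p"].append(record)
state = {}
print(apply_chain(chains["urn:p"], state, "abc"), state)  # ABC! {'last': 'ABC'}
```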

class glypy.io.glyspace.BoundURIRef

Bases: rdflib.term.URIRef

A subclass of rdflib.term.URIRef which bakes in a way to fetch the referenced subgraph (in the semantic web sense) as a ReferenceEntity by keeping a reference (in the memory address sense) to the GlySpaceRDFClient instance which fetched the object it is attached to.

It has some convenience features for interactive use, such as implicit resource fetching when checking for attribute completions.

Attributes

_bind_source: RDFClientBase

The client instance that fetched the object this URI is attached to, used to fetch the referenced entity.

_result_ref: ReferenceEntity

A reference to the ReferenceEntity fetched from this URI. Acts as a cache.

__call__(simplify=True, refresh=False)

Get the referenced entity, either from _bind_source or from the cached reference in _result_ref.

Parameters:

simplify : bool, optional

refresh : bool, optional

If True, always re-request the referenced entity from _bind_source, ignoring _result_ref.

Returns:

ReferenceEntity

__eq__(other)

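Overrides the equality method of rdflib.term.URIRef, which compares exact type() before comparing contents so that it can short-circuit on non-URIs. Without this override, a BoundURIRef would never compare equal to a plain URIRef with the same value.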
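Why the override is needed can be seen with plain str subclasses. StrictURI below is a hypothetical stand-in for a base class whose __eq__ checks exact type() first, as described above; NaiveBound inherits that check unchanged, while Bound relaxes it to compare contents against any StrictURI. This is an illustration of the idea, not glypy's code.

```python
class StrictURI(str):
    """Stand-in for a URI type whose __eq__ requires an exact type match."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return False  # short-circuit on anything that is not exactly this type
        return str.__eq__(self, other)

    __hash__ = str.__hash__  # defining __eq__ would otherwise disable hashing


class NaiveBound(StrictURI):
    """Inherits the strict check, so it never equals a plain StrictURI."""


class Bound(StrictURI):
    """Compares by string contents against any StrictURI, including the base type."""

    def __eq__(self, other):
        if isinstance(other, StrictURI):
            return str.__eq__(self, other)
        return False

    __hash__ = str.__hash__


print(NaiveBound("urn:x") == StrictURI("urn:x"))  # False
print(Bound("urn:x") == StrictURI("urn:x"))       # True
```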

__getattr__(name)

A convenience method to forward missed attribute lookup to the referenced entity. Calls __call__(), which may initiate a network request.

get(simplify=True, refresh=False)

Get the referenced entity, either from _bind_source or from the cached reference in _result_ref.

Parameters:

simplify : bool, optional

refresh : bool, optional

If True, always re-request the referenced entity from _bind_source, ignoring _result_ref.

Returns:

ReferenceEntity
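The caching and attribute forwarding described for BoundURIRef can be sketched with a str subclass that holds a fetch callable in place of _bind_source. LazyRef and fake_fetch are hypothetical; the fetch callable stands in for a network request. Note that only attributes str itself lacks reach __getattr__, so names like title or count would not be forwarded.

```python
from types import SimpleNamespace


class LazyRef(str):
    """Hypothetical str-based reference that fetches and caches its target."""

    def __new__(cls, value, fetch):
        obj = super().__new__(cls, value)
        obj._fetch = fetch       # callable(uri, simplify) -> entity
        obj._result_ref = None   # cache for the fetched entity
        return obj

    def get(self, simplify=True, refresh=False):
        if refresh or self._result_ref is None:
            self._result_ref = self._fetch(str(self), simplify)
        return self._result_ref

    __call__ = get

    def __getattr__(self, name):
        # Only reached for attributes not found normally; private names are not
        # forwarded, which avoids recursion on missing internal attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get(), name)


calls = []


def fake_fetch(uri, simplify):
    calls.append(uri)
    return SimpleNamespace(has_motif=["m1"])


ref = LazyRef("urn:example:1", fake_fetch)
print(ref.has_motif, ref.has_motif, len(calls))  # ['m1'] ['m1'] 1
```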

class glypy.io.glyspace.ReferenceEntity(uriref, **kwargs)

Bases: object

A ReferenceEntity is a generic type for storing results from a semantic query. Its attribute names are usually derived from predicates, and their values may be scalars or lists. It is usually constructed by calling get() on a BoundURIRef instance.

A processor may add new attributes to a ReferenceEntity during construction, such as a structure_ attribute holding a Glycan instance when the referenced entity has a glycan:has_glycosequence predicate.
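A small sketch of the idea: keyword arguments become attributes, and a processor-like step attaches structure_ when a sequence is present. Entity, build_entity, and the parse callable are hypothetical. As a simplification, the has_glycosequence value is treated directly as sequence text, and parse stands in for whatever builds a Glycan.

```python
class Entity:
    """Hypothetical stand-in for ReferenceEntity: keyword arguments become attributes."""

    def __init__(self, uriref, **kwargs):
        self.uriref = uriref
        self.__dict__.update(kwargs)


def build_entity(uriref, state, parse=None):
    # If the entity has a sequence and a parser is supplied, attach the parsed
    # result as structure_, mirroring the kind of attribute described above.
    if parse is not None and "has_glycosequence" in state:
        state = dict(state, structure_=parse(state["has_glycosequence"]))
    return Entity(uriref, **state)


entity = build_entity("urn:example:1", {"has_glycosequence": "RES..."}, parse=len)
print(entity.uriref, entity.structure_)  # urn:example:1 6
```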