glySpace Client
This module implements a client for communicating with the remote data stores that are part of the glySpace project. Currently, it communicates with the SPARQL endpoint hosted by https://glytoucan.org/.
The module contains a pre-created instance of GlySpaceRDFClient named client, whose get(), structure(), from_taxon(), and structures_with_motif() methods are available as top-level functions of the module.
RDF Namespaces that are pre-created
NSGlyTouCan = Namespace("http://www.glytoucan.org/glyco/owl/glytoucan#")
NSGlycan = Namespace("http://purl.jp/bio/12/glyco/glycan#")
NSGlycoinfo = Namespace("http://rdf.glycoinfo.org/glycan/")
NSGlycomeDB = Namespace("http://rdf.glycome-db.org/glycan/")
NSSKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
NSUniprotCore = Namespace("http://purl.uniprot.org/core/")
NSUniprotEntity = Namespace("http://purl.uniprot.org/uniprot/")
NSTaxonomy = Namespace("http://purl.uniprot.org/taxonomy/")
-
class glypy.io.glyspace.GlySpaceRDFClient
Bases: glypy.io.glyspace.RDFClientBase
An RDF Client for glySpace. The default namespace is glycoinfo, and the following namespaces are bound:
    glytoucan = NSGlyTouCan
    glycomedb = NSGlycomeDB
    glycan = NSGlycan
    glycoinfo = NSGlycoinfo
    skos = NSSKOS
Attributes
predicate_processor_map: ChainFunctionDict
A dictionary keeping track of the chain of processors registered for each predicate.
_sparql_endpoint_uri: str
The web address to use as the remote backend for SPARQL queries. Passed on to rdflib.ConjunctiveGraph and the SPARQLStore storage plugin.
-
get(uriref, simplify=True)
Download all related information for uriref from the remote data source.
Collects all the triples from the remote data source where uriref is the subject. If uriref is not the subject of any triples, it is re-queried as a predicate, storing the subject-object pairs.
Any objects (and subjects) which are themselves rdflib.term.URIRef instances will be converted into BoundURIRef, which will silently fetch the relevant entity from the remote source.
If the predicate matches a processor rule, then instead of its object value being stored, the object will be transformed by each rule in the processor chain.
Parameters:
uriref: str or rdflib.term.URIRef
A subject or predicate.
simplify: bool, optional
If true, any predicate with a single value will be a scalar, and any other will be a list.
Returns: ReferenceEntity
An object representing the subject whose attributes are named after predicates with their objects as values.
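The effect of simplify can be sketched in plain Python. The helper below is illustrative only: collect_predicates and the placeholder predicate names and values are assumptions made for this example, not part of glypy, and it works on (predicate, object) pairs rather than on a live graph.

```python
from collections import defaultdict


def collect_predicates(pairs, simplify=True):
    """Group (predicate, object) pairs for one subject by predicate.

    Hypothetical helper: it only mimics the documented `simplify` behaviour.
    """
    grouped = defaultdict(list)
    for predicate, obj in pairs:
        grouped[predicate].append(obj)
    if not simplify:
        # Every predicate maps to a list, even when it has a single value.
        return dict(grouped)
    # Single-valued predicates collapse to a scalar; the rest stay lists.
    return {
        predicate: values[0] if len(values) == 1 else values
        for predicate, values in grouped.items()
    }


# Placeholder data, not real glySpace records.
pairs = [
    ("has_primary_id", "ACC-1"),
    ("has_motif", "motif-a"),
    ("has_motif", "motif-b"),
]
entity = collect_predicates(pairs)
```

With simplify left on, entity["has_primary_id"] is a plain string while entity["has_motif"] remains a two-element list.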
-
query(query_object, processor='sparql', result='sparql', initNs=None, initBindings=None, use_store_provided=True, **kwargs)
Query this graph.
A type of ‘prepared queries’ can be realised by providing initial variable bindings with initBindings.
Initial namespaces are used to resolve prefixes used in the query; if none are given, the namespaces from the graph’s namespace manager are used.
Return type: rdflib.query.QueryResult
-
__getitem__(item)
A graph can be “sliced” as a shortcut for the triples method. The Python slice syntax is (ab)used for specifying triples. A generator over matches is returned; the returned tuples include only the parts not given.

    >>> import rdflib
    >>> g = rdflib.Graph()
    >>> g.add((rdflib.URIRef('urn:bob'), rdflib.RDFS.label, rdflib.Literal('Bob')))

    >>> list(g[rdflib.URIRef('urn:bob')]) # all triples about bob
    [(rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'), rdflib.term.Literal(u'Bob'))]

    >>> list(g[:rdflib.RDFS.label]) # all label triples
    [(rdflib.term.URIRef(u'urn:bob'), rdflib.term.Literal(u'Bob'))]

    >>> list(g[::rdflib.Literal('Bob')]) # all triples with bob as object
    [(rdflib.term.URIRef(u'urn:bob'), rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'))]

Combined with SPARQL paths, more complex queries can be written concisely:
Name of all Bob’s friends:
    g[bob : FOAF.knows/FOAF.name ]
Some label for Bob:
    g[bob : DC.title|FOAF.name|RDFS.label]
All friends and friends of friends of Bob:
    g[bob : FOAF.knows * '+']
etc.
New in version 4.0.
-
triples(triple_or_quad, context=None)
Iterate over all the triples in the entire conjunctive graph.
For legacy reasons, this can take the context to query either as a fourth element of the quad, or as the explicit context keyword parameter. The keyword parameter takes precedence.
-
from_taxon(taxon, limit=None)
Fetch all accession numbers for all structures from the given taxonomic identifier, up to limit records.
Equivalent to the following SPARQL:

    SELECT DISTINCT ?saccharide WHERE {
        ?saccharide a glycan:saccharide .
        ?saccharide skos:exactMatch ?gdb .
        ?gdb glycan:has_reference ?ref .
        ?ref glycan:is_from_source ?source .
        ?source glycan:has_taxon ?taxon
        FILTER REGEX(str(?taxon), "http://www.uniprot.org/taxonomy/<taxonomy-accession>.rdf")
    }

The REGEX filter is used because taxonomic information in Glycome-DB is currently encoded as a string instead of a URI.
Parameters:
taxon : str or int
A string or number which corresponds to the taxonomy database id for the taxon of interest.
limit : int, optional
The maximum number of results to retrieve.
Returns: list of BoundURIRef
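As a rough illustration of how the taxon identifier and limit might be turned into the query text above, the sketch below fills in the taxonomy accession and appends a LIMIT clause. build_taxon_query is a hypothetical helper written for this example, and expressing limit as a SPARQL LIMIT clause is an assumption about how the cap is applied; glypy may do it differently.

```python
TAXON_QUERY_TEMPLATE = """
SELECT DISTINCT ?saccharide WHERE {
    ?saccharide a glycan:saccharide .
    ?saccharide skos:exactMatch ?gdb .
    ?gdb glycan:has_reference ?ref .
    ?ref glycan:is_from_source ?source .
    ?source glycan:has_taxon ?taxon
    FILTER REGEX(str(?taxon), "http://www.uniprot.org/taxonomy/%s.rdf")
}
"""


def build_taxon_query(taxon, limit=None):
    """Return SPARQL text for a taxonomy id (hypothetical helper)."""
    # Accept both int and str identifiers, as the parameter docs allow.
    query = (TAXON_QUERY_TEMPLATE % str(taxon)).strip()
    if limit is not None:
        # Assumption: the record cap is expressed as a LIMIT clause.
        query += "\nLIMIT %d" % int(limit)
    return query


human_query = build_taxon_query(9606, limit=10)
```

The glycan: and skos: prefixes are not declared in the text itself; they would be resolved from the namespaces bound on the client.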
-
structures_with_motif(motif, limit=None)
Fetch all accession numbers and structures for all structures which contain the given motif accession, up to limit records.
Equivalent to the following SPARQL:

    SELECT DISTINCT ?saccharide ?glycoct WHERE {
        ?saccharide a glycan:saccharide .
        ?saccharide glycan:has_glycosequence ?sequence .
        FILTER CONTAINS(str(?sequence), "glycoct") .
        ?sequence glycan:has_sequence ?glycoct .
        ?saccharide glycan:has_motif <motif-accession>
    }

Parameters:
motif : str
The accession number of the motif of interest.
limit : int, optional
The maximum number of results to retrieve.
Returns: list of ReferenceEntity
-
-
class glypy.io.glyspace.RDFClientBase(sparql_endpoint, accession_ns, cache_size=100)
Bases: rdflib.graph.ConjunctiveGraph
-
accession_to_uriref(accession)
Utility method to translate free strings into full URIs derived from this instance’s accession_ns.
Parameters:
accession : str
A regular string comprised of just the accession number of an entity.
Returns: rdflib.term.URIRef
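Conceptually, this amounts to prefixing the accession with the namespace. The sketch below uses plain strings so that it runs without rdflib. accession_to_uri is a hypothetical stand-in, the accession is a placeholder, and using the glycoinfo namespace as the accession namespace is an assumption based on it being the documented default namespace.

```python
# Same value as NSGlycoinfo in the namespace list; assumed to be accession_ns.
ACCESSION_NS = "http://rdf.glycoinfo.org/glycan/"


def accession_to_uri(accession, accession_ns=ACCESSION_NS):
    """Join a bare accession onto a namespace (hypothetical stand-in).

    The real method returns an rdflib.term.URIRef; a str is used here
    to keep the example free of third-party imports.
    """
    return accession_ns + accession


uri = accession_to_uri("EXAMPLE01")  # placeholder accession
```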
-
get(uriref, simplify=True)
Download all related information for uriref from the remote data source.
Collects all the triples from the remote data source where uriref is the subject. If uriref is not the subject of any triples, it is re-queried as a predicate, storing the subject-object pairs.
Any objects (and subjects) which are themselves rdflib.term.URIRef instances will be converted into BoundURIRef, which will silently fetch the relevant entity from the remote source.
If the predicate matches a processor rule, then instead of its object value being stored, the object will be transformed by each rule in the processor chain.
Parameters:
uriref: str or rdflib.term.URIRef
A subject or predicate.
simplify: bool, optional
If true, any predicate with a single value will be a scalar, and any other will be a list.
Returns: ReferenceEntity
An object representing the subject whose attributes are named after predicates with their objects as values.
-
classmethod register_predicate_processor(predicate)
Decorator to register a callable processor for a URIRef predicate with this type’s predicate_processor_map. The actual decorated callable is returned unchanged.
Parameters:
predicate : rdflib.term.URIRef or str
The predicate URI whose processor chain the decorated callable is added to.
Returns: callable
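The registration pattern described here can be sketched as a class-level decorator factory. ClientSketch, the placeholder predicate URI, and the (state, value) processor signature are all assumptions made for illustration; only the behaviour of appending to a per-predicate list and returning the callable unchanged comes from the description above.

```python
from collections import defaultdict


class ClientSketch:
    """Hypothetical stand-in for a client class, not glypy's implementation."""

    # Class-level map: predicate string -> list of processors, in order.
    predicate_processor_map = defaultdict(list)

    @classmethod
    def register_predicate_processor(cls, predicate):
        def decorator(func):
            cls.predicate_processor_map[str(predicate)].append(func)
            return func  # the decorated callable is returned unchanged

        return decorator


HAS_SEQUENCE = "http://example.org/ns#has_sequence"  # placeholder predicate


@ClientSketch.register_predicate_processor(HAS_SEQUENCE)
def strip_whitespace(state, value):
    # Assumed signature: the entity state dict and the object value.
    return value.strip()
```

Because the decorator returns the function itself, strip_whitespace stays directly callable and can be registered for further predicates.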
-
-
class glypy.io.glyspace.ChainFunctionDict(*args, **kwargs)
Bases: collections.defaultdict
A wrapper around defaultdict(list) which keys on rdflib.term.URIRef strings. Added values should be callables which will be invoked, in the order given, on an entity state dict and each object which is linked by the key predicate.
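A minimal sketch of such a processor chain follows, assuming each processor takes the entity state dict and the current object and returns the (possibly transformed) object that is passed to the next processor. ChainSketch, its register and apply methods, and the placeholder predicate are hypothetical names, not the ChainFunctionDict API.

```python
from collections import defaultdict


class ChainSketch(defaultdict):
    """Illustrative predicate -> [processor, ...] mapping."""

    def __init__(self):
        super().__init__(list)

    def register(self, predicate, func):
        # Keys are normalised to plain strings.
        self[str(predicate)].append(func)

    def apply(self, predicate, state, obj):
        # .get avoids creating an empty chain for unknown predicates.
        for func in self.get(str(predicate), ()):
            obj = func(state, obj)  # processors run in registration order
        return obj


def strip(state, obj):
    return obj.strip()


def record_length(state, obj):
    state["length"] = len(obj)  # a processor may also update the state
    return obj


PREDICATE = "http://example.org/ns#has_sequence"  # placeholder predicate
chain = ChainSketch()
chain.register(PREDICATE, strip)
chain.register(PREDICATE, record_length)

state = {}
result = chain.apply(PREDICATE, state, "  RES  ")
```

Order matters in this sketch: record_length sees the stripped value because strip was registered first.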
-
class glypy.io.glyspace.BoundURIRef
Bases: rdflib.term.URIRef
A subclass of rdflib.term.URIRef which bakes in a way to fetch the referenced subgraph (in the semantic web sense) as a ReferenceEntity by keeping a reference (in the memory address sense) to the GlySpaceRDFClient instance which fetched the object it is attached to.
It has some convenience features for interactive use, such as implicit resource fetching when checking for attribute completions.
Attributes
_bind_source: RDFClientBase
_result_ref: ReferenceEntity
A reference to the ReferenceEntity fetched from this URI. Acts as a cache.
-
__call__(simplify=True, refresh=False)
Get the referenced entity either from _bind_source or the cached reference in _result_ref.
Parameters:
simplify : bool, optional
As in RDFClientBase.get()
refresh : bool, optional
If True, always request the URI’s semantic reference, ignoring _result_ref
Returns: ReferenceEntity
-
__eq__(other)
Overrides the equality method of rdflib.term.URIRef, which does exact type() comparison before comparing contents to short-circuit on non-URIs.
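To show why such an override can be needed, the sketch below uses two hypothetical str subclasses: StrictRef mimics a base type whose equality requires the exact same type, and BoundRef relaxes that to a comparison of contents. Neither class is rdflib or glypy code, and comparing by string content is an assumption about what the override does.

```python
class StrictRef(str):
    """Stand-in for a URI type whose equality requires the exact same type."""

    def __eq__(self, other):
        if type(self) is not type(other):
            return False  # short-circuit on anything that is not a StrictRef
        return str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    __hash__ = str.__hash__  # defining __eq__ would otherwise drop hashing


class BoundRef(StrictRef):
    """Subclass that compares by content, so it still equals its parent type."""

    def __eq__(self, other):
        return str(self) == str(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    __hash__ = str.__hash__


plain = StrictRef("urn:example")
bound = BoundRef("urn:example")
```

Without the override, bound == plain would be False under the exact-type check, even though both name the same resource.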
-
__getattr__(name)
A convenience method to forward missed attribute lookup to the referenced entity. Calls __call__(), which may initiate a network request.
-
get(simplify=True, refresh=False)
Get the referenced entity either from _bind_source or the cached reference in _result_ref.
Parameters:
simplify : bool, optional
As in RDFClientBase.get()
refresh : bool, optional
If True, always request the URI’s semantic reference, ignoring _result_ref
Returns: ReferenceEntity
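The caching and attribute-forwarding behaviour documented for __call__(), __getattr__() and get() can be sketched with a str subclass and a fake data source. LazyRef and CountingSource are hypothetical names invented for this example, and the source object stands in for a client without performing any network access.

```python
from types import SimpleNamespace


class CountingSource:
    """Fake client that counts how many fetches were made."""

    def __init__(self):
        self.calls = 0

    def get(self, uriref, simplify=True):
        self.calls += 1
        return SimpleNamespace(uri=str(uriref), label="example")


class LazyRef(str):
    """String-like reference that fetches its entity on demand."""

    def __new__(cls, value, source=None):
        obj = super().__new__(cls, value)
        obj._bind_source = source
        obj._result_ref = None  # cache for the fetched entity
        return obj

    def get(self, simplify=True, refresh=False):
        if self._result_ref is None or refresh:
            self._result_ref = self._bind_source.get(self, simplify=simplify)
        return self._result_ref

    __call__ = get

    def __getattr__(self, name):
        # Only reached when normal lookup fails; private names are not forwarded.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get(), name)


source = CountingSource()
ref = LazyRef("http://example.org/resource", source)
```

Nothing is fetched when ref is created. The first missed attribute lookup such as ref.label triggers one fetch, later calls reuse the cached entity, and ref.get(refresh=True) forces a new request.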
-
-
class glypy.io.glyspace.ReferenceEntity(uriref, **kwargs)
Bases: object
A ReferenceEntity is a generic type for storing results from a semantic query. Its attribute names are usually derived from predicates, and their values may be scalars or lists. It is usually constructed by a call to get() from a BoundURIRef instance.
A preprocessor may add new attributes to a ReferenceEntity during construction, such as structure_ with a Glycan instance value when the referenced entity satisfies glycan:has_glycosequence.