Object Description Mapper
=========================

When working with SQL databases, it is common to use what is known as an
Object-Relational Mapper that gives you constructs in a high level
object-oriented language that persist their data in a relational database.
By analogy, in ORDF we have an *Object Description Mapper*. This bears a
little explaining.

Information is structured in RDF in terms of triples. Higher level
constructs are built on top of these using various vocabularies, the most
basic of which is `RDFS`_. RDFS provides basic things like a way to specify
domain and range restrictions on predicates and a way to construct a class
hierarchy. `OWL`_ builds upon RDFS and provides ways to express more complex
relationships between classes.

The previous state of the art in ORM-like things for RDF tries to model
things at the basic RDF level, with no particular support for RDFS or OWL.
Also common is a disregard for *graphs*, which, as otherwise unconstrained
containers for triples, have no analogue in the RDBMS world.

Building upon a modified version of `InfixOWL`_, ORDF tries to bridge this
gap. Because that which is expressed by OWL has the character of a
`Description Logic`_, the result is called an *Object Description Mapper*.
It provides constructs that will seem familiar to those used to ORMs like
`SQLAlchemy`_ or `Django`_ but which are backed by an RDF store containing
multiple graphs and are ultimately expressed in OWL.

Individuals
-----------

The basic building blocks are contained in the :mod:`ordf.vocab.owl` module.
A typical pattern is to declare a subclass of
:class:`ordf.vocab.owl.AnnotatibleTerms` and set some properties on it using
convenience methods. For example:

.. code-block:: python

    ## preamble
    from ordf.namespace import register_ns, Namespace
    register_ns("data", Namespace("http://example.org/data#"))
    register_ns("ont", Namespace("http://example.org/ontology#"))

    ## imports required for the example
    from ordf.namespace import DATA, ONT, FOAF
    from ordf.term import URIRef
    from ordf.vocab.owl import AnnotatibleTerms, predicate

    ## define a python class for individuals of type FOAF.Person
    class Person(AnnotatibleTerms):
        def __init__(self, *av, **kw):
            super(Person, self).__init__(*av, **kw)
            self.type = FOAF.Person
        name = predicate(FOAF.name)
        homepage = predicate(FOAF.homepage)

    ## create a person
    bob = Person(DATA.bob)
    bob.name = "Bob"
    bob.homepage = URIRef("http://example.org/")

    ## see what the RDF looks like
    print bob.graph.serialize(format="n3")

As might be expected, the result of doing this is a graph like the
following::

    @prefix data: <http://example.org/data#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    data:bob a foaf:Person;
        foaf:homepage <http://example.org/>;
        foaf:name "Bob".

This is not so different from your standard ORM. However, we haven't seen
persistence yet. The *Person()* constructor takes an optional (but
recommended) argument called *graph*, which says where the information
should be stored. If none is specified, a transient in-memory graph is
created, but something a bit more realistic might be:

.. code-block:: python

    from ordf.graph import Graph
    from rdflib.store.Sleepycat import Sleepycat

    store = Sleepycat("/tmp/test_db")
    graph = Graph(store, identifier="http://example.org/bob.rdf")
    bob = Person(DATA.bob, graph=graph)
    ## ... as above

Now any statements that are made about *bob* will be persisted in the
specified store. Next time, we just have to initialise the store and
construct the *bob* instance of *Person*, and all the data will be there.

Notice the use of :class:`ordf.vocab.owl.predicate` to assign class
variables. It is one of a family of classes.
In total we have:

* :class:`ordf.vocab.owl.predicate`: a simple predicate whose object may be
  any kind of term.
* :class:`ordf.vocab.owl.cached_predicate`: same as above but the results of
  getting the value are cached.
* :class:`ordf.vocab.owl.object_predicate`: a predicate whose object is
  another individual.
* :class:`ordf.vocab.owl.cached_object_predicate`: same as above but with
  caching.

The use of :class:`ordf.vocab.owl.object_predicate` makes relations that are
analogous to foreign keys in a SQL ORM. Here's an example.

.. code-block:: python

    from ordf.graph import Graph
    from ordf.vocab.owl import AnnotatibleTerms, object_predicate

    class Country(AnnotatibleTerms):
        def __init__(self, *av, **kw):
            super(Country, self).__init__(*av, **kw)
            self.type = ONT.Country

    class City(AnnotatibleTerms):
        def __init__(self, *av, **kw):
            super(City, self).__init__(*av, **kw)
            self.type = ONT.City
        country = object_predicate(ONT.country, Country)

    data = Graph()

    scotland = Country(DATA.scotland, graph=data)
    scotland.label = "Scotland"

    edinburgh = City(DATA.edinburgh, graph=data)
    edinburgh.label = "Edinburgh"
    edinburgh.country = scotland

    for country in edinburgh.country:
        print type(country), [str(x) for x in country.label]

Output::

    <class '__main__.Country'> ['Scotland']

Two things are important to note here. When you access an instance property
that has been made with one of the *predicate* classes, you always get a
generator or list back. This is because there is no way to know how many
objects have been assigned with the predicate in question. That's just the
way it is with RDF. The *Subject, Predicate, Object* model can always be
thought of as *Entity, Attribute, list of Values*.

The second thing to note is that we have said nothing about the nature of
the OWL classes that we are dealing with, nor about the predicates. Nowhere
have we written down anything about the domains and ranges involved, other
restrictions, or the class hierarchy. There is no description logic embedded
in these examples so far.
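
To see why accessing a predicate yields a collection rather than a single
value, consider a minimal sketch that keeps triples in a plain Python list.
This is not ORDF or rdflib code: the ``objects`` helper and the sample
triples are made up purely for illustration.

.. code-block:: python

    # Hypothetical triples; a subject may carry the same predicate many times.
    triples = [
        ("data:bob", "foaf:name", "Bob"),
        ("data:bob", "foaf:name", "Robert"),
        ("data:bob", "foaf:homepage", "http://example.org/"),
    ]

    def objects(subject, predicate):
        """Yield every object stored for the (subject, predicate) pair."""
        for s, p, o in triples:
            if s == subject and p == predicate:
                yield o

    print(list(objects("data:bob", "foaf:name")))      # ['Bob', 'Robert']
    print(list(objects("data:bob", "foaf:homepage")))  # ['http://example.org/']
    print(list(objects("data:bob", "foaf:mbox")))      # []

Zero, one or many values are all possible, so a collection is the only
return type that never loses information. Like the ORDF examples above, this
sketch states nothing about domains, ranges or a class hierarchy, so it
contains no description logic.
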
The next section shows how to describe the OWL classes themselves.

Classes in OWL
--------------

The main purpose of `InfixOWL`_ is to express a `Description Logic`_: the
relationships between OWL classes and the nature of the predicates in
question.

There is a bit of a nomenclature collision here. The word *class* is used
both in Python and in OWL to refer to concepts that, though related, are for
practical purposes quite different. In particular, we can define a Python
class using :class:`ordf.vocab.owl.Class`, but to create the OWL class we
need to instantiate it. Instantiating it means putting triples in a graph,
and there is no such graph available at the time the Python class is made.
Sure, we could do some magic with metaclasses, but it is probably best to
keep things simple and keep the magic out of it.

Continuing on from the previous example, where we created some individuals
that are part of the OWL classes *ont:Country* and *ont:City*, we can start
defining our ontology:

.. code-block:: python

    from ordf.vocab.owl import Class

    class PlaceClass(Class):
        def __init__(self, **kw):
            super(PlaceClass, self).__init__(ONT.Place, **kw)

    class CountryClass(Class):
        def __init__(self, **kw):
            super(CountryClass, self).__init__(ONT.Country, **kw)
            self.subClassOf = PlaceClass(graph=self.graph)
            self.factoryClass = Country
            self.label = "Country"

    class CityClass(Class):
        def __init__(self, **kw):
            super(CityClass, self).__init__(ONT.City, **kw)
            self.subClassOf = PlaceClass(graph=self.graph)
            self.factoryClass = City
            self.label = "City"

    ontology = Graph()
    country = CountryClass(graph=ontology, factoryGraph=data)
    city = CityClass(graph=ontology, factoryGraph=data)

    print [str(x) for x in country.get(DATA.scotland).label]

A common convention is to separate out the ontology and the concrete
individuals into separate graphs.
For this reason there is a settable instance attribute (or keyword argument
to the constructor) on instances of :class:`ordf.vocab.owl.Class` called
*factoryGraph*. Instances of :class:`ordf.vocab.owl.Class` have methods
called *get* and *get_or_create* that are used as simple ways of obtaining
individuals of that class from the factory graph.

At this stage we have two graphs containing data that looks like this. The
ontology graph::

    @prefix ont: <http://example.org/ontology#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ont:City a owl:Class;
        rdfs:label "City";
        rdfs:subClassOf ont:Place.

    ont:Country a owl:Class;
        rdfs:label "Country";
        rdfs:subClassOf ont:Place.

    ont:Place a owl:Class.

The data graph::

    @prefix data: <http://example.org/data#> .
    @prefix ont: <http://example.org/ontology#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    data:edinburgh a ont:City;
        rdfs:label "Edinburgh";
        ont:country data:scotland.

    data:scotland a ont:Country;
        rdfs:label "Scotland".

Queries and Filters
-------------------

Let's add another country to the data graph:

.. code-block:: python

    russia = Country(DATA.russia, graph=data)
    russia.label = "Russia"

We've seen how the :meth:`ordf.vocab.owl.Class.get` method works, but what
if we want to retrieve something where we don't know the subject URI? We can
do this if we have `Telescope`_ installed. Unfortunately, at the moment this
only works with older (2.4.X) rdflib.

.. code-block:: python

    from ordf.namespace import RDFS
    from ordf.term import Literal
    from telescope import v

    for c in country.filter((v.id, RDFS.label, Literal("Russia"))):
        print [str(x) for x in c.label]

If you would rather have the SPARQL query that is used to select individuals
by identifier, you can use the :meth:`ordf.vocab.owl.Class.query` method
instead of :meth:`ordf.vocab.owl.Class.filter`:

.. code-block:: python

    q = country.query((v.id, RDFS.label, Literal("Russia")))
    print q.compile()

This will produce output like::

    SELECT DISTINCT ?id
    WHERE { ?id <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/ontology#Country> . ?id <http://www.w3.org/2000/01/rdf-schema#label> "Russia" }

Being able to access the query directly is useful if you are working with a
back-end triple store (e.g.
`4store`_) that might be able to process SPARQL queries faster than going
through RDFLib. You can then do something like:

.. code-block:: python

    from py4s import FourStore
    from ordf.graph import ConjunctiveGraph

    store = FourStore("somekb,soft_limit=-1")
    data = ConjunctiveGraph(store)
    country = CountryClass(graph=ontology, factoryGraph=data)

    q = country.query((v.id, RDFS.label, Literal("Russia")))
    results = [country.get(ident) for ident, in store.query(q.compile())]

This is particularly helpful for complex queries involving unions or large
potential result sets, where processing the *DISTINCT* operator with RDFLib
might be prohibitively expensive.

Description Logic
-----------------

Given the definitions above, we might expect that we can find places in a
generic way, but unfortunately:

.. code-block:: python

    >>> places = PlaceClass(graph=ontology, factoryGraph=data)
    >>> places.get(DATA.edinburgh)
    Traceback (most recent call last):
        raise KeyError("no individual %s with type %s" % (identifier, self.identifier))
    KeyError: u'no individual http://example.org/edinburgh with type http://example.org/Place'

With a little bit of preparation, however, this can be made to work. There
are two strategies for doing this. The first, and simplest, is called
forward chaining. What this means is that all facts that can be inferred
from the ontology and data are added to the data graph:

.. code-block:: python

    from FuXi.Rete.RuleStore import SetupRuleStore
    from FuXi.Rete.Util import generateTokenSet

    rule_store, rule_graph, network = SetupRuleStore(makeNetwork=True)
    network.reset(data)
    network.setupDescriptionLogicProgramming(ontology)
    network.feedFactsToAdd(generateTokenSet(data))

    assert places.get(DATA.scotland) == scotland
    assert places.get(DATA.edinburgh) == edinburgh

Mind that this inferencing step can be processor intensive as well as data
intensive if the ontology is complex and there are lots of facts that can be
inferred.
The call to *network.reset()* causes the Rete network to add inferred facts
to the data graph. The *setupDescriptionLogicProgramming* call transforms
the ontology into Horn rules, and the *feedFactsToAdd* call passes all of
the known facts that exist in the data graph through the network. The result
is that *data* now contains::

    @prefix data: <http://example.org/data#> .
    @prefix ont: <http://example.org/ontology#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    data:edinburgh a ont:City, ont:Place;
        rdfs:label "Edinburgh";
        ont:country data:scotland.

    data:scotland a ont:Country, ont:Place;
        rdfs:label "Scotland".

    data:russia a ont:Country, ont:Place;
        rdfs:label "Russia".

The second strategy is backward chaining and is a little more complex to set
up. The advantage is that any required facts that are not explicitly in the
data graph are computed on the fly, so they don't take up extra disk space,
and that only facts relevant to the query are deduced. The disadvantage is
greater CPU overhead at query time, and the aforementioned complexity of
setting it up.

*... To be continued*

.. _RDFS: http://www.w3.org/TR/rdf-schema/
.. _OWL: http://www.w3.org/TR/owl-overview/
.. _InfixOWL: http://code.google.com/p/fuxi/wiki/InfixOwl
.. _Description Logic: http://en.wikipedia.org/wiki/Description_logic
.. _SQLAlchemy: http://www.sqlalchemy.org/
.. _Django: http://www.djangoproject.com/
.. _Telescope: https://bitbucket.org/exogen/telescope
.. _4store: http://4store.org/
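
As a rough illustration of the trade-off between the two strategies, the
following plain-Python sketch applies both to the class hierarchy used in
this document. It does not use FuXi or ORDF and is not how either of them
implements inference: the dictionaries and the helper names
``superclasses``, ``forward_chain`` and ``has_type`` are hypothetical and
exist only for this example.

.. code-block:: python

    # Hypothetical stand-ins for the ontology and data graphs (not ORDF API).
    SUBCLASS_OF = {"ont:City": ["ont:Place"], "ont:Country": ["ont:Place"]}
    ASSERTED = {
        "data:edinburgh": {"ont:City"},
        "data:scotland": {"ont:Country"},
        "data:russia": {"ont:Country"},
    }

    def superclasses(cls):
        """Return every class reachable from cls via rdfs:subClassOf."""
        found, todo = set(), [cls]
        while todo:
            for parent in SUBCLASS_OF.get(todo.pop(), []):
                if parent not in found:
                    found.add(parent)
                    todo.append(parent)
        return found

    def forward_chain(asserted):
        """Forward chaining: materialise all inferred types up front."""
        inferred = {}
        for subject, types in asserted.items():
            closure = set(types)
            for cls in types:
                closure |= superclasses(cls)
            inferred[subject] = closure
        return inferred

    def has_type(subject, cls):
        """Backward-chaining style: answer one question on demand."""
        types = ASSERTED.get(subject, set())
        return cls in types or any(cls in superclasses(t) for t in types)

    materialised = forward_chain(ASSERTED)
    print(sorted(materialised["data:edinburgh"]))    # ['ont:City', 'ont:Place']
    print(has_type("data:edinburgh", "ont:Place"))   # True

The materialised dictionary grows with every inferred type, which mirrors
the extra storage of forward chaining, while ``has_type`` leaves the asserted
data untouched and repeats the hierarchy walk on each call.
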