Getting Started
===============

Please keep in mind that this software is under active development. It is
stabilising with respect to the API but there are no guarantees! We are
always happy to hear about the experiences of anyone who tries to get going
with ORDF, whether stories of success or failure, and bug reports and
suggestions are encouraged via the `usual channels`_.

.. _usual channels: http://okfn.org/contact

Installation
------------

ORDF is available from a mercurial repository::

    http://ordf.org/src/

The usual way of installing is to use `virtualenv`_ and `pip`_. Setting up a
basic environment is easy::

    % virtualenv /work
    % . /work/bin/activate
    (ordf)% pip install ordf

If you want to use the `FuXi`_ reasoner and/or RDFLib's SPARQL support, then
you should now run::

    (ordf)% pip uninstall rdflib
    (ordf)% pip install rdflib==2.4.2

If you do not need reasoning and want to use another SPARQL implementation
(e.g. `4store`_), then you should be able to run with the newer 3.0.0
version of `RDFLib`_.

If you are using 4store, make sure to install the branch that supports
locking from http://github.com/wwaites/4store and the Python bindings from
http://github.com/wwaites/py4s.

For development, instead of simply installing *ordf*, you can do this::

    (ordf)% pip install mercurial
    (ordf)% pip install -e hg+http://ordf.org/src/#egg=ordf

Once you have done this, you should have the ordf source checked out in
*/work/src/ordf*.

.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _pip: http://pip.openplans.org/
.. _RDFLib: http://www.rdflib.net/
.. _FuXi: http://code.google.com/p/fuxi/
.. _4store: http://4store.org/

Running the Tests
-----------------

To run the tests, you need to have installed the development version as
described above. In the source directory */work/src/ordf* do::

    (ordf)% pip install nose
    (ordf)% python setup.py nosetests --verbosity=2 -s

*NOTE*: to run the fourstore tests you need to have a 4store instance
serving a kb called ordf_test. To do this::

    % 4s-backend-setup ordf_test
    % 4s-backend ordf_test

Also make sure to install py4s (version 0.8 or later) from
http://github.com/wwaites/py4s and see
http://wiki.github.com/wwaites/py4s/installing-py4s for installation notes.

*NOTE*: to run the rabbitmq tests, rabbitmq-server needs to be running with
an exchange called ordf_test that can be accessed by the user guest/guest.

Building Documentation
----------------------

To build this documentation::

    (ordf)% pip install sphinx
    (ordf)% python setup.py build_sphinx

and the documentation will be in
*/work/src/ordf/build/sphinx/html/index.html*.

Configuration
-------------

There are two usual modes of operation for using *ordf*. The first is to use
it as a library in another program, for example a Pylons_ application. The
second is via an included command line program called :program:`ordf`.

There are example configurations in http://ordf.org/src/file/tip/examples/;
*simple.ini* is good for getting started quickly with some persistent
storage.
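When *ordf* is used as a library, the application builds a
:class:`~ordf.handler.Handler` from its configuration and reads and writes
named graphs through it. The following is only a minimal sketch of that
pattern: ``get`` and ``put`` are used here as the read and write entry
points, but treat the exact names and signatures as assumptions and check
the ordf API documentation for your version::

    # Sketch only: how application code might read and modify a graph
    # through the configured handler.  ``handler`` is assumed to have been
    # built from the application's configuration at startup; ``get`` and
    # ``put`` are assumed entry points (check the ordf API docs).
    from rdflib import URIRef, Literal, Namespace

    RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")

    def relabel(handler, uri, label):
        """Fetch a graph, add a label and write it back out."""
        graph = handler.get(URIRef(uri))      # read via ordf.readers
        graph.add((URIRef(uri), RDFS.label, Literal(label)))
        handler.put(graph)                    # write via ordf.writers

In the development setup described next, the ``put`` operation does its
indexing synchronously in the request thread, while in the production setup
further below the same call merely publishes a message to the queue and the
indexing happens later in separate :program:`ordf` processes.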
A Testing Environment
---------------------

A very simple configuration for a Pylons_ application might be to just do
any indexing and saving in the web request processing thread. This might not
scale very well, particularly if adding a document to an index is a time
consuming operation, but it is typical for a development environment.

A fragment of an appropriate *development.ini*::

    ordf.readers = pairtree
    ordf.writers = pairtree,fourstore,xapian

    pairtree.args = %(here)s/data/pairtree
    fourstore.args = somekb
    xapian.args = 127.0.0.1:44332

If you are using the :class:`~ordf.handler.rdf.FourStore` back-end, it is
important to use the locking 4store branch and to have the py4s bindings
installed, as mentioned in the Installation section above.

Native `RDFLib`_ storage can also be used by putting *rdflib* in place of
*fourstore* in *ordf.writers* and configuring it with::

    rdflib.args = %(here)s/data/rdflib_sleepycat
    rdflib.store = Sleepycat

The Xapian_ back-end is usually run as a network daemon using the
*xapian-tcpsrv* command. This takes care of marshalling read and write
operations to the database so that we don't have to do it ourselves. It is
possible to run directly from the filesystem, but you are likely to
experience locking errors if you try.

A Production Environment
------------------------

A production installation will normally have a message queueing service such
as `RabbitMQ`_, and the front-end interface will be configured to send
messages to it. There will be several back-end storage modules that each
listen to a queue and take any action required whenever a message arrives.
Please refer to the `RabbitMQ documentation`_ for instructions on installing
and setting up the queueing daemon.

It is important to distinguish between back-ends used for reading and for
writing. For example, a typical configuration fragment from a Pylons_
application might be::

    ordf.readers = pairtree,fourstore,xapian
    ordf.writers = rabbit

    pairtree.args = %(here)s/data/pairtree
    fourstore.args = somekb
    xapian.args = 127.0.0.1:44332

    rabbit.hostname = localhost
    rabbit.userid = guest
    rabbit.password = guest
    rabbit.connect.exchange = changes

In this setup, the various back-ends are set up for read-only operation, but
they are still available to the :class:`~ordf.handler.Handler` singleton in
the application. Any write operations, however, are sent to the message
queue for processing.

Note that because :class:`~ordf.handler.pt.PairTree` is used for reading, it
is expected that the storage is available on the local disk or via NFS or
some other mechanism. If any other indices need access to it and they are
actually running on another host, suitable arrangements will need to be
made. The considerations for :class:`~ordf.handler.rdf.FourStore` and
:class:`~ordf.handler.xap.Xapian` described above in the context of a
development environment apply here as well.

At this point we have a Pylons_ application running and reading information
from the back-ends, while any write operations are sitting in a queue
waiting to be processed. For each of the back-ends we need to make a
configuration file and then use the :program:`ordf` program to run them.
Taking the :class:`~ordf.handler.pt.PairTree` back-end first, an appropriate
configuration file for :program:`ordf` might look something like::

    [app:main]
    ordf.handler = ordf.handler.queue:RabbitHandler
    ordf.handler.hostname = localhost
    ordf.handler.userid = guest
    ordf.handler.password = guest
    ordf.connect.queue = pairtree

    ordf.writers = pairtree
    pairtree.args = /some/where/data/pairtree

We would then run :program:`ordf` like so::

    ordf -c pairtree.ini -l /var/log/ordf/pairtree.log

A similar arrangement would be used for the other back-ends, the main
difference being the *ordf.writers* directive and any arguments that the
back-end requires.
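The same pattern extends to back-ends of your own: a new index is
essentially another handler with a *put* method that gets listed in
*ordf.writers*. The following shows only the rough shape of such a class;
the base class name, the structure of the argument passed to *put* and the
registration mechanism are all assumptions for illustration, so compare the
built-in pairtree, fourstore and xapian handlers in the ordf source before
writing your own::

    # Hypothetical writer back-end.  ``HandlerPlugin`` and the structure of
    # the ``store`` argument are assumptions; read the built-in handlers in
    # ordf.handler for the real interface.
    import logging

    from ordf.handler import HandlerPlugin   # assumed base class

    log = logging.getLogger(__name__)

    class LoggingIndex(HandlerPlugin):
        """Toy writer that just logs the identifiers of incoming graphs."""
        def put(self, store):
            for graph in store.contexts():   # assumed: a store of graphs
                log.info("would index %s", graph.identifier)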
.. _configuring-inferencing:

Configuring Inferencing
-----------------------

Configuring inferencing is slightly complicated because it normally involves
listening to one message exchange and writing to another. A configuration
file for :program:`ordf` might look like this::

    [app:main]
    ordf.handler = ordf.handler.queue:RabbitHandler
    ordf.handler.hostname = localhost
    ordf.handler.userid = guest
    ordf.handler.password = guest
    ordf.connect.exchange = reason
    ordf.connect.queue = fuxi

    ordf.readers = fuxi,pairtree
    ordf.writers = fuxi,rabbit

    fuxi.args = ordf.vocab.rdfs
    pairtree.args = /some/where/data/pairtree

    rabbit.hostname = localhost
    rabbit.userid = guest
    rabbit.password = guest
    rabbit.connect.exchange = index

This takes a little explaining. There are two exchanges, *reason* and
*index*. When a graph is saved, it is first sent to the *reason* exchange,
where :program:`ordf` is listening with this configuration file.

The *fuxi* handler is an instance of :class:`ordf.handler.fuxi.FuXiReasoner`
and expects an already complete store containing one or more changesets and
one or more up-to-date graphs that they modify. *fuxi.args* is a
comma-separated list of modules, each of which exports an
:meth:`inference_rules` method returning the rules appropriate to that
module. See the :ref:`inference-rules` section of this manual.

When *fuxi* receives the store in its
:meth:`~ordf.handler.fuxi.FuXiReasoner.put` method, it runs a production
rule engine on all of the graphs that are not changesets. It then makes a
changeset containing any new statements it was able to infer. It prevents
the original changes from continuing to the *rabbit* handler and instead
substitutes the changeset it has made together with the original changes.
In this way there will normally be two changesets: the first containing the
original changes and the second containing the inferred statements.

It is not a problem that the changeset *fuxi* makes may be passed back to
its own :meth:`~ordf.handler.fuxi.FuXiReasoner.put` method while that method
is still running: it is aware of this and simply returns without recursing,
allowing the *rabbit* handler to forward the combined changes to the *index*
exchange.

In order to give a richer set of facts to feed the inference engine, *fuxi*
needs access to other graphs that may be referenced by the original ones.
For example, given this rule and data (not bothering with namespaces)::

    { ?x :authorOf ?y } => { ?y :author ?x } .
    :LevTolstoy :authorOf :WarAndPeace .

*fuxi* can be expected to produce the triple::

    :WarAndPeace :author :LevTolstoy .

However, the graph containing statements about *:WarAndPeace* may not be
included in the changes. The fact that *ordf.readers* contains *fuxi* and
*pairtree*, in that order, means that it will first look for the
*:WarAndPeace* graph in *fuxi* and then try *pairtree*.
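The modules listed in *fuxi.args* (such as :mod:`ordf.vocab.rdfs` in the
configuration above) are ordinary Python modules providing the
:meth:`inference_rules` hook. A very rough sketch of the shape such a module
might take follows; the signature, the return value and the rule location
are all assumptions for illustration, so consult the :ref:`inference-rules`
section and the :mod:`ordf.vocab.rdfs` source for the real contract::

    # Hypothetical rules module that could be named in ``fuxi.args``.  The
    # signature and return value of ``inference_rules`` are assumptions;
    # see ordf.vocab.rdfs for a real example.
    from FuXi.Horn.HornRules import HornFromN3

    RULES_N3 = "http://example.org/rules/author.n3"   # hypothetical rule file

    def inference_rules(*args, **kwargs):
        """Return the production rules this module contributes to FuXi."""
        return list(HornFromN3(RULES_N3))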
Examples
--------

In addition to listening to a queue and updating an index, the
:program:`ordf` command line tool can be used for pulling a graph from the
store or saving a graph to it. The following configuration file can be used
in both cases::

    [app:main]
    ordf.readers = pairtree
    ordf.writers = rabbit

    pairtree.args = /some/where/data/pairtree

    rabbit.hostname = localhost
    rabbit.userid = guest
    rabbit.password = guest
    rabbit.connect.exchange = index

This uses the message queueing system, but so long as there aren't locking
issues to consider it could just as easily use a list of writers as in the
development environment above.

To retrieve a graph from the network or local filesystem and save it to the
store::

    ordf -c cmdline.ini -s -m "import from dbpedia" \
        http://dbpedia.org/resource/Margaret_Fuller

To print out the same graph in N3 format::

    ordf -c cmdline.ini -t n3 http://dbpedia.org/resource/Margaret_Fuller

It is possible to use :program:`ordf` to (re)build one or more indices. For
example, if there is data in a pairtree index and one decides to add 4store,
a configuration file like this, named *mk4s.ini*::

    [app:main]
    ordf.readers = pairtree
    ordf.writers = fourstore

    pairtree.args = /some/where/data/pairtree
    fourstore.args = kbname

can be used with :program:`ordf` run like this::

    ordf -c mk4s.ini --reindex

to populate the new index. Only one reader may be specified in this
circumstance, but any number of writers may be used as usual.

.. _Pylons: http://pylonshq.com/
.. _RabbitMQ: http://www.rabbitmq.com/
.. _RabbitMQ documentation: http://www.rabbitmq.com/documentation.html
.. _Xapian: http://xapian.org/
.. _RDFLib: http://www.rdflib.net/
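If you only want to look at a graph such as the DBpedia resource above
without involving the store at all, plain `RDFLib`_ can do the job. A rough
equivalent of the N3 printing example (import paths and serialisation
behaviour differ slightly between RDFLib 2.4.x and 3.0.x)::

    # Fetch the same resource directly with RDFLib and print it as N3.
    # This bypasses ordf entirely and is only a quick way of inspecting
    # the data.
    from rdflib import Graph

    graph = Graph()
    graph.parse("http://dbpedia.org/resource/Margaret_Fuller")
    print(graph.serialize(format="n3"))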