The high-level API

The U1DB API has three separate sections: document storage and retrieval, querying, and sync. Here we describe the high-level API. Remember that you will need to choose an implementation; exactly how this API is defined is implementation-specific, so that it fits the conventions of each language.

Document storage and retrieval

U1DB stores documents. A document is a set of nested key-values; basically, anything you can express with JSON. Implementations are likely to provide a Document object “wrapper” for these documents; exactly how the wrapper works is implementation-defined.

Creating documents

To create a document, use create_doc() or create_doc_from_json(). create_doc() takes a dictionary-like object, and create_doc_from_json() takes a JSON string. The code examples below use the reference implementation, which is written in Python.

>>> import u1db
>>> db = u1db.open("mydb1.u1db", create=True)
>>> doc = db.create_doc({"key": "value"}, doc_id="testdoc")
>>> doc.content
{'key': 'value'}
>>> doc.doc_id
'testdoc'
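The two creation calls differ only in the form of their input. As a minimal sketch using nothing but Python's standard json module (no u1db call is made here, and the nested content is made up for illustration), the same document can be written as a dictionary or as a JSON string:

```python
import json

# The same document content in both accepted forms: a dictionary-like
# object (what create_doc() takes) and a JSON string (what
# create_doc_from_json() takes). The content itself is illustrative.
content = {"key": "value", "nested": {"tags": ["a", "b"]}}
content_json = json.dumps(content)

# Parsing the JSON string gives back an equal dictionary, so either
# form describes the same document.
roundtrip = json.loads(content_json)
```

Anything that survives this round trip unchanged is a reasonable candidate for document content.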

Retrieving documents

The simplest way to retrieve documents from a u1db is by calling get_doc() with a doc_id. This will return a Document object [1].

>>> import u1db
>>> db = u1db.open("mydb4.u1db", create=True)
>>> doc = db.create_doc({"key": "value"}, doc_id="testdoc")
>>> doc1 = db.get_doc("testdoc")
>>> doc1.content
{u'key': u'value'}
>>> doc1.doc_id
'testdoc'

It is also possible to retrieve several documents at once by passing a list of doc_ids to get_docs().

>>> import u1db
>>> db = u1db.open("mydb5.u1db", create=True)
>>> doc1 = db.create_doc({"key": "value"}, doc_id="testdoc1")
>>> doc2 = db.create_doc({"key": "value"}, doc_id="testdoc2")
>>> for doc in db.get_docs(["testdoc2","testdoc1"]):
...     print doc.doc_id
testdoc2
testdoc1

Note that u1db.Database.get_docs() returns the documents in the order specified.

Editing existing documents

Editing an existing document is done with put_doc(). This is separate from create_doc() so as to avoid accidental overwrites. put_doc() takes a Document object, because the object encapsulates revision information for a particular document. This revision information must match what is stored in the database, so that we can make sure you are not overwriting another version of the document that you don’t know about (e.g., new versions that arrived from a background sync while you were editing your copy).

>>> import u1db
>>> db = u1db.open("mydb2.u1db", create=True)
>>> doc1 = db.create_doc({"key1": "value1"}, doc_id="doc1")

>>> # the next line should fail because it's creating a doc that already exists
>>> db.create_doc({"key1fail": "value1fail"}, doc_id="doc1")
Traceback (most recent call last):
    ...
RevisionConflict

>>> # Now editing the doc with the doc object we got back...
>>> doc1.content["key1"] = "edited"
>>> db.put_doc(doc1) 
'...'
>>> doc2 = db.get_doc(doc1.doc_id)
>>> doc2.content
{u'key1': u'edited'}
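The revision check described above can be pictured with a toy in-memory store. This is only a sketch of the idea, not how u1db is implemented: ToyStore, its method signatures, and the integer counter standing in for the revision string are all assumptions made for illustration.

```python
class RevisionConflict(Exception):
    """Raised when a write is based on a stale revision (toy version)."""


class ToyStore(object):
    """Hypothetical in-memory store; NOT the u1db implementation."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision counter, content)

    def create_doc(self, content, doc_id):
        # Creating never overwrites: an existing doc_id is an error.
        if doc_id in self._docs:
            raise RevisionConflict(doc_id)
        self._docs[doc_id] = (1, dict(content))
        return 1

    def put_doc(self, doc_id, content, based_on_rev):
        # The caller must prove it edited the version currently stored.
        stored_rev, _ = self._docs[doc_id]
        if based_on_rev != stored_rev:
            raise RevisionConflict(doc_id)
        new_rev = stored_rev + 1
        self._docs[doc_id] = (new_rev, dict(content))
        return new_rev


store = ToyStore()
rev = store.create_doc({"key1": "value1"}, "doc1")
rev2 = store.put_doc("doc1", {"key1": "edited"}, based_on_rev=rev)

# A second writer still holding the old revision is rejected.
try:
    store.put_doc("doc1", {"key1": "stale edit"}, based_on_rev=rev)
    stale_write_rejected = False
except RevisionConflict:
    stale_write_rejected = True
```

In the real API the revision travels inside the Document object, which is why put_doc() takes the object rather than bare content.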

Finally, deleting a document is done with delete_doc().

>>> import u1db
>>> db = u1db.open("mydb3.u1db", create=True)
>>> doc = db.create_doc({"key": "value"})
>>> db.delete_doc(doc) 
'...'
>>> db.get_doc(doc.doc_id)
>>> doc = db.get_doc(doc.doc_id, include_deleted=True)
>>> doc.content

Querying

To retrieve documents other than by doc_id, you query the database. Querying a U1DB is done by means of an index. To retrieve only some documents from the database based on certain criteria, you must first create an index, and then query that index.

An index is created from “index expressions”. An index expression names one or more fields in the document. A simple example follows.

Given a database with the following documents:

>>> import u1db
>>> db1 = u1db.open("mydb6.u1db", create=True)
>>> jb = db1.create_doc({"firstname": "John", "surname": "Barnes", "position": "left wing"})
>>> jm = db1.create_doc({"firstname": "Jan", "surname": "Molby", "position": "midfield"})
>>> ah = db1.create_doc({"firstname": "Alan", "surname": "Hansen", "position": "defence"})
>>> jw = db1.create_doc({"firstname": "John", "surname": "Wayne", "position": "filmstar"})

an index expression of "firstname" will create an index that looks (conceptually) like this:

index expression value    document
Alan                      ah
Jan                       jm
John                      jb
John                      jw

and that index is created with:

>>> db1.create_index("by-firstname", "firstname")
>>> sorted(db1.get_index_keys('by-firstname'))
[(u'Alan',), (u'Jan',), (u'John',)]

That is, create an index with a name and one or more index expressions. (Exactly how to pass the name and the list of index expressions is specific to each implementation.)

Index expressions

An index expression describes how to get data from a document; you can think of it as describing a function which, when given a document, returns a value, which is then used as the index key.

Name a field. A basic index expression is a dot-delimited list of nested field names, so the index expression field.sub1.sub2 applied to a document with the content below:

>>> import u1db
>>> db = u1db.open('mydb7.u1db', create=True)
>>> db.create_index('by-subfield', 'field.sub1.sub2')
>>> doc1 = db.create_doc({"field": {"sub1": {"sub2": "hello", "sub3": "not selected"}}})
>>> db.get_index_keys('by-subfield')
[(u'hello',)]

gives the index key “hello”, and therefore an entry in the index of

Index key    doc
hello        doc1

Name a list. If an index expression names a field whose content is a list of strings, the document will have multiple entries in the index, one per entry in the list. So, the index expression field.tags applied to a document with content:

>>> import u1db
>>> db = u1db.open('mydb8.u1db', create=True)
>>> db.create_index('by-tags', 'field.tags')
>>> doc2 = db.create_doc({"field": {"tags": [ "tag1", "tag2", "tag3" ]}})
>>> sorted(db.get_index_keys('by-tags'))
[(u'tag1',), (u'tag2',), (u'tag3',)]

gives index entries

Index key    doc
tag1         doc2
tag2         doc2
tag3         doc2

Subfields of objects in a list. If an index expression points at subfields of objects in a list, the document will have multiple entries in the index, one for each object in the list that specifies the denoted subfield. For instance, the index expression managers.phone_number applied to a document with content:

>>> import u1db
>>> db = u1db.open('mydb9.u1db', create=True)
>>> db.create_index('by-phone-number', 'managers.phone_number')
>>> doc3 = db.create_doc(
...    {"department": "department of redundancy department",
...    "managers": [
...        {"name": "Mary", "phone_number": "12345"},
...        {"name": "Katherine"},
...        {"name": "Rob", "phone_number": "54321"}]})
>>> sorted(db.get_index_keys('by-phone-number'))
[(u'12345',), (u'54321',)]

would give index entries:

Index key    doc
12345        doc3
54321        doc3
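The three cases above (nested fields, lists of strings, and subfields of objects in a list) can be viewed as one walk over the document. The function below is a conceptual sketch only: index_keys is a hypothetical helper, not part of the U1DB API, and it ignores transformation functions and non-string values.

```python
def index_keys(content, expression):
    """Return the index keys that `expression` yields for one document.

    Hypothetical helper for illustration; not a u1db function.
    """
    values = [content]
    for fieldname in expression.split("."):
        found = []
        for value in values:
            if isinstance(value, dict) and fieldname in value:
                child = value[fieldname]
                # A list contributes each of its elements separately.
                if isinstance(child, list):
                    found.extend(child)
                else:
                    found.append(child)
        values = found
    # Only string values become keys in this sketch.
    return [v for v in values if isinstance(v, str)]


nested = {"field": {"sub1": {"sub2": "hello", "sub3": "not selected"}}}
tagged = {"field": {"tags": ["tag1", "tag2", "tag3"]}}
department = {
    "department": "department of redundancy department",
    "managers": [
        {"name": "Mary", "phone_number": "12345"},
        {"name": "Katherine"},
        {"name": "Rob", "phone_number": "54321"},
    ],
}

nested_keys = index_keys(nested, "field.sub1.sub2")
tag_keys = index_keys(tagged, "field.tags")
phone_keys = index_keys(department, "managers.phone_number")
```

An object that lacks the named subfield, such as the manager without a phone number, simply contributes no key.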

Transformation functions. An index expression may be wrapped in any number of transformation functions. A function transforms the result of the contained index expression: for example, if an expression name.firstname generates “John” when applied to a document, then lower(name.firstname) generates “john”.

Available transformation functions are:

  • lower(index_expression) - lowercase the value
  • split_words(index_expression) - split the value on whitespace; will act like a list and add multiple entries to the index
  • number(index_expression, width) - takes an integer value, and turns it into a string, left padded with zeroes, to make it at least as wide as width; or nothing if the field type is not an integer.
  • bool(index_expression) - takes a boolean value and turns it into ‘0’ if false and ‘1’ if true, or nothing if the field type is not boolean.
  • combine(index_expression1, index_expression2, ...) - Combine the values of an arbitrary number of sub expressions into a single index.

So, the index expression split_words(lower(field.name)) applied to a document with content:

>>> import u1db
>>> db = u1db.open('mydb10.u1db', create=True)
>>> db.create_index('by-split-lower', 'split_words(lower(field.name))')
>>> doc4 = db.create_doc({"field": {"name": "Bruce David Grobbelaar"}})
>>> sorted(db.get_index_keys('by-split-lower'))
[(u'bruce',), (u'david',), (u'grobbelaar',)]

gives index entries

Index key     doc
bruce         doc4
david         doc4
grobbelaar    doc4
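To make the behaviour of the transformation functions concrete, here is a rough sketch of four of them as plain Python functions over lists of candidate values. The names mirror the index expression functions, but these are illustrative stand-ins, not the u1db implementation. In particular, treating Python booleans as non-integers in number() is an assumption of this sketch, and bool_ carries a trailing underscore only to avoid shadowing the Python builtin.

```python
def lower(values):
    # Lowercase every string value; anything else yields nothing.
    return [v.lower() for v in values if isinstance(v, str)]


def split_words(values):
    # Split each string on whitespace; every word becomes its own key.
    words = []
    for v in values:
        if isinstance(v, str):
            words.extend(v.split())
    return words


def number(values, width):
    # Zero-pad integers to at least `width` characters.
    # Assumption: booleans are not treated as integers here, even
    # though bool is a subclass of int in Python.
    return [str(v).zfill(width) for v in values
            if isinstance(v, int) and not isinstance(v, bool)]


def bool_(values):
    # Map booleans to '1' or '0'; anything else yields nothing.
    return ["1" if v else "0" for v in values if isinstance(v, bool)]


name_keys = split_words(lower(["Bruce David Grobbelaar"]))
number_keys = number([7, "7", 1234], 3)
bool_keys = bool_([True, False, "yes"])
```

Zero-padding matters because index keys are strings: padded numbers sort in numeric order, while unpadded ones would sort "10" before "9".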

Querying an index

Pass an index key or a tuple of index keys (if the index is on multiple fields) to get_from_index; the last index key in each tuple (and only the last one) can end with an asterisk, which matches initial substrings. So, querying our by-firstname index from above:

>>> johns = [d.doc_id for d in db1.get_from_index("by-firstname", "John")]
>>> assert(jw.doc_id in johns)
>>> assert(jb.doc_id in johns)
>>> assert(jm.doc_id not in johns)

will return the documents jw and jb.

get_from_index("by-firstname", "J*") will match all index keys beginning with “J”, and so will return the documents jw, jb and jm.

>>> js = [d.doc_id for d in db1.get_from_index("by-firstname", "J*")]
>>> assert(jw.doc_id in js)
>>> assert(jb.doc_id in js)
>>> assert(jm.doc_id in js)
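The matching rule can be sketched without a database. The conceptual by-firstname index from earlier is written out by hand below as (key tuple, document name) pairs. The matches helper and this stand-alone get_from_index are hypothetical illustrations of the rule as described in the text, not the real lookup code.

```python
def matches(index_key, query):
    """True if the tuple `index_key` satisfies the tuple `query`."""
    if len(index_key) != len(query):
        return False
    # Every field except the last must match exactly.
    if index_key[:-1] != query[:-1]:
        return False
    last = query[-1]
    if last.endswith("*"):
        # A trailing asterisk matches any key starting with the prefix.
        return index_key[-1].startswith(last[:-1])
    return index_key[-1] == last


def get_from_index(index, *query):
    # Return the documents whose index key satisfies the query.
    return [doc for key, doc in index if matches(key, query)]


# The conceptual "by-firstname" index, written out by hand.
by_firstname = [
    (("Alan",), "ah"),
    (("Jan",), "jm"),
    (("John",), "jb"),
    (("John",), "jw"),
]

johns = get_from_index(by_firstname, "John")
js = get_from_index(by_firstname, "J*")
```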

Synchronising

U1DB is a syncable database. Any U1DB can be synced with any U1DB server; most U1DB implementations are capable of being run as a server. Synchronising brings both the server and the client up to date with one another; save data into a local U1DB whether online or offline, and then sync when online.

Pass an HTTP URL to the database’s sync call (sync() in the reference implementation) to synchronise with that server.

Synchronising databases which have been independently changed may produce conflicts. Read about the U1DB conflict policy and more about synchronising at Conflicts, Synchronisation, and Revisions.

Running your own U1DB server is implementation-specific. The reference implementation can be run as a server.

Dealing with conflicts

Synchronising a database can result in conflicts; if your user changes the same document in two different places and then syncs again, that document will be “in conflict”, meaning that it has incompatible changes. If this is the case, has_conflicts will be true, and calling put_doc() on a conflicted doc will give a ConflictedDoc error. To get a list of the conflicted versions of the document, call get_doc_conflicts(). Deciding what the final unconflicted document should look like is specific to the user’s application; once decided, call resolve_doc() to resolve the conflict and set the final content.
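One possible resolution flow is sketched below. Several things here are assumptions: the argument shapes get_doc_conflicts(doc_id) and resolve_doc(doc, conflicted_doc_revs) are modelled on the Python reference implementation and may differ in yours, the merge policy is just one example of an application decision, and FakeDoc and FakeDatabase are stand-ins so that the sketch runs without u1db.

```python
class FakeDoc(object):
    """Stand-in for a Document: an id, a revision and content."""

    def __init__(self, doc_id, rev, content):
        self.doc_id, self.rev, self.content = doc_id, rev, content


class FakeDatabase(object):
    """Stand-in exposing only the two calls used below; not u1db."""

    def __init__(self, conflicts):
        self._conflicts = conflicts
        self.resolved = None

    def get_doc_conflicts(self, doc_id):
        return list(self._conflicts)

    def resolve_doc(self, doc, conflicted_doc_revs):
        self.resolved = (doc.content, conflicted_doc_revs)
        self._conflicts = []


def resolve_by_merging(db, doc_id):
    """Example policy (an assumption): keep the union of all keys; for
    a key present in several versions, the first version listed wins."""
    conflicts = db.get_doc_conflicts(doc_id)
    if not conflicts:
        return None
    merged = {}
    # Apply versions last to first, so earlier versions overwrite later ones.
    for version in reversed(conflicts):
        # A deleted version may carry no content; skip it.
        if version.content:
            merged.update(version.content)
    winner = conflicts[0]
    winner.content = merged
    # Tell the database which revisions the merged content supersedes.
    db.resolve_doc(winner, [version.rev for version in conflicts])
    return winner


db = FakeDatabase([
    FakeDoc("doc1", "rev-b", {"a": 1, "b": 2}),
    FakeDoc("doc1", "rev-a", {"a": 0, "c": 3}),
])
resolved = resolve_by_merging(db, "doc1")
```

A real application would usually ask the user or apply domain rules instead of a blind key merge; the point is only the order of calls: list the conflicts, decide the content, then resolve.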

Synchronising Functions

Footnotes

[1] Alternatively, if a factory function was passed into u1db.open(), get_doc() will return whatever type of object the factory function returns.