Home | Trees | Indices | Help |
---|
|
Conventions used:
From http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html version 0.1
The Pairtree specification on which this implementation is based is (c) 2009 UC Regents.
Various regexes used in path to id conversion and the bulk of the unittests were contributed by Erik Hetzner, based on John Kunze's work, also (c) 2009 UC Regents and released under the Apache license.
A ppath script is included for convenience to be used in shell scripts or similar. Eg:
ppath topath
examples:
$ vim mystore/pairtree_root/`ppath topath document:105/data/doc.txt` (Opens the file at mystore/pairtree_root/do/cu/me/nt/+1/05/data/doc.txt) $ cp `ppath topath foo:bar/1.txt` `ppath topath bar:foo/2.txt`
ppath toid
examples:
data/subjects/pairtree_root/HA/SS/ET/ROOT$ ppath toid `pwd` HASSET/ROOT
>>> from pairtree import *
>>> # Get the store 'factory' >>> f = PairtreeStorageFactory()
The factory object is solely there to create clients for individual pairtree stores. For example:
>>> store_foo = f.get_store(store_dir="data", uri_base="http://")
This will create the following on disc in a directory called 'data' if it doesn't already exist:
$ ls -R data/ data/: pairtree_prefix pairtree_root pairtree_version0_1 data/pairtree_root:
Where
This directory conforms to Pairtree Version 0.1. Updated spec: http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html
Note, if data *had* already existed and was a pairtree store, the uri_base would have been read from the prefix file and override the one supplied above.
Also, if you try to create a store over a directory that already exists, but which isn't a pairtree store that it can recognise, it will raise a NotAPairtreeStoreException.
Valid store names fit the regex ^[A-z][A-z0-9]* - but this is an arbitrary limitation and can be removed if it is seen as unnecessary.
Two main commands for this activity, eg continuing on:
>>> bar = store_foo.create_object('bar') >>>
Note that reissuing that command again will raise an Exception:
>>> bar = store_foo.create_object('bar') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 235, in create_object pairtree.storage_exceptions.ObjectAlreadyExistsException
There is also a 'get_object' command, which is more accommodating, as it can be passed a fairly self-explanatory flag, which by default will create the object if it doesn't exist:
get_object(self, id=None, create_if_doesnt_exist=True)
>>> bar = store_foo.get_object('bar')
Setting this flag to False, will cause it to raise an exception if it cannot find an object.
>>> fake = store_foo.get_object('doesnotexist', create_if_doesnt_exist=False) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 231, in get_object pairtree.storage_exceptions.ObjectNotFoundException
(note that fake = store_foo.get_object('doesnotexist', False) is equivalent to the above line)
The important methods:
E.g. - Examples speak louder than words
>>> bar.add_bytestream('foo.txt', 'can be any sequence of bytes') >>> bar.list_parts() ['foo.txt'] >>>
Adding buffered content from a file:
>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream: ... bar.add_bytestream('Firefox_wallpaper.png', stream) ... >>>
Adding the same file to magic/path/inside/object - paths are automatically created on demand.
>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream: ... bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object') ... >>>
Removing the first copy of that file, which was added to the wrong place:
>>> bar.del_file('Firefox_wallpaper.png') >>> bar.list_parts() ['magic', 'foo.txt'] >>> bar.list_parts('magic/path') ['inside'] >>> bar.list_parts('magic/path/inside/object') ['Firefox_wallpaper.png'] >>>
There are also some convenience methods:
The by_path suffix means that you can give it the whole path as one, and it will try to figure out what is intended, for example, consider the png we placed in a nested directory earlier:
>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream: ... bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object') ...
This can be written as:
>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream: ... bar.add_bytestream_by_path('magic/path/inside/object/Firefox_wallpaper.png', stream) ...
The flag streamable is key here - if this is set to True, then you will be passed a file handle, which you must remember to close or use the construct:
>>> with bar.get_bytestream('foo.txt', streamable=True) as text: ... print text.read() ... >>>
This is very useful for large files you wish to scan through, but do not need to hold in memory all at the same time.
By setting streamable to False, the entire file is read into memory and returned:
>>> print bar.get_bytestream('foo.txt') can be any sequence of bytes
Version: 0.5.2
|
|||
|
|
|||
|
|||
|
|
|||
__package__ =
|
Home | Trees | Indices | Help |
---|
Generated by Epydoc 3.0.1 on Wed Jun 2 18:26:53 2010 | http://epydoc.sourceforge.net |