Package pairtree :: Module pairtree_client :: Class PairtreeStorageClient

Class PairtreeStorageClient


A client that oversees the implementation of the Pairtree FS specification version 0.1.

>>> from pairtree import PairtreeStorageClient
>>> store = PairtreeStorageClient(store_dir='data', uri_base="http://")

This will create the following on disc in a directory called 'data' if it doesn't already exist:

   $ ls -R data/
   data/:
   pairtree_prefix  pairtree_root  pairtree_version0_1

   data/pairtree_root:

Where

  1. the file 'pairtree_prefix' contains just "http://"
  2. the file 'pairtree_version0_1' contains:
      This directory conforms to Pairtree Version 0.1.
      Updated spec: http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html
    

Note: if 'data' *had* already existed and was a pairtree store, the uri_base would have been read from the prefix file and would override the one supplied above.

Also, if you try to create a store over a directory that already exists but is not a pairtree store the client can recognise, a NotAPairtreeStoreException is raised.

Instance Methods
 
__init__(self, uri_base, store_dir, shorty_length=2, hashing_type=None)
Constructor
 
__char2hex(self, m)
 
__hex2char(self, m)
 
id_encode(self, id)
The identifier string is cleaned of characters that are expected to occur rarely in object identifiers but that would cause certain known problems for file systems.
 
id_decode(self, id)
Decodes a given identifier from its pairtree filesystem encoding into its original form.
 
_get_id_from_dirpath(self, dirpath)
Internal - method for discovering the pairtree identifier for a given directory path.
 
_get_path_from_dirpath(self, dirpath)
Internal - walks a directory chain and builds a list of the directory shorties relative to the pairtree_root
 
_id_to_dirpath(self, id)
Internal - method for turning an identifier into a pairtree directory tree of shorties.
 
_id_to_dir_list(self, id)
Internal - method for turning an identifier into a list of pairtree directory shorties.
 
_init_store(self)
Initialise the store if the directory doesn't exist.
 
list_ids(self)
Walk the store, and build a list of pairtree-conformant objects in the store.
 
_create(self, id)
Internal - create an object.
 
list_parts(self, id, path=None)
List all the parts of the object with the given identifier (excluding shortie directories belonging to other objects).
 
isfile(self, id, filepath)
Returns True or False depending on whether the path is a file or not.
 
isdir(self, id, filepath)
Returns True or False depending on whether the path is a subdirectory or not.
 
put_stream(self, id, path, stream_name, bytestream, buffer_size=8192)
Store a stream of bytes into a file within a pairtree object.
 
get_appendable_stream(self, id, path, stream_name)
Returns a filehandle for a file within a pairtree object.
 
get_stream(self, id, path, stream_name, streamable=False)
Reads a file from a pairtree object - If streamable is set to True, this returns the filehandle for that file, which must be close()'d once finished with.
 
del_stream(self, id, stream_name, path=None)
Delete a file from a pairtree object.
 
del_path(self, id, path, recursive=False)
Delete a subpath from an object, optionally recursively. If the path is not empty (i.e. it has parts in it) and recursive is not True, a PathIsNotEmptyException is raised.
 
delete_object(self, id)
Deletes an object from the pairtree store, including any parts and subpaths. There is no undo.
 
exists(self, id, path=None)
Answers the question "Does object or object subpath/file 'xxxxxxx' exist?"
 
_get_new_id(self)
Inbuilt method to randomly generate an id, if one is not given to either get_object or create_object.
 
get_object(self, id=None, create_if_doesnt_exist=True)
Returns a pairtree object with identifier id if it exists.
 
create_object(self, id)
Creates a new object with identifier id

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, uri_base, store_dir, shorty_length=2, hashing_type=None)
(Constructor)

Constructor

Parameters:
  • store_dir (A path to a directory, relative or absolute) - The file directory where the pairtree store is
  • uri_base (A URI fragment, like "http://example.org/") - The URI base for the store
  • shorty_length (integer) - The size of the shorties in the pairtree implementation (Default: 2)
  • hashing_type (Any supported by hashlib) - The name of the algorithm to use when hashing files, if left as None, this is disabled.
Overrides: object.__init__

id_encode(self, id)

The identifier string is cleaned of characters that are expected to occur rarely in object identifiers but that would cause certain known problems for file systems. In the first step, every UTF-8 octet outside the range of visible ASCII (94 characters with hexadecimal codes 21-7e) [ASCII] (Cerf, “ASCII format for network interchange,” October 1969.), as well as the following visible ASCII characters:

  "   hex 22           <   hex 3c           ?   hex 3f
  *   hex 2a           =   hex 3d           ^   hex 5e
  +   hex 2b           >   hex 3e           |   hex 7c
  ,   hex 2c

must be converted to their corresponding 3-character hexadecimal encoding, ^hh, where ^ is a circumflex and hh is two hex digits. For example, ' ' (space) is converted to ^20 and '*' to ^2a.

In the second step, the following single-character to single-character conversions must be done:

      / -> =
      : -> +
      . -> ,

These are characters that occur quite commonly in opaque identifiers but present special problems for filesystems. This step avoids requiring them to be hex encoded (hence expanded to three characters), which keeps the typical ppath reasonably short. Here are examples of identifier strings after cleaning and after ppath mapping:

   id:  ark:/13030/xt12t3
       ->  ark+=13030=xt12t3
       ->  ar/k+/=1/30/30/=x/t1/2t/3/
   id:  http://n2t.info/urn:nbn:se:kb:repos-1
       ->  http+==n2t,info=urn+nbn+se+kb+repos-1
       ->  ht/tp/+=/=n/2t/,i/nf/o=/ur/n+/nb/n+/se/+k/b+/re/po/s-/1/
   id:  what-the-*@?#!^!?
       ->  what-the-^2a@^3f#!^5e!^3f
       ->  wh/at/-t/he/-^/2a/@^/3f/#!/^5/e!/^3/f/

(From section 3 of the Pairtree specification)
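
The two cleaning steps described above can be sketched in a few lines of standalone Python. This is only an illustration of the rules as quoted from the specification, not the library's own id_encode; the function name encode_id_sketch and the constant _SPECIAL are made up for this example.

```python
# Visible ASCII characters that the quoted rules say must be hex-encoded.
_SPECIAL = '"*+,<=>?^|'


def encode_id_sketch(identifier):
    """Illustrative two-step cleaning (hypothetical helper, not library code)."""
    pieces = []
    # Step 1: replace every UTF-8 octet outside visible ASCII (0x21-0x7e),
    # and each listed special character, with ^hh.
    for octet in identifier.encode("utf-8"):
        char = chr(octet)
        if octet < 0x21 or octet > 0x7E or char in _SPECIAL:
            pieces.append("^%02x" % octet)
        else:
            pieces.append(char)
    cleaned = "".join(pieces)
    # Step 2: single-character substitutions  / -> =   : -> +   . -> ,
    return cleaned.translate(str.maketrans("/:.", "=+,"))


print(encode_id_sketch("ark:/13030/xt12t3"))   # ark+=13030=xt12t3
print(encode_id_sketch("what-the-*@?#!^!?"))   # what-the-^2a@^3f#!^5e!^3f
```

Doing the hex encoding first matters: any '=', '+' or ',' in the original identifier is already written as ^hh by the time step two introduces those characters with their new meaning.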

Parameters:
  • id (identifier) - The identifier to encode according to the pairtree 0.1 specification
Returns:
A string of the encoded identifier

id_decode(self, id)

Decodes a given identifier from its pairtree filesystem encoding into its original form.

Parameters:
  • id (identifier) - Identifier to decode
Returns:
A string of the decoded identifier
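
Decoding reverses the two encoding steps in the opposite order. The sketch below shows one way this could be done in standalone Python; decode_id_sketch is a hypothetical name, and the library's own id_decode may be implemented differently.

```python
import re


def decode_id_sketch(encoded):
    """Illustrative inverse of the pairtree cleaning (hypothetical helper)."""
    # Undo the single-character substitutions first. In an encoded string a
    # literal '=', '+' or ',' can only come from the substitution step,
    # because the original characters were hex-encoded as ^hh.
    swapped = encoded.translate(str.maketrans("=+,", "/:."))
    # Then turn each ^hh triple back into its octet and decode as UTF-8.
    raw = re.sub(
        rb"\^([0-9a-fA-F]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        swapped.encode("utf-8"),
    )
    return raw.decode("utf-8")


print(decode_id_sketch("ark+=13030=xt12t3"))          # ark:/13030/xt12t3
print(decode_id_sketch("what-the-^2a@^3f#!^5e!^3f"))  # what-the-*@?#!^!?
```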

_get_id_from_dirpath(self, dirpath)

Internal - method for discovering the pairtree identifier for a given directory path.

E.g. pairtree_root/fo/ob/ar/+/ --> 'foobar:'

Parameters:
  • dirpath (Path to object's root) - Directory path to decode
Returns:
Decoded identifier

_get_path_from_dirpath(self, dirpath)

Internal - walks a directory chain and builds a list of the directory shorties relative to the pairtree_root

Parameters:
  • dirpath (Directory path) - Directory path to walk

_id_to_dirpath(self, id)

Internal - method for turning an identifier into a pairtree directory tree of shorties.

  • "foobar://ark.1" --> "fo/ob/ar/+=/=a/rk/,1"
Parameters:
  • id (identifier) - Identifier for a pairtree object
Returns:
A directory path to the object's root directory

_id_to_dir_list(self, id)

Internal - method for turning an identifier into a list of pairtree directory shorties.

  • "foobar://ark.1" --> ["fo","ob","ar","+=","=a","rk",",1"]
Parameters:
  • id (identifier) - Identifier for a pairtree object
Returns:
A list of directory path fragments to the object's root directory
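
As a rough illustration of how an already-encoded identifier is cut into shorties, the following standalone sketch splits a string into fixed-size pieces and joins them into a relative path. The names split_into_shorties and shorties_to_path are hypothetical and are not part of the client.

```python
def split_into_shorties(encoded_id, shorty_length=2):
    """Cut an encoded identifier into pieces of shorty_length characters.

    The last piece may be shorter (hypothetical helper, not library code).
    """
    return [
        encoded_id[start:start + shorty_length]
        for start in range(0, len(encoded_id), shorty_length)
    ]


def shorties_to_path(shorties):
    """Join shorties into a path relative to pairtree_root ('/' separators)."""
    return "/".join(shorties)


parts = split_into_shorties("ark+=13030=xt12t3")
print(parts)                    # ['ar', 'k+', '=1', '30', '30', '=x', 't1', '2t', '3']
print(shorties_to_path(parts))  # ar/k+/=1/30/30/=x/t1/2t/3
```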

_init_store(self)

Initialise the store if the directory doesn't exist. Create the basic structure needed and write the prefix to disc.

If the store directory exists, one of two things can happen:

  1. If that directory can be understood by this library as a pairtree store, it will attempt to read in the correct pairtree_prefix to use, instead of the supplied uri_base.
  2. If the directory cannot be understood, a NotAPairtreeStoreException will be raised.
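
The decision described above could look roughly like the standalone sketch below. It is not the library's _init_store: the function init_store_sketch, the exception NotAPairtreeStoreSketchError, and in particular the test used to decide whether a directory "can be understood" (presence of pairtree_root and the version file) are assumptions made for illustration.

```python
import os


class NotAPairtreeStoreSketchError(Exception):
    """Stand-in for the library's NotAPairtreeStoreException."""


def init_store_sketch(store_dir, uri_base):
    """Create or open a store directory and return the prefix in effect."""
    prefix_file = os.path.join(store_dir, "pairtree_prefix")
    root_dir = os.path.join(store_dir, "pairtree_root")
    version_file = os.path.join(store_dir, "pairtree_version0_1")

    if not os.path.exists(store_dir):
        # New store: create the basic structure and write the prefix to disc.
        os.makedirs(root_dir)
        with open(prefix_file, "w") as handle:
            handle.write(uri_base)
        with open(version_file, "w") as handle:
            handle.write("This directory conforms to Pairtree Version 0.1.\n")
        return uri_base

    # Existing directory: assumed recognisable if it has a root and version file.
    if os.path.isdir(root_dir) and os.path.isfile(version_file):
        if os.path.isfile(prefix_file):
            with open(prefix_file) as handle:
                return handle.read().strip()  # stored prefix wins
        return uri_base

    raise NotAPairtreeStoreSketchError(store_dir)
```

Calling init_store_sketch a second time on the same directory with a different uri_base returns the prefix that was stored the first time, which mirrors the override behaviour described for the real client.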

list_ids(self)

Walk the store, and build a list of pairtree-conformant objects in the store. This will return objects in 'split-ends' and will function correctly as long as non-shortie directories are just that; non-shortie directories must have longer labels than the shorties - e.g.:

     ab -- cd -- ef -- foo.txt
            |     |
            |     ---- gh
            |           |
            |           ---- foo.txt
            |
            ---- e  -- foo.txt

     This method will return ['abcdef', 'abcde', 'abcdefgh'] as ids in this
     store.

TODO: Need to make sure this corresponds to pairtree spec.

Currently, it ignores the possibility of a split end being 'shielded' by a /obj/ folder.

Since version 0.4.12, this returns a generator rather than a plain list.

Returns:
generator
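
To make the walking rule concrete, here is a small standalone generator that applies the same idea to a directory tree: a directory counts as an object if it holds anything other than shortie directories, and only shortie directories are descended into. This is a sketch under those assumptions, not the library's list_ids; list_ids_sketch is a hypothetical name, and it yields the still-encoded identifier without decoding it.

```python
import os


def list_ids_sketch(pairtree_root, shorty_length=2):
    """Yield encoded ids for every object directory below pairtree_root."""
    for dirpath, dirnames, filenames in os.walk(pairtree_root):
        # Directories whose names are no longer than a shorty continue the tree.
        shorties = [name for name in dirnames if len(name) <= shorty_length]
        # Files, or directories with longer names, are treated as object parts.
        has_parts = bool(filenames) or len(shorties) != len(dirnames)
        # Prune the walk in place so object subdirectories are not descended.
        dirnames[:] = shorties
        if has_parts and dirpath != pairtree_root:
            relative = os.path.relpath(dirpath, pairtree_root)
            yield "".join(relative.split(os.sep))
```

Run against the tree drawn above, this yields 'abcdef', 'abcdefgh' and 'abcde' in whatever order the filesystem lists the directories.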

_create(self, id)

Internal - create an object. If the object already exists, raise an ObjectAlreadyExistsException.

Parameters:
  • id (identifier) - Identifier to be created
Returns:
PairtreeStorageObject

list_parts(self, id, path=None)

List all the parts of the object with the given identifier (excluding shortie directories belonging to other objects).

If path is supplied, the parts in that subdirectory are returned.

If the subpath doesn't exist, an ObjectNotFoundException will be raised.

>>> store.list_parts('foobar:1', 'data/images')
[ 'image001.tif', 'image....    ]
Parameters:
  • id (identifier) - Identifier for pairtree object
  • path (Directory path) - (Optional) List the parts contained in path's subdirectory
Returns:
list

isfile(self, id, filepath)

Returns True or False depending on whether the path is a file or not.

If the file doesn't exist, False is returned.

Parameters:
  • filepath (Directory path) - Path to be tested
Returns:
bool

isdir(self, id, filepath)

Returns True or False depending on whether the path is a subdirectory or not.

If the path doesn't exist, False is returned.

Parameters:
  • filepath (Directory path) - Path to be tested
Returns:
bool

put_stream(self, id, path, stream_name, bytestream, buffer_size=8192)

Store a stream of bytes into a file within a pairtree object.

The bytestream can be either a string of bytes, or a filelike object which supports bytestream.read(buffer_size) - useful for very large files.

Parameters:
  • id (identifier) - Identifier for the pairtree object to write to
  • path (Directory path) - (Optional) subdirectory path to store file in
  • stream_name (filename) - Name of the file to write to
  • bytestream (string|file) - Either a string or a file-like object to read from
  • buffer_size (integer) - (Optional) Used for streaming filelike objects - defines the size of the buffer to read in each cycle.
Returns:
tuple (hashing_algorithm, hash) or None if hashing is disabled
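
The buffered write with optional hashing can be pictured with the standalone sketch below. It writes to a plain file path rather than into a pairtree object, the name write_stream_sketch is hypothetical, and returning the hex digest as the second tuple element is an assumption about the form of the hash.

```python
import hashlib


def write_stream_sketch(target_path, bytestream, buffer_size=8192, hashing_type=None):
    """Write bytes or a filelike object to target_path, hashing as it goes."""
    hasher = hashlib.new(hashing_type) if hashing_type else None
    with open(target_path, "wb") as out:
        if hasattr(bytestream, "read"):
            # Filelike object: copy it across buffer_size bytes at a time.
            while True:
                chunk = bytestream.read(buffer_size)
                if not chunk:
                    break
                out.write(chunk)
                if hasher:
                    hasher.update(chunk)
        else:
            # Plain byte string: write it in one go.
            out.write(bytestream)
            if hasher:
                hasher.update(bytestream)
    return (hashing_type, hasher.hexdigest()) if hasher else None
```

Reading in fixed-size chunks keeps memory use bounded regardless of file size, which is the reason the filelike form is described as useful for very large files.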

get_appendable_stream(self, id, path, stream_name)

Returns a filehandle for a file within a pairtree object. This is a "wb+" opened file and so can be appended to and obeys 'seek'.

>>> with store.get_appendable_stream('foobar:1','data/images', 'image001.tif') as stream:
        # Do something with the stream handle
        pass

The stream is closed at the end of the with block.

Parameters:
  • id (identifier) - Identifier for the pairtree object to read from
  • path (Directory path) - (Optional) subdirectory path to retrieve file from
  • stream_name (filename) - Name of the file to read in
Returns:
file

get_stream(self, id, path, stream_name, streamable=False)

Reads a file from a pairtree object. If streamable is set to True, this returns the filehandle for that file, which must be close()'d once finished with. In Python 2.6 and above, this can be done easily:

>>> with store.get_stream('foobar:1','data/images', 'image001.tif', True) as stream:
        # Do something with the stream handle
        pass

The stream is closed at the end of the with block.

Parameters:
  • id (identifier) - Identifier for the pairtree object to read from
  • path (Directory path) - (Optional) subdirectory path to retrieve file from
  • stream_name (filename) - Name of the file to read in
  • streamable (True|False) - If True, returns a filelike handle to read() from - remember to close() the file! If False, reads the file into a bytestring and returns that instead.
Returns:
Either file or str

del_stream(self, id, stream_name, path=None)

Delete a file from a pairtree object. This leaves no trace, so be careful.

Parameters:
  • id (identifier) - Identifier for the pairtree object to delete from
  • path (Directory path) - (Optional) subdirectory path to delete file from
  • stream_name (filename) - Name of the file to delete

del_path(self, id, path, recursive=False)

Delete a subpath from an object, optionally recursively. If the path is not empty (i.e. it has parts in it) and recursive is not True, a PathIsNotEmptyException is raised.

Parameters:
  • id (identifier) - Identifier for the pairtree object to delete from
  • path (Directory path) - subdirectory path to delete
  • recursive (bool) - Whether the delete is recursive (think rm -r)
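
The recursive and non-recursive cases behave much like rmdir and rm -r. A standalone sketch of that logic on an ordinary directory follows; delete_subpath_sketch and PathIsNotEmptySketchError are hypothetical stand-ins, not the client's del_path or its exception class.

```python
import os
import shutil


class PathIsNotEmptySketchError(Exception):
    """Stand-in for the library's PathIsNotEmptyException."""


def delete_subpath_sketch(dirpath, recursive=False):
    """Remove dirpath; refuse if it has contents unless recursive is True."""
    if os.listdir(dirpath) and not recursive:
        raise PathIsNotEmptySketchError(dirpath)
    if recursive:
        shutil.rmtree(dirpath)  # removes the directory and everything in it
    else:
        os.rmdir(dirpath)       # only succeeds on an empty directory
```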

delete_object(self, id)

Deletes an object from the pairtree store, including any parts and subpaths. There is no undo.

Parameters:
  • id (identifier) - Identifier of the object to delete

exists(self, id, path=None)

Answers the question "Does object or object subpath/file 'xxxxxxx' exist?"

Parameters:
  • id (identifier) - Identifier for the pairtree object to look for
  • path (Directory path) - Subpath or subfilepath to check
Returns:
bool

_get_new_id(self)

Inbuilt method to randomly generate an id, if one is not given to either get_object or create_object.

Simply returns a random 14-digit (base 10) number. This is not fantastically useful, but the method at least makes sure the id is unique in the store.

Returns:
Random but unique 14-digit long id number
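
One plausible way to obtain a random id that is also unique is to keep drawing until the store reports that the candidate is unused. The sketch below assumes exactly that; new_id_sketch is a hypothetical name, and the exists argument stands in for whatever check the store performs (for the real client, something like its exists method).

```python
import random


def new_id_sketch(exists, digits=14):
    """Return a random id of the given number of decimal digits.

    exists is a callable that reports whether a candidate id is already taken.
    """
    low = 10 ** (digits - 1)   # smallest number with that many digits
    high = 10 ** digits - 1    # largest number with that many digits
    while True:
        candidate = str(random.randint(low, high))
        if not exists(candidate):
            return candidate


taken = set()
print(new_id_sketch(taken.__contains__))  # a 14-digit string
```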

get_object(self, id=None, create_if_doesnt_exist=True)

Returns a pairtree object with identifier id if it exists.

If the object at id doesn't exist, what happens depends on create_if_doesnt_exist. With the default (True), the object is created:

>>> bar = client.get_object('bar')
# the object with id 'bar' will be retrieved and created if necessary.

Setting this flag to False will cause it to raise an exception if it cannot find the object.

>>> fake = client.get_object('doesnotexist', create_if_doesnt_exist=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 231, in get_object
pairtree.storage_exceptions.ObjectNotFoundException

(note that fake = client.get_object('doesnotexist', False) is equivalent to the above line)

Parameters:
  • id (identifier) - Identifier for the pairtree object to get (or create)
  • create_if_doesnt_exist (True|False) - Flag - if True, an object will be created if it doesn't yet exist. Will raise an ObjectNotFoundException if set to False and the object is non-existent.
Returns:
PairtreeStorageObject

create_object(self, id)

Creates a new object with identifier id.

>>> bar = client.create_object('bar')
>>>

Note that reissuing that command will raise an ObjectAlreadyExistsException:

>>> bar = client.create_object('bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 235, in create_object
pairtree.storage_exceptions.ObjectAlreadyExistsException
Parameters:
  • id (identifier) - Identifier for the pairtree object to create
Returns:
PairtreeStorageObject