
Module pairtree_path


Conventions used: those of the Pairtree specification, version 0.1 (http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html).

This client handles all of the pairtree conventions and provides a PairtreeStorageClient object that makes a pairtree store easier to interact with.

Usage

>>> from pairtree import PairtreeStorageClient

To create a pairtree store in mystore/ for objects whose URI base is http://example.org/ark:/123, use:

>>> store = PairtreeStorageClient(store_dir='mystore', uri_base='http://example.org/ark:/123')
Functions

char2hex(m)

hex2char(m)

id_encode(id)
Cleans the identifier string of characters that are expected to occur rarely in object identifiers but that would cause known problems for file systems.

id_decode(id)
Decodes a given identifier from its pairtree filesystem encoding back into its original form.

get_id_from_dirpath(dirpath, pairtree_root='')
Internal - method for discovering the pairtree identifier for a given directory path.

get_path_from_dirpath(dirpath, pairtree_root='')
Internal - walks a directory chain and builds a list of the directory shorties relative to the pairtree_root.

id_to_dirpath(id, pairtree_root='', shorty_length=2)
Internal - method for turning an identifier into a pairtree directory tree of shorties.

id_to_dir_list(id, pairtree_root='', shorty_length=2)
Internal - method for turning an identifier into a list of pairtree directory shorties.
Variables
  logger = logging.getLogger('pairtreepath')
  encode_regex = re.compile(r'(?u)["\*\+,<=>\?\\\^\|]|[^!-~]')
  decode_regex = re.compile(r'(?u)\^(..)')
  __package__ = 'pairtree'
Function Details

id_encode(id)


The identifier string is cleaned of characters that are expected to occur rarely in object identifiers but that would cause certain known problems for file systems. In the first step, every UTF-8 octet outside the range of visible ASCII (the 94 characters with hexadecimal codes 21-7e) [ASCII] (Cerf, “ASCII format for network interchange,” October 1969), as well as the following visible ASCII characters:

   "   hex 22           <   hex 3c           ?   hex 3f
   *   hex 2a           =   hex 3d           ^   hex 5e
   +   hex 2b           >   hex 3e           |   hex 7c
   ,   hex 2c

must be converted to their corresponding 3-character hexadecimal encoding, ^hh, where ^ is a circumflex and hh is two hex digits. For example, ' ' (space) is converted to ^20 and '*' to ^2a.
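The first cleaning step can be sketched in Python as follows. This is an illustration, not the module's own code: the names `char_to_hex` and `hex_encode_step` are hypothetical, and the pattern is copied from the module-level `encode_regex` listed under Variables, which (unlike the list above) also matches the backslash. Each matched character is expanded to one `^hh` per UTF-8 octet, so a character that takes two octets becomes two escapes.

```python
import re

# Copied from the module-level encode_regex: one of the listed special
# characters (the pattern also includes the backslash), or any character
# outside visible ASCII 21-7e.
ENCODE_REGEX = re.compile(r'(?u)["\*\+,<=>\?\\\^\|]|[^!-~]')


def char_to_hex(match):
    """Hypothetical helper: replace one character with ^hh per UTF-8 octet."""
    return "".join("^%02x" % octet for octet in match.group(0).encode("utf-8"))


def hex_encode_step(identifier):
    """First cleaning step only: hex-encode characters unsafe for file systems."""
    return ENCODE_REGEX.sub(char_to_hex, identifier)


print(hex_encode_step("a b*c"))  # a^20b^2ac
```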

In the second step, the following single-character to single-character conversions must be done:

       / -> =
       : -> +
       . -> ,
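A minimal sketch of the second step, assuming the input has already been through the hex-encoding step (the function name `substitute_step` is hypothetical). `str.translate` performs the three single-character substitutions in one pass:

```python
# Second cleaning step: / -> =, : -> +, . -> ,
SECOND_STEP = str.maketrans({"/": "=", ":": "+", ".": ","})


def substitute_step(hex_encoded):
    """Hypothetical helper: apply the single-character substitutions."""
    return hex_encoded.translate(SECOND_STEP)


print(substitute_step("ark:/13030/xt12t3"))  # ark+=13030=xt12t3
```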

These are characters that occur quite commonly in opaque identifiers but present special problems for filesystems. This step avoids requiring them to be hex encoded (hence expanded to three characters), which keeps the typical ppath reasonably short. Here are examples of identifier strings after cleaning and after ppath mapping:

   id:  ark:/13030/xt12t3
       ->  ark+=13030=xt12t3
       ->  ar/k+/=1/30/30/=x/t1/2t/3/
   id:  http://n2t.info/urn:nbn:se:kb:repos-1
       ->  http+==n2t,info=urn+nbn+se+kb+repos-1
       ->  ht/tp/+=/=n/2t/,i/nf/o=/ur/n+/nb/n+/se/+k/b+/re/po/s-/1/
   id:  what-the-*@?#!^!?
       ->  what-the-^2a@^3f#!^5e!^3f
       ->  wh/at/-t/he/-^/2a/@^/3f/#!/^5/e!/^3/f/

(From section 3 of the Pairtree specification)
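Putting both cleaning steps together with the ppath mapping, the following sketch reproduces the three examples above. The names `clean_id` and `to_ppath` are hypothetical and this is not the module's implementation. The hex-encoding step has to run first: any `=`, `+` or `,` in the original identifier is hex-encoded by step one, so every such character in the cleaned string can only have come from step two, which keeps the mapping reversible.

```python
import re

ENCODE_REGEX = re.compile(r'(?u)["\*\+,<=>\?\\\^\|]|[^!-~]')
SECOND_STEP = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(identifier):
    """Both cleaning steps: hex-encode first, then substitute / : ."""
    hexed = ENCODE_REGEX.sub(
        lambda m: "".join("^%02x" % b for b in m.group(0).encode("utf-8")),
        identifier,
    )
    return hexed.translate(SECOND_STEP)


def to_ppath(cleaned, shorty_length=2):
    """Cut the cleaned string into shorties, '/'-terminated as in the examples."""
    shorties = [cleaned[i:i + shorty_length] for i in range(0, len(cleaned), shorty_length)]
    return "/".join(shorties) + "/"


for ident in ["ark:/13030/xt12t3", "what-the-*@?#!^!?"]:
    cleaned = clean_id(ident)
    print(ident, "->", cleaned, "->", to_ppath(cleaned))
```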

Parameters:
  • id (identifier) - Identifier to encode according to the pairtree 0.1 specification
Returns:
A string of the encoded identifier

id_decode(id)


This decodes a given identifier from its pairtree filesystem encoding back into its original form.

Parameters:
  • id (identifier) - Identifier to decode
Returns:
A string of the decoded identifier
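Decoding reverses the two cleaning steps. The sketch below is an illustration under stated assumptions, not the module's `id_decode`: `decode_id` is a hypothetical name, it accepts only well-formed (pure ASCII) encoded input, and its `^hh` pattern is stricter than the module's `decode_regex`, which matches any two characters after the circumflex. Undoing the substitutions first is safe because a hex escape contains only hex digits.

```python
import re

# Stricter than the module's decode_regex: exactly two hex digits after ^.
HEX_ESCAPE = re.compile(r"\^([0-9a-fA-F]{2})")
UNDO_SECOND_STEP = str.maketrans({"=": "/", "+": ":", ",": "."})


def decode_id(encoded):
    """Hypothetical decoder: undo the substitutions, then the ^hh escapes."""
    unsubstituted = encoded.translate(UNDO_SECOND_STEP)
    # Map each ^hh to code point U+00hh, then reinterpret those code points
    # as UTF-8 octets so multi-octet characters are reassembled.
    octets = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), unsubstituted)
    return octets.encode("latin-1").decode("utf-8")


print(decode_id("ark+=13030=xt12t3"))  # ark:/13030/xt12t3
```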

get_id_from_dirpath(dirpath, pairtree_root='')


Internal - method for discovering the pairtree identifier for a given directory path.

E.g. pairtree_root/fo/ob/ar/+/ --> 'foobar:'

Parameters:
  • dirpath (Path to object's root) - Directory path to decode
Returns:
Decoded identifier
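The example above can be illustrated with a sketch that treats every directory below `pairtree_root` as a shorty, concatenates them, and decodes the result. This is a simplification: the hypothetical `id_from_dirpath` assumes `/` separators and ignores any non-shorty directories (such as an object's own directory) that a real store may contain. A small decoder is repeated so the block runs on its own.

```python
import posixpath
import re

HEX_ESCAPE = re.compile(r"\^([0-9a-fA-F]{2})")
UNDO_SECOND_STEP = str.maketrans({"=": "/", "+": ":", ",": "."})


def decode_id(encoded):
    """Undo the / : . substitutions, then the ^hh escapes (UTF-8 aware)."""
    octets = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)),
                            encoded.translate(UNDO_SECOND_STEP))
    return octets.encode("latin-1").decode("utf-8")


def id_from_dirpath(dirpath, pairtree_root=""):
    """Hypothetical: join the shorties below pairtree_root and decode them."""
    relative = posixpath.relpath(dirpath, pairtree_root or ".")
    # Assumption: every remaining path component is a shorty.
    shorties = [part for part in relative.split("/") if part not in ("", ".")]
    return decode_id("".join(shorties))


print(id_from_dirpath("pairtree_root/fo/ob/ar/+/", "pairtree_root"))  # foobar:
```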

get_path_from_dirpath(dirpath, pairtree_root='')


Internal - walks a directory chain and builds a list of the directory shorties relative to the pairtree_root.

Parameters:
  • dirpath (Directory path) - Directory path to walk
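One way to walk a directory chain, sketched under the assumption of `/`-separated paths: split off the last component repeatedly until `pairtree_root` is reached, then return the shorties in root-to-leaf order. `path_from_dirpath` is a hypothetical name; if `dirpath` is not below `pairtree_root`, this sketch simply walks to the top of the path.

```python
import posixpath


def path_from_dirpath(dirpath, pairtree_root=""):
    """Hypothetical: collect the shorties between pairtree_root and dirpath."""
    root = posixpath.normpath(pairtree_root) if pairtree_root else ""
    head = posixpath.normpath(dirpath)
    shorties = []
    # Peel off one trailing component per iteration until the root is reached.
    while head and head not in (root, "/", "."):
        head, tail = posixpath.split(head)
        shorties.insert(0, tail)
    return shorties


print(path_from_dirpath("pairtree_root/fo/ob/ar/+/", "pairtree_root"))
```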

id_to_dirpath(id, pairtree_root='', shorty_length=2)


Internal - method for turning an identifier into a pairtree directory tree of shorties.

  • "foobar://ark.1" --> "fo/ob/ar/+=/=a/rk/,1"
Parameters:
  • id (identifier) - Identifier for a pairtree object
Returns:
A directory path to the object's root directory
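A minimal sketch of the identifier-to-path mapping, assuming the two-step encoding described under id_encode (repeated here so the block is self-contained). `dirpath_for_id` is a hypothetical name; it cuts the encoded identifier into `shorty_length`-character pieces and joins them under `pairtree_root` with `os.path.join`, so the separator is `/` on POSIX systems.

```python
import os
import re

# Same cleaning rules as described under id_encode.
ENCODE_REGEX = re.compile(r'(?u)["\*\+,<=>\?\\\^\|]|[^!-~]')
SECOND_STEP = str.maketrans({"/": "=", ":": "+", ".": ","})


def encode_id(identifier):
    hexed = ENCODE_REGEX.sub(
        lambda m: "".join("^%02x" % b for b in m.group(0).encode("utf-8")), identifier
    )
    return hexed.translate(SECOND_STEP)


def dirpath_for_id(identifier, pairtree_root="", shorty_length=2):
    """Hypothetical: map an identifier to its object's root directory."""
    encoded = encode_id(identifier)
    shorties = [encoded[i:i + shorty_length] for i in range(0, len(encoded), shorty_length)]
    return os.path.join(pairtree_root, *shorties)


print(dirpath_for_id("foobar://ark.1"))  # fo/ob/ar/+=/=a/rk/,1 (on POSIX)
```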

id_to_dir_list(id, pairtree_root='', shorty_length=2)


Internal - method for turning an identifier into a list of pairtree directory shorties.

  • "foobar://ark.1" --> ["fo","ob","ar","+=","=a","rk",",1"]
Parameters:
  • id (identifier) - Identifier for a pairtree object
Returns:
A list of directory path fragments to the object's root directory
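The list form can be sketched the same way. The hypothetical `dir_list_for_id` below omits the `pairtree_root` parameter because this page does not describe how the real function uses it in the list. When the encoded identifier's length is not a multiple of `shorty_length`, the last fragment is shorter than the others.

```python
import re

ENCODE_REGEX = re.compile(r'(?u)["\*\+,<=>\?\\\^\|]|[^!-~]')
SECOND_STEP = str.maketrans({"/": "=", ":": "+", ".": ","})


def encode_id(identifier):
    hexed = ENCODE_REGEX.sub(
        lambda m: "".join("^%02x" % b for b in m.group(0).encode("utf-8")), identifier
    )
    return hexed.translate(SECOND_STEP)


def dir_list_for_id(identifier, shorty_length=2):
    """Hypothetical: encoded identifier cut into shorty_length-sized fragments."""
    encoded = encode_id(identifier)
    return [encoded[i:i + shorty_length] for i in range(0, len(encoded), shorty_length)]


print(dir_list_for_id("foobar://ark.1"))
# ['fo', 'ob', 'ar', '+=', '=a', 'rk', ',1']
```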