Source Code for Package pairtree

#!/usr/bin/python
# -*- coding: utf-8 -*-


"""
FS Pairtree storage
===================

Conventions used:

From http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html version 0.1

NOTICE
======

The Pairtree specification on which this implementation is based is (c) 2009 UC Regents.

Various regexes used in path-to-id conversion and the bulk of the unit tests were
contributed by Erik Hetzner, based on John Kunze's work, also (c) 2009 UC Regents
and released under the Apache license.

The ppath script
================

A ppath script is included for convenience, for use in shell scripts or similar. For example:

C{ppath topath} examples::

    $ vim mystore/pairtree_root/`ppath topath document:105/data/doc.txt`
    (Opens the file at mystore/pairtree_root/do/cu/me/nt/+1/05/data/doc.txt)
    $ cp `ppath topath foo:bar/1.txt` `ppath topath bar:foo/2.txt`

C{ppath toid} examples::

    data/subjects/pairtree_root/HA/SS/ET/ROOT$ ppath toid `pwd`
    HASSET/ROOT

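As a rough illustration of the id-to-path mapping behind C{ppath topath}, the
sketch below follows the character rules of the Pairtree specification. The names
sketch_id_encode and sketch_id_to_dirpath are hypothetical and not part of this
package; the real conversion lives in pairtree_path and may differ in detail. The
sketch takes only the identifier (e.g. document:105) and does not model how ppath
handles a trailing file path.

```python
# Visible ASCII characters that the spec hex-encodes as ^hh.
# chr(92) is the backslash, spelled this way to avoid an escape sequence.
_HEX_ENCODE = set('"*+,<=>?^|' + chr(92))

# Single-character substitutions applied to everything else.
_SINGLE_CHAR = {'/': '=', ':': '+', '.': ','}


def sketch_id_encode(identifier):
    # Work on UTF-8 octets; bytearray yields integers on Python 2 and 3.
    out = []
    for byte in bytearray(identifier.encode('utf-8')):
        ch = chr(byte)
        if byte < 0x21 or byte > 0x7e or ch in _HEX_ENCODE:
            out.append('^%02x' % byte)
        else:
            out.append(_SINGLE_CHAR.get(ch, ch))
    return ''.join(out)


def sketch_id_to_dirpath(identifier):
    # Split the encoded identifier into two-character directory names;
    # the last piece may be a single character.
    encoded = sketch_id_encode(identifier)
    return '/'.join(encoded[i:i + 2] for i in range(0, len(encoded), 2))
```

Under these rules, document:105 becomes do/cu/me/nt/+1/05, matching the
directory portion of the C{ppath topath} example above.
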
Quick Start:
============

>>> from pairtree import *

>>> # Get the store 'factory'
>>> f = PairtreeStorageFactory()

The factory object exists solely to create clients for individual pairtree
stores. For example:

>>> store_foo = f.get_store(store_dir="data", uri_base="http://")

This will create the following on disc in a directory called 'data' if it doesn't already exist::

    $ ls -R data/
    data/:
    pairtree_prefix  pairtree_root  pairtree_version0_1

    data/pairtree_root:

Where
    1. the file 'pairtree_prefix' contains just "http://"
    2. the file 'pairtree_version0_1' contains::

       This directory conforms to Pairtree Version 0.1.
       Updated spec: http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html

Note that if 'data' *had* already existed and was a pairtree store, the uri_base would
have been read from the prefix file, overriding the one supplied above.

Also, if you try to create a store over a directory that already exists but is not
a pairtree store that it can recognise, a NotAPairtreeStoreException is raised.

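The create-or-reuse behaviour described above can be sketched with the standard
library alone. This is an illustration, not the package's code: sketch_open_store
is a hypothetical name, ValueError stands in for NotAPairtreeStoreException, and
the test used to recognise an existing store (presence of pairtree_root) is an
assumption.

```python
import os
import tempfile

VERSION_FILE = 'pairtree_version0_1'


def sketch_open_store(store_dir, uri_base):
    # Returns the uri_base that ends up in effect for the store.
    root = os.path.join(store_dir, 'pairtree_root')
    prefix_file = os.path.join(store_dir, 'pairtree_prefix')
    if os.path.isdir(store_dir):
        # Assumption: an existing directory is a store if it has pairtree_root.
        if not os.path.isdir(root):
            raise ValueError('%s is not a pairtree store' % store_dir)
        # An existing prefix file wins over the uri_base argument.
        if os.path.isfile(prefix_file):
            with open(prefix_file) as handle:
                return handle.read()
        return uri_base
    os.makedirs(root)
    with open(prefix_file, 'w') as handle:
        handle.write(uri_base)
    with open(os.path.join(store_dir, VERSION_FILE), 'w') as handle:
        handle.write('This directory conforms to Pairtree Version 0.1.')
    return uri_base


scratch = tempfile.mkdtemp()
store_dir = os.path.join(scratch, 'data')
first = sketch_open_store(store_dir, 'http://')    # creates the layout
second = sketch_open_store(store_dir, 'ftp://')    # prefix file overrides this
```

Here both first and second hold "http://", because the second call finds the
prefix file written by the first.
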
Valid store names fit the regex ^[A-z][A-z0-9]* - but this is an arbitrary limitation
and can be removed if it is seen as unnecessary.

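A small check of what that pattern accepts may be useful. In ASCII the range A-z
also covers the six characters between Z and a (left bracket, backslash, right
bracket, caret, underscore and backtick), so names such as _private pass. The
helper sketch_is_valid_name is hypothetical, and requiring the match to span the
whole name is an assumption, since the pattern as quoted has no end anchor.

```python
import re

# The pattern exactly as quoted in the text above.
STORE_NAME = re.compile(r'^[A-z][A-z0-9]*')


def sketch_is_valid_name(name):
    match = STORE_NAME.match(name)
    # Assumption: the whole name must match, as if the pattern ended in '$'.
    return match is not None and match.end() == len(name)
```

With this helper, data and _private are accepted, while 9lives (leading digit)
and "my store" (contains a space) are rejected.
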
Creating and getting store objects:
===================================

There are two main commands for this. Continuing the example above:

>>> bar = store_foo.create_object('bar')
>>>

Note that issuing that command again raises an exception:

>>> bar = store_foo.create_object('bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 235, in create_object
pairtree.storage_exceptions.ObjectAlreadyExistsException

There is also a 'get_object' command, which is more accommodating: it takes a
flag, True by default, that makes it create the object if it doesn't exist:

I{get_object(self, id=None, create_if_doesnt_exist=True)}

>>> bar = store_foo.get_object('bar')

Setting this flag to False causes it to raise an exception if it cannot find the object.

>>> fake = store_foo.get_object('doesnotexist', create_if_doesnt_exist=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 231, in get_object
pairtree.storage_exceptions.ObjectNotFoundException

(Note that fake = store_foo.get_object('doesnotexist', False) is equivalent to the line above.)

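The contract between the two commands can be summarised with an in-memory
stand-in. SketchStore and its two exception classes are hypothetical names used
only for illustration; the real client works on disk, and its behaviour when id
is None is not modelled here (the sketch simply makes id a required argument).

```python
class SketchObjectAlreadyExists(Exception):
    pass


class SketchObjectNotFound(Exception):
    pass


class SketchStore(object):
    # In-memory stand-in that shows the create/get contract only.

    def __init__(self):
        self._objects = {}

    def create_object(self, id):
        # Creating the same id twice is an error.
        if id in self._objects:
            raise SketchObjectAlreadyExists(id)
        self._objects[id] = {}
        return self._objects[id]

    def get_object(self, id, create_if_doesnt_exist=True):
        # Return the existing object, or create it unless told not to.
        if id in self._objects:
            return self._objects[id]
        if create_if_doesnt_exist:
            return self.create_object(id)
        raise SketchObjectNotFound(id)
```

In other words, get_object with the default flag behaves as "get or create",
while create_object is strict about duplicates.
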
A pairtree object:
==================

The important methods:

    -  add_bytestream(filename, bytestream, path=None, buffer_size=None):
    -  get_bytestream(filename, streamable=False, path=None):
    -  del_file(filename, path=None):
    -  list_parts(path=None):

Examples speak louder than words:

>>> bar.add_bytestream('foo.txt', 'can be any sequence of bytes')
>>> bar.list_parts()
['foo.txt']
>>>

Adding buffered content from a file:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...   bar.add_bytestream('Firefox_wallpaper.png', stream)
...
>>>

Adding the same file to magic/path/inside/object (paths are automatically created on
demand):

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...   bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object')
...
>>>

Removing the first copy of that file, which was added to the wrong place:

>>> bar.del_file('Firefox_wallpaper.png')
>>> bar.list_parts()
['magic', 'foo.txt']
>>> bar.list_parts('magic/path')
['inside']
>>> bar.list_parts('magic/path/inside/object')
['Firefox_wallpaper.png']
>>>

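One plausible way to get the behaviour shown above (bytes or a file-like
stream as input, chunked copying, directories created on demand) is sketched
below using only the standard library. The function names, the 8192-byte
default chunk size and the sorted listing are assumptions made for the sketch,
not a description of the package's internals.

```python
import io
import os
import tempfile


def sketch_add_bytestream(obj_dir, filename, bytestream, path=None,
                          buffer_size=8192):
    target_dir = os.path.join(obj_dir, path) if path else obj_dir
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)      # nested paths are created on demand
    with open(os.path.join(target_dir, filename), 'wb') as out:
        if hasattr(bytestream, 'read'):
            # File-like input: copy in chunks so the whole file is
            # never held in memory at once.
            chunk = bytestream.read(buffer_size)
            while chunk:
                out.write(chunk)
                chunk = bytestream.read(buffer_size)
        else:
            out.write(bytestream)    # plain sequence of bytes


def sketch_list_parts(obj_dir, path=None):
    # Sorted for a predictable result; the real ordering is not specified.
    return sorted(os.listdir(os.path.join(obj_dir, path) if path else obj_dir))


obj_dir = tempfile.mkdtemp()
sketch_add_bytestream(obj_dir, 'foo.txt', b'can be any sequence of bytes')
sketch_add_bytestream(obj_dir, 'image.bin', io.BytesIO(b'x' * 20000),
                      path='magic/path/inside/object')
```

After these two calls the object directory holds foo.txt at the top level and
image.bin under magic/path/inside/object.
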
There are also some convenience methods:

    -  add_bytestream_by_path(self, filepath, bytestream, buffer_size=None):
    -  del_file_by_path(self, filepath, bytestream):
    -  get_bytestream_by_path(self, filepath, streamable=False):

The I{by_path} suffix means that you can give the method the whole path in one piece, and
it will try to figure out what is intended. For example, consider the png we placed in a
nested directory earlier:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...   bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object')
...

This can be written as:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...   bar.add_bytestream_by_path('magic/path/inside/object/Firefox_wallpaper.png', stream)
...

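The simplest reading of "figure out what is intended" is a split on the last
slash into a directory part and a filename. The helper below is a hypothetical
sketch of that idea; the package may apply further rules.

```python
import posixpath


def sketch_split_filepath(filepath):
    # Everything before the last '/' is the path inside the object,
    # the remainder is the filename. No directory part maps to None.
    path, filename = posixpath.split(filepath)
    return (path or None), filename
```

For the png above this gives the pair ('magic/path/inside/object',
'Firefox_wallpaper.png'), which are the path and filename arguments used in
the longer add_bytestream call.
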
Getting files from an object
============================

The flag I{streamable} is key here. If it is set to True, you are passed a file
handle, which you must remember to close, or which you can use in a C{with} block:

>>> with bar.get_bytestream('foo.txt', streamable=True) as text:
...   print text.read()
...
>>>

This is very useful for large files that you wish to scan through but do not need to hold
in memory all at once.

With streamable set to False (the default), the entire file is read into memory and returned:

>>> print bar.get_bytestream('foo.txt')
can be any sequence of bytes
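
The difference between the two modes can be sketched as follows. The function
sketch_get_bytestream is a hypothetical stand-in that reads from a plain
directory; it only illustrates the handle-versus-bytes distinction described
above.

```python
import os
import tempfile


def sketch_get_bytestream(obj_dir, filename, streamable=False, path=None):
    target_dir = os.path.join(obj_dir, path) if path else obj_dir
    handle = open(os.path.join(target_dir, filename), 'rb')
    if streamable:
        return handle            # the caller closes it, e.g. via a with block
    try:
        return handle.read()     # the whole file is read into memory
    finally:
        handle.close()


demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'foo.txt'), 'wb') as out:
    out.write(b'can be any sequence of bytes')

whole = sketch_get_bytestream(demo_dir, 'foo.txt')
with sketch_get_bytestream(demo_dir, 'foo.txt', streamable=True) as stream:
    first_chunk = stream.read(6)   # read only as much as is needed
```

The streamable form lets the caller read in pieces and closes the handle when
the with block ends; the default form returns the complete contents.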
"""

__version__ = '0.5.2'

from pairtree_client import *
from pairtree_store import *
from pairtree_object import *
import pairtree_path as ppath
from pairtree_path import id_encode, id_decode
from storage_exceptions import *

def id2path(id):
    """
    Pass in a pairtree id and get back a path.
    """
    path = ppath.id_to_dirpath(id)
    return path

def path2id(path):
    """
    Pass in a pairtree path and get back an id.
    """
    return ppath.get_id_from_dirpath(path)