"""
FS Pairtree storage
===================

Conventions used:

From http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html version 0.1

NOTICE
======

The Pairtree specification on which this implementation is based is (c) 2009 UC Regents.

Various regexes used in path-to-id conversion and the bulk of the unit tests were
contributed by Erik Hetzner, based on John Kunze's work, also (c) 2009 UC Regents
and released under the Apache license.

The ppath script
================

A ppath script is included for convenience, for use in shell scripts or similar. For example:

C{ppath topath} examples::

    $ vim mystore/pairtree_root/`ppath topath document:105/data/doc.txt`
    (Opens the file at mystore/pairtree_root/do/cu/me/nt/+1/05/data/doc.txt)
    $ cp `ppath topath foo:bar/1.txt` `ppath topath bar:foo/2.txt`

C{ppath toid} examples::

    data/subjects/pairtree_root/HA/SS/ET/ROOT$ ppath toid `pwd`
    HASSET/ROOT

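The mapping behind C{ppath topath} and C{ppath toid} can be sketched in a few
lines. This is a simplified illustration of the Pairtree 0.1 character rules, not
the code in pairtree_path: the names C{sketch_id_to_path} and C{sketch_path_to_id}
are hypothetical, and the sketch handles only the identifier itself, not a trailing
file path such as data/doc.txt.

```python
# Simplified sketch of the Pairtree 0.1 identifier <-> path mapping.
# All names are hypothetical; this is not the pairtree_path implementation.

# Visible ASCII characters that the spec hex-encodes as ^xx.
# chr(0x22) is the double quote and chr(0x5c) is the backslash.
HEX_ENCODED = set(chr(0x22) + chr(0x5c) + '*+,<=>?^|')
# Single-character swaps applied to the remaining characters.
SWAPS = {'/': '=', ':': '+', '.': ','}
UNSWAPS = dict((value, key) for key, value in SWAPS.items())


def sketch_id_to_path(identifier):
    cleaned = []
    for byte in bytearray(identifier.encode('utf-8')):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7e or char in HEX_ENCODED:
            cleaned.append('^%02x' % byte)
        else:
            cleaned.append(SWAPS.get(char, char))
    cleaned = ''.join(cleaned)
    # Split into two-character directory names (the last may be shorter).
    return '/'.join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def sketch_path_to_id(path):
    joined = path.replace('/', '')
    out = bytearray()
    i = 0
    while i < len(joined):
        if joined[i] == '^':
            # A caret introduces two hex digits for one original byte.
            out.append(int(joined[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(UNSWAPS.get(joined[i], joined[i])))
            i += 1
    return out.decode('utf-8')
```

For the identifier document:105 this sketch yields do/cu/me/nt/+1/05, which agrees
with the C{ppath topath} example above.
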
Quick Start:
============

>>> from pairtree import *

>>> # Get the store 'factory'
>>> f = PairtreeStorageFactory()

The factory object exists solely to create clients for individual pairtree
stores. For example:

>>> store_foo = f.get_store(store_dir="data", uri_base="http://")

If the directory 'data' does not already exist, this will create it on disc with
the following contents::

    $ ls -R data/
    data/:
    pairtree_prefix  pairtree_root  pairtree_version0_1

    data/pairtree_root:

Where:

    1. the file 'pairtree_prefix' contains just "http://"
    2. the file 'pairtree_version0_1' contains::

        This directory conforms to Pairtree Version 0.1.
        Updated spec: http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html

Note that if 'data' *had* already existed and was a pairtree store, the uri_base would
have been read from the prefix file, overriding the one supplied above.

Also, if you try to create a store over a directory that already exists but is not
a pairtree store that it can recognise, it will raise a NotAPairtreeStoreException.

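The on-disc behaviour described above can be sketched with the standard library
alone. The helper C{sketch_init_store} and the exception C{NotAStoreError} are
hypothetical stand-ins, and recognising a store by its version file is an
assumption; the factory's real checks may differ.

```python
import os

VERSION_FILE = 'pairtree_version0_1'
VERSION_LINES = [
    'This directory conforms to Pairtree Version 0.1.',
    'Updated spec: http://www.cdlib.org/inside/diglib/pairtree/pairtreespec.html',
]


class NotAStoreError(Exception):
    pass  # hypothetical stand-in for NotAPairtreeStoreException


def sketch_init_store(store_dir, uri_base):
    # Returns the uri_base that is in effect for the store.
    prefix_file = os.path.join(store_dir, 'pairtree_prefix')
    if os.path.isdir(store_dir):
        # Assumption: an existing store is recognised by its version file.
        if not os.path.isfile(os.path.join(store_dir, VERSION_FILE)):
            raise NotAStoreError(store_dir)
        # An existing prefix file overrides the uri_base argument.
        if os.path.isfile(prefix_file):
            with open(prefix_file) as handle:
                uri_base = handle.read().strip()
        return uri_base
    # Fresh store: create the root directory and the two marker files.
    os.makedirs(os.path.join(store_dir, 'pairtree_root'))
    with open(prefix_file, 'w') as handle:
        handle.write(uri_base)
    with open(os.path.join(store_dir, VERSION_FILE), 'w') as handle:
        for line in VERSION_LINES:
            print(line, file=handle)
    return uri_base
```

Calling the sketch twice on the same directory with different uri_base values
returns the first value both times, mirroring the override rule in the note above.
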
Valid store names must match the regex ^[A-z][A-z0-9]*, but this is an arbitrary
limitation and can be removed if it is seen as unnecessary.

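One detail of the pattern above is worth checking before relying on it: in
Python's C{re} syntax the range A-z spans every ASCII code point from A to z,
which includes the six punctuation characters that sit between Z and a. The
sketch below demonstrates only that standard regex behaviour. How the library
applies the pattern is not shown here, and C{STRICT} is a hypothetical
alternative, not part of pairtree.

```python
import re

# The pattern exactly as documented above.
DOCUMENTED = re.compile('^[A-z][A-z0-9]*')
# Hypothetical stricter variant: ASCII letters and digits only,
# anchored at both ends.
STRICT = re.compile('^[A-Za-z][A-Za-z0-9]*$')


def matches_whole(pattern, name):
    # True only when the pattern consumes the entire name.
    found = pattern.match(name)
    return found is not None and found.group(0) == name
```

With these definitions, a name such as _store is accepted in full by the
documented pattern and rejected by the stricter one.
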
Creating and getting store objects:
===================================

There are two main commands for this. Continuing the example above:

>>> bar = store_foo.create_object('bar')
>>>

Note that issuing that command again will raise an exception:

>>> bar = store_foo.create_object('bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 235, in create_object
pairtree.storage_exceptions.ObjectAlreadyExistsException

There is also a 'get_object' command, which is more accommodating: it accepts
a flag which, by default, creates the object if it doesn't exist:

I{get_object(self, id=None, create_if_doesnt_exist=True)}

>>> bar = store_foo.get_object('bar')

Setting this flag to False causes it to raise an exception if it cannot find the object.

>>> fake = store_foo.get_object('doesnotexist', create_if_doesnt_exist=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-i686/egg/pairtree/pairtree_client.py", line 231, in get_object
pairtree.storage_exceptions.ObjectNotFoundException

(Note that fake = store_foo.get_object('doesnotexist', False) is equivalent to the above line.)

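The difference between the two commands can be summarised with a small in-memory
model. C{SketchStore} is a hypothetical stand-in that mirrors only the create/get
semantics described above; it uses built-in exceptions in place of pairtree's own
and does not touch the disc.

```python
class SketchStore(object):
    # Hypothetical in-memory model of the create/get semantics only.

    def __init__(self):
        self._objects = {}

    def create_object(self, id):
        # Creating an id that already exists is an error.
        if id in self._objects:
            raise ValueError('object already exists: %s' % id)
        self._objects[id] = {}
        return self._objects[id]

    def get_object(self, id=None, create_if_doesnt_exist=True):
        if id in self._objects:
            return self._objects[id]
        if create_if_doesnt_exist:
            return self.create_object(id)
        raise LookupError('object not found: %s' % id)
```

In this model, get_object is safe to call repeatedly with the same id, whereas
create_object succeeds only once per id.
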
A pairtree object:
==================

The important methods:

    - add_bytestream(filename, bytestream, path=None, buffer_size=None):
    - get_bytestream(filename, streamable=False, path=None):
    - del_file(filename, path=None):
    - list_parts(path=None):

Examples speak louder than words:

>>> bar.add_bytestream('foo.txt', 'can be any sequence of bytes')
>>> bar.list_parts()
['foo.txt']
>>>

Adding buffered content from a file:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...     bar.add_bytestream('Firefox_wallpaper.png', stream)
...
>>>

Adding the same file to magic/path/inside/object (paths are automatically created on
demand):

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...     bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object')
...
>>>

Removing the first copy of that file, which was added to the wrong place:

>>> bar.del_file('Firefox_wallpaper.png')
>>> bar.list_parts()
['magic', 'foo.txt']
>>> bar.list_parts('magic/path')
['inside']
>>> bar.list_parts('magic/path/inside/object')
['Firefox_wallpaper.png']
>>>

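Creating paths on demand amounts to making any missing intermediate directories
before writing the file. A minimal sketch, assuming an object is simply a
directory on disc; C{sketch_add_file} and C{sketch_list_parts} are hypothetical
helpers, not the methods listed above.

```python
import os


def sketch_add_file(object_dir, filename, data, path=None):
    # Write bytes under object_dir, creating missing directories first.
    target_dir = os.path.join(object_dir, path) if path else object_dir
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    with open(os.path.join(target_dir, filename), 'wb') as handle:
        handle.write(data)


def sketch_list_parts(object_dir, path=None):
    # Sorted for a stable result; the listing shown above is not sorted.
    target_dir = os.path.join(object_dir, path) if path else object_dir
    return sorted(os.listdir(target_dir))
```

Listing an intermediate level such as magic/path then shows only the next
directory name, as in the list_parts calls above.
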
There are also some convenience methods:

    - add_bytestream_by_path(self, filepath, bytestream, buffer_size=None):
    - del_file_by_path(self, filepath, bytestream):
    - get_bytestream_by_path(self, filepath, streamable=False):

The I{by_path} suffix means that you can pass the whole path as a single string, and
the method will try to work out what is intended. For example, consider the png we
placed in a nested directory earlier:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...     bar.add_bytestream('Firefox_wallpaper.png', stream, path='magic/path/inside/object')
...

This can be written as:

>>> with open('/home/ben/Firefox_wallpaper.png','rb') as stream:
...     bar.add_bytestream_by_path('magic/path/inside/object/Firefox_wallpaper.png', stream)
...

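One plausible way to separate such a combined path into its directory and filename
parts is shown below. C{sketch_split_filepath} is a hypothetical helper built on
the standard C{posixpath} module; the library's own parsing may differ.

```python
import posixpath


def sketch_split_filepath(filepath):
    # Split at the last slash; an empty directory part becomes None,
    # matching the path=None default of the methods above.
    path, filename = posixpath.split(filepath)
    return (path or None), filename
```

Applied to magic/path/inside/object/Firefox_wallpaper.png, this gives the same
path and filename that were passed separately to add_bytestream earlier.
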
Getting files from an object
============================

The flag I{streamable} is key here. If it is set to True, you will be passed
a file handle, which you must remember to close, or which you can use in a
C{with} block:

>>> with bar.get_bytestream('foo.txt', streamable=True) as text:
...     print(text.read())
...
>>>

This is very useful for large files that you wish to scan through but do not need
to hold in memory all at once.

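Scanning a large file without holding it all in memory usually means reading it in
fixed-size chunks. The sketch below assumes the handle behaves like an ordinary
file object with a read(size) method; C{io.BytesIO} stands in for the handle
returned when streamable is True, and C{sketch_count_bytes} is a hypothetical helper.

```python
import io


def sketch_count_bytes(stream, chunk_size=8192):
    # Read fixed-size chunks so only one chunk is in memory at a time.
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


# io.BytesIO stands in for a streamable handle in this sketch.
with io.BytesIO(b'can be any sequence of bytes') as handle:
    size = sketch_count_bytes(handle, chunk_size=8)
```

The same loop shape works for hashing or searching: replace the running total with
whatever per-chunk work is needed.
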
With streamable set to False (the default), the entire file is read into memory and returned:

>>> print(bar.get_bytestream('foo.txt'))
can be any sequence of bytes
"""

__version__ = '0.5.2'

from pairtree_client import *
from pairtree_store import *
from pairtree_object import *
import pairtree_path as ppath
from pairtree_path import id_encode, id_decode
from storage_exceptions import *

    """
    pass in a pairtree id and get back a path
    """
    path = ppath.id_to_dirpath(id)
    return path

    """
    pass in a pairtree path and get back an id
    """
    return ppath.get_id_from_dirpath(path)
