OFS is a bucket/object storage library.
It provides a common API for storing bitstreams (plus related metadata) in ‘bucket/object’ stores such as:
- S3-like: S3, Google Storage, Eucalyptus, Archive.org
- Filesystem (via pairtree and other methods)
- ‘REST’ Store (see remote/reststore.py - implementation at http://bitbucket.org/pudo/repod/)
- Add your own backend: just implement the methods in base.py
Why use the library:
- Abstraction: write common code but use different storage backends
- More than a filesystem, less than a database: support for metadata as well as bitstreams
Interface that must be implemented by all OFS backends.
Abstract specification of the OFS interface. Backends must implement at least this interface.
Metadata
Metadata keys must be ASCII and alphanumeric, plus ‘_’ and ‘-’.
Standard metadata: This metadata will always be available from get_metadata. Attempts to delete these keys will fail.
- _creation_date
- _last_modified
- _content_length
- _checksum -> “{type}:{number}”, e.g. “md5:767f7a...”
- _owner
- _format (content-type)
- _bucket
- _label
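The key rules and the checksum format above are simple enough to check mechanically. The following is a minimal sketch, assuming Python 3; the helper names are hypothetical, reading {number} as a hex digest is an assumption based on the “md5:767f7a...” example, and raising ValueError is an assumed way of making deletion of standard keys “fail”.

```python
import hashlib
import re
from typing import Iterable

# Hypothetical validator: ASCII letters and digits plus '_' and '-'.
_KEY_RE = re.compile(r"[A-Za-z0-9_-]+")

# The standard keys listed above; deleting any of them must fail.
STANDARD_KEYS = frozenset({
    "_creation_date", "_last_modified", "_content_length", "_checksum",
    "_owner", "_format", "_bucket", "_label",
})


def is_valid_key(key: str) -> bool:
    """Return True if the whole key matches the allowed character set."""
    return _KEY_RE.fullmatch(key) is not None


def checksum_value(data: bytes, algorithm: str = "md5") -> str:
    """Format a digest as "{type}:{number}" (assumed: hex digest)."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def check_deletable(keys: Iterable[str]) -> None:
    """Raise ValueError (assumed failure mode) for invalid or standard keys."""
    for key in keys:
        if not is_valid_key(key):
            raise ValueError(f"invalid metadata key: {key!r}")
        if key in STANDARD_KEYS:
            raise ValueError(f"cannot delete standard key: {key!r}")
```

Note that `fullmatch` is used rather than `match` with `$`, because `$` would also accept a key with a trailing newline.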
Claim a bucket.
Returns: True if successful, False otherwise.
Delete the metadata corresponding to the specified keys.
Delete a bitstream.
Whether a given bucket:label object already exists.
Returns: bool.
Get the metadata for this bucket:label identifier.
Get a bitstream for the given bucket:label combination.
Parameters: bucket – the bucket to use.
Returns: bitstream as a file-like object.
Get a URL that should point at the bucket:labelled resource. This is intended to help web apps redirect to an open resource rather than proxy the bitstream.
Returns: a string URL. Note that ‘file:///...’ refers to a resource on a locally mounted filesystem.
List all buckets managed by this OFS instance.
Returns: iterator for the buckets.
List labels for the given bucket.
Parameters: bucket – bucket to list labels for.
Returns: iterator for the labels in the specified bucket.
Put a bitstream (stream_object) for the specified bucket:label identifier.
Update the metadata with the provided dictionary of params.
Parameters: params – dictionary of key values (JSON serializable).
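The interface described above can be rendered as a Python abstract base class. This is a sketch, assuming Python 3: get_metadata, list_labels, stream_object and params appear in the text, but the class name, the other method names and all signatures are assumptions inferred from the descriptions, not copied from base.py.

```python
from abc import ABC, abstractmethod
from typing import IO, Any, Dict, Iterable, Iterator, Optional


class OFSInterfaceSketch(ABC):
    """Hypothetical rendering of the OFS interface; names/signatures assumed."""

    @abstractmethod
    def claim_bucket(self, bucket: str) -> bool:
        """Claim a bucket; True if successful, False otherwise."""

    @abstractmethod
    def del_metadata_keys(self, bucket: str, label: str, keys: Iterable[str]) -> None:
        """Delete the metadata corresponding to the specified keys."""

    @abstractmethod
    def del_stream(self, bucket: str, label: str) -> None:
        """Delete a bitstream."""

    @abstractmethod
    def exists(self, bucket: str, label: str) -> bool:
        """Whether a given bucket:label object already exists."""

    @abstractmethod
    def get_metadata(self, bucket: str, label: str) -> Dict[str, Any]:
        """Get the metadata for this bucket:label identifier."""

    @abstractmethod
    def get_stream(self, bucket: str, label: str) -> IO[bytes]:
        """Get a bitstream as a file-like object."""

    @abstractmethod
    def get_url(self, bucket: str, label: str) -> str:
        """Get a URL that should point at the bucket:label resource."""

    @abstractmethod
    def list_buckets(self) -> Iterator[str]:
        """Iterate over all buckets managed by this instance."""

    @abstractmethod
    def list_labels(self, bucket: str) -> Iterator[str]:
        """Iterate over the labels in the given bucket."""

    @abstractmethod
    def put_stream(self, bucket: str, label: str, stream_object: IO[bytes],
                   params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Store a bitstream; returning its metadata is an assumption."""

    @abstractmethod
    def update_metadata(self, bucket: str, label: str, params: Dict[str, Any]) -> None:
        """Update the metadata with a dict of JSON-serializable values."""
```

Because every method is abstract, Python refuses to instantiate a backend that forgets to implement any of them, which is one way to enforce “must implement at least this interface”.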
Warning
Not yet implemented.
The simplest possible store you could imagine.
WARNING: not yet implemented (help wanted!).
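Since this store is not yet implemented, here is one way to picture the simplest possible store: two dictionaries keyed by (bucket, label). This is a minimal sketch, not the library's code; the class name and method names are hypothetical (mirroring the interface descriptions), nothing is persisted, and the defaults of None for _owner and ‘application/octet-stream’ for _format are assumptions.

```python
import hashlib
import io
from datetime import datetime, timezone
from typing import IO, Any, Dict, Iterator, Optional, Set, Tuple

Key = Tuple[str, str]  # (bucket, label)


class MemoryStore:
    """Hypothetical dict-backed store; all data is lost when the process exits."""

    def __init__(self) -> None:
        self._buckets: Set[str] = set()
        self._data: Dict[Key, bytes] = {}
        self._meta: Dict[Key, Dict[str, Any]] = {}

    def claim_bucket(self, bucket: str) -> bool:
        # False if the bucket has already been claimed.
        if bucket in self._buckets:
            return False
        self._buckets.add(bucket)
        return True

    def exists(self, bucket: str, label: str) -> bool:
        return (bucket, label) in self._data

    def put_stream(self, bucket: str, label: str, stream_object: IO[bytes],
                   params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        data = stream_object.read()
        now = datetime.now(timezone.utc).isoformat()
        self._buckets.add(bucket)  # writing implicitly creates the bucket
        meta = self._meta.get((bucket, label), {})
        meta.setdefault("_creation_date", now)
        meta.update(params or {})
        meta.setdefault("_owner", None)
        meta.setdefault("_format", "application/octet-stream")
        # Keys derived from the data always override caller-supplied values.
        meta.update({
            "_last_modified": now,
            "_content_length": len(data),
            "_checksum": "md5:" + hashlib.md5(data).hexdigest(),
            "_bucket": bucket,
            "_label": label,
        })
        self._data[(bucket, label)] = data
        self._meta[(bucket, label)] = meta
        return dict(meta)

    def get_stream(self, bucket: str, label: str) -> IO[bytes]:
        return io.BytesIO(self._data[(bucket, label)])

    def get_metadata(self, bucket: str, label: str) -> Dict[str, Any]:
        return dict(self._meta[(bucket, label)])  # copy, so callers cannot mutate it

    def update_metadata(self, bucket: str, label: str, params: Dict[str, Any]) -> None:
        self._meta[(bucket, label)].update(params)

    def del_stream(self, bucket: str, label: str) -> None:
        del self._data[(bucket, label)]
        del self._meta[(bucket, label)]

    def list_buckets(self) -> Iterator[str]:
        return iter(sorted(self._buckets))

    def list_labels(self, bucket: str) -> Iterator[str]:
        return (label for (b, label) in self._data if b == bucket)
```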
Implementation of a local OFS-style store, designed to hold a small number of files for each of a very large number of objects. It was created to store records for more than 3 million objects without hitting hard filesystem limits.
Uses pairtree storage, but a pairtree id forms only part of a bucket id.
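Pairtree avoids huge directories by splitting an identifier into two-character path segments, so millions of objects fan out across nested directories instead of sitting in one. A minimal sketch, assuming the identifier has already been through the Pairtree specification's character-escaping step (omitted here); how this backend splits a bucket id into its pairtree part is not shown, and the function name is hypothetical.

```python
def id_to_pairpath(identifier: str) -> str:
    """Split an already-escaped identifier into two-character path segments.

    An odd-length identifier ends with a one-character segment.
    """
    if not identifier:
        raise ValueError("identifier must be non-empty")
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join(pairs)
```

For example, `id_to_pairpath("abcd1234")` gives `"ab/cd/12/34"`.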
Metadata
Metadata keys must be ASCII and alphanumeric, plus ‘_’ and ‘-’.
Standard metadata: This metadata will always be available from get_metadata. Attempts to delete these keys will fail.
- _creation_date
- _last_modified
- _content_length
- _checksum -> “{type}:{number}”, e.g. “md5:767f7a...”
- _owner
- _format (content-type)
- _bucket
- _label
Implementation of an OFS interface to a zip file archive.
Metadata: this is stored in the metadata/ ‘folder’, under the same filename as the bucket it describes.
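The zip layout described above places bitstreams at “bucket/label” and a bucket's metadata at “metadata/bucket”. A minimal sketch of writing and reading that layout with the standard zipfile module; the function names are hypothetical, and storing the bucket's metadata as one JSON object mapping label to metadata dict is an assumption, since the text does not specify the format.

```python
import json
import zipfile
from typing import Any, Dict, Tuple

Objects = Dict[str, Tuple[bytes, Dict[str, Any]]]  # label -> (bitstream, metadata)


def write_bucket(zf: zipfile.ZipFile, bucket: str, objects: Objects) -> None:
    """Write each bitstream to '<bucket>/<label>' and one entry to 'metadata/<bucket>'."""
    for label, (data, _meta) in objects.items():
        zf.writestr(f"{bucket}/{label}", data)
    # Assumption: one JSON object per bucket, mapping label -> metadata dict.
    bucket_meta = {label: meta for label, (_data, meta) in objects.items()}
    zf.writestr(f"metadata/{bucket}", json.dumps(bucket_meta))


def read_metadata(zf: zipfile.ZipFile, bucket: str, label: str) -> Dict[str, Any]:
    """Look up one label's metadata in the bucket's metadata entry."""
    return json.loads(zf.read(f"metadata/{bucket}"))[label]
```

Writing a bucket's metadata in a single entry matters because zip archives cannot easily overwrite an entry in place: writing `metadata/<bucket>` a second time adds a duplicate name (zipfile allows it but warns). Under this layout a bucket literally named “metadata” would also collide with the metadata folder.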
Claim a bucket. This is a no-op, as the bucket is a virtual folder in the zip file and does not exist without the files it ‘contains’.
If called without a ‘bucket’, it returns a UUID.
Delete the metadata corresponding to the specified keys.
Delete a bitstream. This needs more testing, because deleting a file from a zip archive is problematic. The alternative is to create a second zip file without the files in question, which is impractical for large zip archives.
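The alternative mentioned above, rebuilding the archive without the unwanted entries, can be sketched with the standard zipfile module. The function name is hypothetical; `src` and `dst` may be paths or binary file objects, since `zipfile.ZipFile` accepts either.

```python
import zipfile
from typing import BinaryIO, Iterable, Union

PathOrFile = Union[str, BinaryIO]


def copy_without(src: PathOrFile, dst: PathOrFile, names_to_drop: Iterable[str]) -> int:
    """Copy every entry of src into a new archive dst, skipping names_to_drop.

    Returns the number of entries dropped. Every kept entry is read and
    rewritten, so the cost grows with the size of the whole archive.
    """
    drop = set(names_to_drop)
    dropped = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename in drop:
                dropped += 1
                continue
            # Passing the ZipInfo keeps the entry's name, timestamp and compression type.
            zout.writestr(info, zin.read(info))
    return dropped
```

The cost of copying every surviving entry is why the text calls this approach unsuitable for large archives.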
Whether a given bucket:label object already exists.
Get the metadata for this bucket:label identifier.
Get a bitstream for the given bucket:label combination.
Parameters: bucket – the bucket to use.
Returns: bitstream as a file-like object.
Get a URL that should point at the bucket:labelled resource. This is intended to help web apps redirect to an open resource rather than proxy the bitstream.
Returns: a string URI, e.g. ‘zip:file:///home/.../foo.zip!/bucket/label’.
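A URI of the shape shown above can be assembled from the archive path, bucket and label. A minimal sketch; the function name is hypothetical, and percent-encoding the bucket and label is an assumption about how unusual characters should be handled.

```python
import os
from pathlib import Path
from urllib.parse import quote


def zip_url(archive_path: str, bucket: str, label: str) -> str:
    """Build a 'zip:file:///...!/bucket/label' style URI for an archive entry."""
    # as_uri() requires an absolute path and percent-encodes it.
    file_uri = Path(os.path.abspath(archive_path)).as_uri()
    # Assumption: bucket and label are percent-encoded, with '/' escaped too.
    return f"zip:{file_uri}!/{quote(bucket, safe='')}/{quote(label, safe='')}"
```

For example, on a POSIX system `zip_url("/data/foo.zip", "b1", "a.txt")` yields `"zip:file:///data/foo.zip!/b1/a.txt"`.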
List all buckets managed by this OFS instance. Like list_labels, this walks the entire archive, yielding the bucket names. A local set of names already seen is kept so that duplicates are not returned. So although this is a generator, it temporarily holds the entire list of bucket names in memory, and it slows down as more buckets are added to the set.
Returns: iterator for the buckets.
List labels for the given bucket. Because a zip archive stores its entries in arbitrary order, this is an expensive operation: it walks the entire archive searching for entries in the requested ‘bucket’.
Parameters: bucket – bucket to list labels for.
Returns: iterator for the labels in the specified bucket.
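Both listings described above amount to a full pass over the archive's entry names. A minimal sketch with the standard zipfile module; the function names mirror the text, but the signatures are hypothetical, and skipping the “metadata” folder assumes the layout described earlier for this backend.

```python
import zipfile
from typing import Iterator, Set

METADATA_FOLDER = "metadata"  # assumed reserved folder for bucket metadata


def list_buckets(zf: zipfile.ZipFile) -> Iterator[str]:
    """Walk every entry once, yielding each bucket name the first time it appears."""
    seen: Set[str] = set()  # grows with the number of distinct buckets
    for name in zf.namelist():
        bucket, sep, _rest = name.partition("/")
        if not sep or bucket == METADATA_FOLDER:
            continue  # top-level files and the metadata folder are not buckets
        if bucket not in seen:
            seen.add(bucket)
            yield bucket


def list_labels(zf: zipfile.ZipFile, bucket: str) -> Iterator[str]:
    """Walk every entry, yielding the labels stored under '<bucket>/'."""
    prefix = bucket + "/"
    for name in zf.namelist():
        if name.startswith(prefix) and len(name) > len(prefix):
            yield name[len(prefix):]
```

The `seen` set is what keeps `list_buckets` from returning duplicates, and it is also why memory use grows with the number of buckets even though the function is a generator.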
Put a bitstream (stream_object) for the specified bucket:label identifier.
Update the metadata with the provided dictionary of params.
Parameters: params – dictionary of key values (JSON serializable).
Google storage OFS backend.
An archive.org backend utilizing the archive.org s3 interface (see: http://www.archive.org/help/abouts3.txt).
Uses the ckanext-storage API to bounce to an S3 store.