Welcome to OFS File Storage (OFS) Documentation

OFS is a bucket/object storage library.

It provides a common API for storing bitstreams (plus related metadata) in ‘bucket/object’ stores such as:

  • S3-like: S3, Google Storage, Eucalyptus, Archive.org
  • Filesystem (via pairtree and other methods)
  • ‘REST’ Store (see remote/reststore.py - implementation at http://bitbucket.org/pudo/repod/)
  • Your own backend: just implement the methods in base.py

Why use the library:

  • Abstraction: write common code but use different storage backends
  • More than a filesystem, less than a database: support for metadata as well as bitstreams

OFS Interface

Interface that must be implemented by all OFS backends.

class ofs.base.OFSInterface

Abstract specification of OFS interface. Implementing backends must implement at least this interface.

Metadata

Metadata keys must be ASCII and alphanumeric plus ‘_’ and ‘-’.
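The key rule above can be checked with a small regular expression. This is a minimal sketch, not code from the library: the helper name is_valid_metadata_key is hypothetical, and rejecting the empty string is an assumption the rule itself does not state.

```python
import re

# ASCII letters, digits, underscore and hyphen only.
# Assumption: an empty key is rejected (the rule above does not say).
_KEY_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_metadata_key(key):
    """Return True if `key` uses only the characters allowed for metadata keys."""
    return isinstance(key, str) and _KEY_RE.fullmatch(key) is not None
```

For example, is_valid_metadata_key("_creation_date") holds, while a key containing a space or a non-ASCII letter is rejected.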

Standard metadata: This metadata will always be available from get_metadata. Attempts to delete these keys will fail.

  • _creation_date
  • _last_modified
  • _content_length
  • _checksum –> “{type}:{number}”, e.g. “md5:767f7a...”
  • _owner
  • _format (content-type)
  • _bucket
  • _label

claim_bucket(bucket)

Claim a bucket.

Returns: True if successful, False otherwise.

del_metadata_keys(bucket, label, keys)

Delete the metadata corresponding to the specified keys.

del_stream(bucket, label)

Delete a bitstream.

exists(bucket, label)

Whether a given bucket:label object already exists.

Returns: bool.

get_metadata(bucket, label)

Get the metadata for this bucket:label identifier.

get_stream(bucket, label, as_stream=True)

Get a bitstream for the given bucket:label combination.

Parameters: bucket – the bucket to use.
Returns: bitstream as a file-like object.

get_url(bucket, label)

Get a URL that should point at the bucket:label resource. Intended to help web apps redirect to an open resource rather than proxy the bitstream.

Parameters:
  • bucket – the bucket to use.
  • label – the label of the resource to get
Returns: a string URL. Note that a ‘file:///...’ URL refers to a resource on a locally mounted filesystem.

list_buckets()

List all buckets managed by this OFS instance.

Returns: iterator for the buckets.

list_labels(bucket)

List labels for the given bucket.

Parameters: bucket – bucket to list labels for.
Returns: iterator for the labels in the specified bucket.

put_stream(bucket, label, stream_object, params={})

Put a bitstream (stream_object) for the specified bucket:label identifier.

Parameters:
  • bucket – as standard
  • label – as standard
  • stream_object – file-like object to read from.
  • params – update metadata with these params (see update_metadata)

update_metadata(bucket, label, params)

Update the metadata with the provided dictionary of params.

Parameters: params – dictionary of key values (JSON serializable).
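To make the shape of the interface concrete, here is a toy in-memory store with the same method names as listed above. It is only a sketch under stated assumptions: it does not subclass ofs.base.OFSInterface, the class name InMemoryOFS and the ‘mem://’ URL scheme are hypothetical, it maintains only the standard keys it can compute (no _owner or _format), and the behaviour of as_stream=False and of deleting a protected key is guessed rather than taken from the library.

```python
import datetime
import hashlib
import io


class InMemoryOFS:
    """Toy bucket/object store; not an ofs.base.OFSInterface subclass."""

    # Standard keys this sketch maintains itself and refuses to delete.
    PROTECTED = ("_creation_date", "_last_modified", "_content_length",
                 "_checksum", "_bucket", "_label")

    def __init__(self):
        self._store = {}  # bucket -> {label: [bytes, metadata dict]}

    def claim_bucket(self, bucket):
        if bucket in self._store:
            return False
        self._store[bucket] = {}
        return True

    def list_buckets(self):
        return iter(list(self._store))

    def list_labels(self, bucket):
        return iter(list(self._store[bucket]))

    def exists(self, bucket, label):
        return label in self._store.get(bucket, {})

    def put_stream(self, bucket, label, stream_object, params=None):
        data = stream_object.read()
        now = datetime.datetime.now().isoformat()
        objects = self._store.setdefault(bucket, {})
        # Keep the original creation date when an object is overwritten.
        meta = objects[label][1] if label in objects else {"_creation_date": now}
        meta.update({
            "_last_modified": now,
            "_content_length": len(data),
            "_checksum": "md5:" + hashlib.md5(data).hexdigest(),
            "_bucket": bucket,
            "_label": label,
        })
        objects[label] = [data, meta]
        self.update_metadata(bucket, label, params or {})

    def get_stream(self, bucket, label, as_stream=True):
        data = self._store[bucket][label][0]
        # Assumption: as_stream=False hands back the raw bytes instead.
        return io.BytesIO(data) if as_stream else data

    def del_stream(self, bucket, label):
        del self._store[bucket][label]

    def get_metadata(self, bucket, label):
        return dict(self._store[bucket][label][1])

    def update_metadata(self, bucket, label, params):
        self._store[bucket][label][1].update(params)

    def del_metadata_keys(self, bucket, label, keys):
        meta = self._store[bucket][label][1]
        for key in keys:
            # Assumption: protected keys are skipped silently, not reported.
            if key not in self.PROTECTED:
                meta.pop(key, None)

    def get_url(self, bucket, label):
        return "mem://{}/{}".format(bucket, label)  # hypothetical scheme
```

The sketch takes params=None rather than the params={} default shown in the signature above, which avoids sharing one mutable dictionary between calls.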

Backends

Pairtree Backend: Local Filesystem based using Pairtree

class ofs.local.pairtreestore.PTOFS(storage_dir='data', uri_base='urn:uuid:', hashing_type='md5', shorty_length=2)

OFS backend backed onto the filesystem and using PairTree.
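Pairtree maps an identifier onto nested directories by cutting it into short segments. The sketch below shows only that splitting step, assuming the shorty_length parameter above sets the segment length (the source does not say so explicitly). The function name pairtree_path is hypothetical, and the character escaping and root directory layout of real Pairtree stores are left out.

```python
def pairtree_path(identifier, shorty_length=2):
    """Split `identifier` into fixed-length segments joined as a relative path.

    Simplified: real Pairtree also escapes special characters first.
    """
    if shorty_length < 1:
        raise ValueError("shorty_length must be at least 1")
    segments = [identifier[i:i + shorty_length]
                for i in range(0, len(identifier), shorty_length)]
    return "/".join(segments)
```

Short segments keep the number of entries in any one directory small, which is the point of the scheme for stores holding very many objects.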

LocalFile Store: Ultra-Simple Local File System

Warning

Not yet implemented.

class ofs.local.filestore.LocalFileOFS(storage_dir='ofsdata')

The simplest possible store you could imagine.

WARNING: not yet implemented (help wanted!).

Metadata Store: Local File System with Metadata Focus

class ofs.local.metadatastore.MDOFS(storage_dir='metadata', uri_base='urn:uuid:', hashing_type='md5', shorty_length=2, tail_retention=3, _fsep='-, -')

Implementation of a local OFS-style store, focused on holding small numbers of files for very large numbers of objects. Created in response to a need to store records for 3+ million objects without hitting hard filesystem limits.

Uses pairtree storage, but a pairtree id only comprises part of a bucket id.

Metadata

Metadata keys must be ASCII and alphanumeric plus ‘_’ and ‘-’.

Standard metadata: This metadata will always be available from get_metadata. Attempts to delete these keys will fail.

  • _creation_date
  • _last_modified
  • _content_length
  • _checksum –> “{type}:{number}”, e.g. “md5:767f7a...”
  • _owner
  • _format (content-type)
  • _bucket
  • _label
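The “{type}:{number}” form of _checksum listed above can be produced with the standard hashlib module. This is a hedged sketch rather than the library's own code: the helper names checksum_of and split_checksum are hypothetical, and treating {number} as a hexadecimal digest is an assumption based on the “md5:767f7a...” example.

```python
import hashlib
import io


def checksum_of(stream, hashing_type="md5", chunk_size=65536):
    """Read a file-like object in chunks and return '<type>:<hexdigest>'."""
    hasher = hashlib.new(hashing_type)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return "{}:{}".format(hashing_type, hasher.hexdigest())


def split_checksum(value):
    """Split '<type>:<digest>' back into its two parts."""
    kind, _, digest = value.partition(":")
    return kind, digest
```

Reading in chunks keeps memory use flat for large bitstreams, for example checksum_of(io.BytesIO(data)) or checksum_of(open(path, "rb")).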

ZipStore: OFS Storage Backed onto Zipfile

class ofs.local.zipstore.ZOFS(zipfile, mode='r', compression=0, allowZip64=False, hashing_type='md5', quiet=False)

Implementation of an OFS interface to a zip file archive.

Metadata: This is stored in the metadata/ ‘folder’ - same filename as the original bucket it describes.

claim_bucket(bucket=None)

Claim a bucket. This is a no-op, as the bucket is a virtual folder in the zipfile and does not exist without the files it ‘contains’.

Called without a bucket, it returns a UUID.

del_metadata_keys(bucket, label, keys)

Delete the metadata corresponding to the specified keys.

del_stream(bucket, label)

Delete a bitstream. This needs more testing: file deletion in a zipfile is problematic. The alternative is to create a second zipfile without the files in question, which is expensive for large zip archives.
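The alternative mentioned above, writing a second zipfile that omits the unwanted entries, can be sketched with the standard zipfile module. This is not the ZOFS implementation; the function name copy_zip_without is hypothetical, and entry names of the form ‘bucket/label’ are an assumption.

```python
import zipfile


def copy_zip_without(src_path, dst_path, names_to_drop):
    """Copy every entry of src_path into dst_path except those named.

    Returns the number of entries that were left out.
    """
    drop = set(names_to_drop)
    dropped = 0
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            if info.filename in drop:
                dropped += 1
                continue
            # Passing the ZipInfo keeps the timestamp and compression type.
            dst.writestr(info, src.read(info.filename))
    return dropped
```

Every surviving entry is read and written once, so the cost grows with the size of the whole archive rather than with the size of the deleted object.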

exists(bucket, label)

Whether a given bucket:label object already exists.

get_metadata(bucket, label)

Get the metadata for this bucket:label identifier.

get_stream(bucket, label, as_stream=True)

Get a bitstream for the given bucket:label combination.

Parameters: bucket – the bucket to use.
Returns: bitstream as a file-like object.

get_url(bucket, label)

Get a URL that should point at the bucket:label resource. Intended to help web apps redirect to an open resource rather than proxy the bitstream.

Parameters:
  • bucket – the bucket to use.
  • label – the label of the resource to get
Returns: a string URI, e.g. ‘zip:file:///home/.../foo.zip!/bucket/label’

list_buckets()

List all buckets managed by this OFS instance. Like list_labels, this walks the entire archive, yielding the bucket names. Although it is a generator, it retains a local set of the names already seen so that duplicates are not returned; the full list of buckets therefore ends up in memory temporarily, and iteration slows as more buckets are added to the set.

Returns: iterator for the buckets.

list_labels(bucket)

List labels for the given bucket. Because entries in a zipfile are arbitrarily ordered, this is an expensive operation: it walks the entire archive searching for individual ‘buckets’.

Parameters: bucket – bucket to list labels for.
Returns: iterator for the labels in the specified bucket.
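The archive walks described for list_buckets and list_labels can be sketched as two generators over a zipfile's entry names. This is an illustration under assumptions, not the ZOFS code: the function names are hypothetical, entries are assumed to be named ‘bucket/label’, and skipping the ‘metadata’ folder when listing buckets is a guess.

```python
import io
import zipfile


def iter_buckets(archive):
    """Yield each distinct bucket name found in an open ZipFile."""
    seen = set()  # grows with the number of buckets, as noted above
    for name in archive.namelist():
        bucket, sep, _label = name.partition("/")
        # Assumption: 'metadata/' holds metadata rather than a real bucket.
        if not sep or bucket == "metadata" or bucket in seen:
            continue
        seen.add(bucket)
        yield bucket


def iter_labels(archive, bucket):
    """Yield the labels stored under `bucket` by scanning every entry."""
    prefix = bucket + "/"
    for name in archive.namelist():
        if name.startswith(prefix) and len(name) > len(prefix):
            yield name[len(prefix):]
```

Both functions visit every entry name on each call, which is why the cost depends on the size of the archive and not on the size of the bucket being listed.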

put_stream(bucket, label, stream_object, params={}, replace=True, add_md=True)

Put a bitstream (stream_object) for the specified bucket:label identifier.

Parameters:
  • bucket – as standard
  • label – as standard
  • stream_object – file-like object to read from or bytestring.
  • params – update metadata with these params (see update_metadata)

update_metadata(bucket, label, params)

Update the metadata with the provided dictionary of params.

Parameters: params – dictionary of key values (JSON serializable).

S3

class ofs.remote.botostore.S3OFS(aws_access_key_id=None, aws_secret_access_key=None, **kwargs)

Google Storage

class ofs.remote.botostore.GSOFS(gs_access_key_id=None, gs_secret_access_key=None, **kwargs)

Google storage OFS backend.

Archive.org OFS

class ofs.remote.botostore.ArchiveOrgOFS(aws_access_key_id=None, aws_secret_access_key=None, **kwargs)

An archive.org backend utilizing the archive.org s3 interface (see: http://www.archive.org/help/abouts3.txt).

ProxyStore (Bounce for S3-type stores)

class ofs.remote.proxystore.S3Bounce(api_base)

Use the ckanext-storage API to bounce to an S3 store.

REST OFS: OFS Interface to RESTful storage system

class ofs.remote.reststore.RESTOFS(host='http://repo.ckan.net', http_user=None, http_pass=None)

OFS interface to a RESTful storage system.

del_stream(bucket, label)

Will fail if the bucket or label does not exist.
