file_archive: A file store with searchable metadata

About

file_archive can be used either as a command-line tool, to create, change or query a file store, or as a Python library. It is intended to store simulation results or other large collections of similar objects, and is used as the back-end for the VisTrails workflow management and provenance system.

A file store is simply a directory containing an objects subdirectory, which holds the actual files, and a database file: an SQLite3 database containing the metadata.
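To make that layout concrete, here is a minimal sketch that creates such a directory with Python's standard library. The database file name (`database`), the `metadata` table and the `create_store` helper are assumptions for illustration only; the real `file_archive <store> create` command may name and structure things differently.

```python
import os
import sqlite3
import tempfile


def create_store(path, db_name="database"):
    """Create a hypothetical store layout: <path>/objects/ plus an SQLite file.

    The database file name and the metadata table are illustrative
    assumptions, not taken from file_archive itself.
    """
    # Fails if the objects directory already exists, like a "create" should.
    os.makedirs(os.path.join(path, "objects"))
    conn = sqlite3.connect(os.path.join(path, db_name))
    with conn:  # commits the schema on success
        conn.execute(
            "CREATE TABLE metadata ("
            "hash TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "PRIMARY KEY (hash, key))")
    conn.close()


store = os.path.join(tempfile.mkdtemp(), "mystore")
create_store(store)
print(sorted(os.listdir(store)))  # ['database', 'objects']
```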

The metadata simply consists of key=value pairs. You give the system these pairs when you add a file, and you can then query the entire file store to find the files matching given conditions. Example:

$ file_archive ../mystore add /tmp/simresults model=weather2 cluster=poly
0f72c656ac0997fcab8f6590f71c57fc1a767508
$ file_archive ../mystore query model=weather2
a77a813e049b1f05afd614fe4b8e11e59fb65b99
        cluster: "poly-old"
        model: "weather2"
0f72c656ac0997fcab8f6590f71c57fc1a767508
        cluster: "poly"
        model: "weather2"
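The example output is consistent with a simple rule: an entry matches when every given key=value pair is present in its metadata. Below is a plain-Python sketch of that rule, assuming exact string comparison and conditions combined with AND; the `matches` and `query` helpers are hypothetical and are not file_archive's code.

```python
# Hypothetical in-memory stand-in for the store's metadata, using the
# entries from the example session above (hash -> metadata dict).
ENTRIES = {
    "a77a813e049b1f05afd614fe4b8e11e59fb65b99": {
        "cluster": "poly-old", "model": "weather2"},
    "0f72c656ac0997fcab8f6590f71c57fc1a767508": {
        "cluster": "poly", "model": "weather2"},
}


def matches(metadata, conditions):
    """Return True if every key=value condition holds exactly for this entry."""
    return all(metadata.get(key) == value
               for key, value in conditions.items())


def query(entries, conditions):
    """Return the hashes of all entries satisfying every condition."""
    return [h for h, meta in entries.items() if matches(meta, conditions)]


print(query(ENTRIES, {"model": "weather2"}))  # both hashes
print(query(ENTRIES, {"cluster": "poly"}))    # only the 0f72c6... entry
```

Under this rule, cluster=poly does not match "poly-old", because values are compared as whole strings.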

Command-line usage

Using it as a command-line tool is pretty easy; typing file_archive (or python file_archive if you did not install it system-wide) will give you the following quick reference:

usage: file_archive <store> create
   or: file_archive <store> add <filename> [key1=value1] [...]
   or: file_archive <store> query [key1=value1] [...]
   or: file_archive <store> print <filehash> [...]
   or: file_archive <store> print [key1=value1] [...]
   or: file_archive <store> remove <filehash>
   or: file_archive <store> remove <key1=value1> [...]
   or: file_archive <store> verify
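On the command line, metadata and conditions are written as key=value words, whereas the library methods documented below take a dict. A hypothetical helper converting one into the other might look like this; splitting on the first = sign is an assumption, not documented file_archive behavior.

```python
def parse_pairs(args):
    """Turn CLI-style ["key=value", ...] words into a metadata dict.

    Hypothetical helper: splits on the first '=' so values may contain '='.
    """
    metadata = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % arg)
        metadata[key] = value
    return metadata


# The metadata words from: file_archive ../mystore add /tmp/simresults model=weather2 cluster=poly
print(parse_pairs(["model=weather2", "cluster=poly"]))
```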

Using file_archive as a library

The FileStore class can be used to add, remove and query files in a store.

class file_archive.FileStore(path)

Represents a file store.

add(newpath, metadata)

Adds a file or directory with a dict of metadata.

This simply calls either add_file() or add_directory() with the given arguments.

add_directory(newdir, metadata)

Adds a directory given a path and dict of metadata.

The directory will be recursively copied to the store, and an entry will be added to the database.

add_file(newfile, metadata)

Adds a file given a file object or path and dict of metadata.

The file will be copied or written into the store, and an entry will be added to the database.

Note that if you pass a file object, it must support newfile.seek(0, os.SEEK_SET), as it will be read twice: once to compute its SHA1 hash, and a second time to write it to disk.
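The two-pass pattern described above can be sketched with hashlib and an in-memory stream. `hash_then_copy` is a hypothetical helper written for illustration, not a file_archive function; a stream that cannot seek, such as a pipe, would fail at the rewind step.

```python
import hashlib
import io
import os


def hash_then_copy(fileobj, dest, chunk_size=4096):
    """Compute the SHA1 of fileobj, rewind it, then copy it to dest."""
    sha1 = hashlib.sha1()
    # First pass: consume the stream to compute the hash.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        sha1.update(chunk)
    # Rewind; this raises for non-seekable streams.
    fileobj.seek(0, os.SEEK_SET)
    # Second pass: write the same bytes to the destination.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        dest.write(chunk)
    return sha1.hexdigest()


src = io.BytesIO(b"simulation output\n")
out = io.BytesIO()
digest = hash_then_copy(src, out)
print(digest, out.getvalue())
```

Without the seek() call, the second loop would read nothing and the copy would be empty.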

get(objecthash)

Gets an Entry from a hash.

get_filename(filehash, make_dir=False)

Returns the file path for a given SHA1 hash.
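How a hash maps to a path is not specified here. The sketch below assumes, purely for illustration, a flat `<store>/objects/<hash>` layout and interprets `make_dir=True` as creating the parent directory. `object_path` is hypothetical and the real layout may differ.

```python
import os
import tempfile

HEX_DIGITS = "0123456789abcdef"


def object_path(store, filehash, make_dir=False):
    """Hypothetical mapping from a SHA1 hex digest to a path in the store.

    Assumes a flat layout <store>/objects/<hash>; with make_dir=True the
    objects directory is created if it is missing.
    """
    # Reject anything that is not a 40-character lowercase hex digest,
    # which also prevents paths such as "../something" escaping the store.
    if len(filehash) != 40 or any(c not in HEX_DIGITS for c in filehash):
        raise ValueError("not a SHA1 hex digest: %r" % filehash)
    objects = os.path.join(store, "objects")
    if make_dir:
        os.makedirs(objects, exist_ok=True)
    return os.path.join(objects, filehash)


store = tempfile.mkdtemp()  # stand-in for a real store directory
path = object_path(store, "0f72c656ac0997fcab8f6590f71c57fc1a767508", make_dir=True)
print(path)
```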

open_file(filehash, binary=True)

Returns a file object for a given SHA1 hash.

query(conditions, limit=None)

Returns all the Entries matching the conditions.

An EntryIterator is returned, with which you can access the different results.

query_one(conditions)

Returns at most one Entry matching the conditions.

Returns one of the Entry objects matching the conditions, or None if there is no match.
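Both query methods filter on key=value metadata. As an illustration only, here is one way such a lookup could be written against an SQLite table holding one row per (hash, key, value). The table and the `query_hashes` helper are hypothetical, not file_archive's actual schema or code, and `limit=1` only loosely mirrors query_one().

```python
import sqlite3

# Hypothetical schema: one row per (object hash, key) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata ("
             "hash TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
             "PRIMARY KEY (hash, key))")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    ("a77a813e049b1f05afd614fe4b8e11e59fb65b99", "cluster", "poly-old"),
    ("a77a813e049b1f05afd614fe4b8e11e59fb65b99", "model", "weather2"),
    ("0f72c656ac0997fcab8f6590f71c57fc1a767508", "cluster", "poly"),
    ("0f72c656ac0997fcab8f6590f71c57fc1a767508", "model", "weather2"),
])


def query_hashes(conn, conditions, limit=None):
    """Return sorted hashes whose metadata contains every key=value pair."""
    params = []
    if conditions:
        # Keep rows matching any condition, then keep only the hashes for
        # which every condition matched (one row per condition key).
        clauses = " OR ".join("(key = ? AND value = ?)" for _ in conditions)
        sql = ("SELECT hash FROM metadata WHERE " + clauses +
               " GROUP BY hash HAVING COUNT(DISTINCT key) = ?")
        for key, value in conditions.items():
            params += [key, value]
        params.append(len(conditions))
    else:
        sql = "SELECT DISTINCT hash FROM metadata"
    sql += " ORDER BY hash"
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


print(query_hashes(conn, {"model": "weather2"}))
print(query_hashes(conn, {"model": "weather2"}, limit=1))  # at most one hash
```

Using bound parameters (?) rather than string formatting keeps user-supplied keys and values from being interpreted as SQL.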

remove(objecthash)

Removes a file or directory given its SHA1 hash.

It is deleted from the store and removed from the database.

verify()

Checks the integrity of the store.
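The exact checks verify() performs are not described here. For a store addressed by SHA1, one natural check is to rehash every stored file and compare the result with its name. The sketch below does that for an assumed flat `<store>/objects/<hash>` layout; `verify_objects` is hypothetical and ignores directories and the database.

```python
import hashlib
import os
import tempfile


def verify_objects(store):
    """Rehash every regular file in <store>/objects and return mismatches.

    Hypothetical check for a flat layout where each file is named by the
    SHA1 hex digest of its content; directories are skipped.
    """
    objects = os.path.join(store, "objects")
    corrupted = []
    for name in sorted(os.listdir(objects)):
        path = os.path.join(objects, name)
        if not os.path.isfile(path):
            continue
        sha1 = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha1.update(chunk)
        if sha1.hexdigest() != name:
            corrupted.append(name)
    return corrupted


# Build a tiny store with one intact object and one tampered object.
store = tempfile.mkdtemp()
os.makedirs(os.path.join(store, "objects"))
good_hash = hashlib.sha1(b"ok\n").hexdigest()
with open(os.path.join(store, "objects", good_hash), "wb") as f:
    f.write(b"ok\n")
bad_hash = hashlib.sha1(b"original").hexdigest()
with open(os.path.join(store, "objects", bad_hash), "wb") as f:
    f.write(b"tampered")
print(verify_objects(store))  # lists only bad_hash
```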
