file_archive can be used either as a command-line tool, to create, modify, or query a file store, or as a Python library. It is intended to hold simulation results or other large collections of similar objects, and is used as the back-end for the VisTrails workflow management and provenance system.
A file store is simply a directory containing an objects subdirectory, which holds the actual files, and a database file, which is an SQLite3 database holding the metadata.
The metadata consists simply of key=value pairs. You give the system these pairs when you add a file, and you can then query the entire file store to find the files matching the given conditions. Example:
$ file_archive ../mystore add /tmp/simresults model=weather2 cluster=poly
0f72c656ac0997fcab8f6590f71c57fc1a767508
$ file_archive ../mystore query model=weather2
a77a813e049b1f05afd614fe4b8e11e59fb65b99
cluster: "poly-old"
model: "weather2"
0f72c656ac0997fcab8f6590f71c57fc1a767508
cluster: "poly"
model: "weather2"
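The matching shown in the example above can be sketched in a few lines of Python. This is only an illustration of the key=value idea, not the way file_archive implements it: parse_pairs(), matches() and the in-memory entries dict are hypothetical names standing in for the command-line parsing and the SQLite3 database.

```python
def parse_pairs(args):
    """Turn ['model=weather2', 'cluster=poly'] into a dict."""
    metadata = {}
    for arg in args:
        key, sep, value = arg.partition('=')
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % arg)
        metadata[key] = value
    return metadata


def matches(metadata, conditions):
    """True if every condition key is present with the same value."""
    return all(metadata.get(k) == v for k, v in conditions.items())


# Hypothetical in-memory stand-in for the store's database,
# filled with the two entries from the example above
entries = {
    'a77a813e049b1f05afd614fe4b8e11e59fb65b99':
        {'cluster': 'poly-old', 'model': 'weather2'},
    '0f72c656ac0997fcab8f6590f71c57fc1a767508':
        {'cluster': 'poly', 'model': 'weather2'},
}

conditions = parse_pairs(['model=weather2'])
found = [h for h, meta in entries.items() if matches(meta, conditions)]
```

Both hashes end up in found, since both entries carry model=weather2; adding cluster=poly to the conditions would narrow the result to the second entry only.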
Using it as a command-line tool is pretty easy; typing file_archive (or python file_archive if you did not install it system-wide) will give you the following quick reference:
usage: file_archive <store> create
or: file_archive <store> add <filename> [key1=value1] [...]
or: file_archive <store> query [key1=value1] [...]
or: file_archive <store> print <filehash> [...]
or: file_archive <store> print [key1=value1] [...]
or: file_archive <store> remove <filehash>
or: file_archive <store> remove <key1=value1> [...]
or: file_archive <store> verify
The FileStore class can be used to add, remove, and query files in a store.
Represents a file store.
Adds a file or directory with a dict of metadata.
This simply calls either add_file() or add_directory() with the given arguments.
Adds a directory given a path and dict of metadata.
The directory will be recursively copied to the store, and an entry will be added to the database.
Adds a file given a file object or path and dict of metadata.
The file will be copied or written to the store, and an entry will be added to the database.
Note that, if you pass a file object, it needs to support newfile.seek(0, os.SEEK_SET) as it will be read twice: once to compute its SHA1 hash, and a second time to write it to disk.
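The two-pass read described above can be sketched as follows. This is a minimal illustration and not the library's actual code: store_fileobj() is a hypothetical helper, and naming the stored file after its SHA1 hash inside a flat directory is an assumption made for the example.

```python
import hashlib
import io
import os
import shutil
import tempfile


def store_fileobj(newfile, objects_dir, chunk_size=4096):
    """Hash a seekable binary file object, then write it to disk.

    Hypothetical helper: the destination name (the hex SHA1 digest)
    is an assumption, not taken from file_archive itself.
    """
    sha1 = hashlib.sha1()
    # First pass: read the content to compute its SHA1 hash
    for chunk in iter(lambda: newfile.read(chunk_size), b''):
        sha1.update(chunk)
    digest = sha1.hexdigest()

    # Rewind; this is why the object must support seek()
    newfile.seek(0, os.SEEK_SET)

    # Second pass: copy the content to its destination
    dest = os.path.join(objects_dir, digest)
    with open(dest, 'wb') as out:
        shutil.copyfileobj(newfile, out)
    return digest


objects_dir = tempfile.mkdtemp()
digest = store_fileobj(io.BytesIO(b'simulation output'), objects_dir)
```

A non-seekable stream such as a pipe or socket would fail at the seek() call; one way around that is to buffer it into an io.BytesIO or a temporary file first.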
Gets an Entry from a hash.
Returns the file path for a given SHA1 hash.
Returns a file object for a given SHA1 hash.
Returns all the Entries matching the conditions.
An EntryIterator is returned, through which you can access the individual results.
Returns at most one Entry matching the conditions.
Returns one of the Entry objects matching the conditions, or None if nothing matches.
Removes a file or directory given its SHA1 hash.
It is deleted from the store and removed from the database.
Checks the integrity of the store.
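The text does not say what the integrity check covers. One plausible check for a content-addressed store is to recompute each file's SHA1 and compare it with the name it is stored under, sketched below. The flat layout (one file per object, named by its hex digest) and the find_corrupted() helper are assumptions made for illustration, not the library's behavior.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=4096):
    """Compute the hex SHA1 digest of a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest()


def find_corrupted(objects_dir):
    """Return names of stored files whose content no longer matches
    the hash they are named after (assumed flat layout)."""
    bad = []
    for name in sorted(os.listdir(objects_dir)):
        path = os.path.join(objects_dir, name)
        if os.path.isfile(path) and sha1_of_file(path) != name:
            bad.append(name)
    return bad


# Build a throwaway directory with one intact and one altered object
store = tempfile.mkdtemp()
good = b'intact results'
good_name = hashlib.sha1(good).hexdigest()
with open(os.path.join(store, good_name), 'wb') as fp:
    fp.write(good)

tampered_name = hashlib.sha1(b'original results').hexdigest()
with open(os.path.join(store, tampered_name), 'wb') as fp:
    fp.write(b'modified results')

corrupted = find_corrupted(store)
```

Here corrupted contains only tampered_name. A fuller check would also compare the set of stored objects against the entries in the database, which this sketch leaves out.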