Development Guide¶
The best way to create a new database for Bob is to first look at
examples of working databases that are similar to the dataset you want to
create an API for. The names of database interface packages for Bob
usually start with the prefix bob.db. Take a look at our available packages
list to find available Bob db packages.
In this development guide, we provide guidelines on how to structure your database package and solve common development issues. Throughout this guide, we use the Samples database from the Using Bob Databases section (see: Example Database) as a running example.
Warning
This development guide is not a replacement for our official Bob Python/C++ Extensions guide, which explains how to structure packages. It only contains additional guidelines you should consider when developing db packages.
Naming Convention¶
All database packages should be named bob.db.<dataset-slug>, where
<dataset-slug> corresponds to a short name for the dataset you’re trying to
bind. As a general recommendation, use short and catchy names. Avoid long names,
which are confusing and difficult to type and remember.
Examples of good, catchy names:
- atnt: for the AT&T faces database (see bob.db.atnt)
- iris: for Fisher’s Iris flower dataset (see bob.db.iris)
- mobio: for the Mobio biometric database (see bob.db.mobio)
Examples of inappropriate names (avoid these):
- database_cut1_whatnot (avoid very long package names)
- test-mine (avoid dashes; they don’t play well with Python)
- set_v2 (avoid version names)
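The naming guidelines above can be turned into a quick sanity check. The following is only a sketch: check_slug and MAX_SLUG_LENGTH are hypothetical names, and the 12-character limit is an arbitrary assumption rather than a Bob requirement.

```python
import re

# Assumption: "short" is interpreted as at most 12 characters. Bob itself
# does not define a hard limit.
MAX_SLUG_LENGTH = 12


def check_slug(slug):
    """Return a list of naming problems found in ``slug`` (empty if none)."""
    problems = []
    if len(slug) > MAX_SLUG_LENGTH:
        problems.append("too long")
    if "-" in slug:
        problems.append("contains a dash")
    if re.search(r"_v\d+$", slug):
        problems.append("contains a version suffix")
    if not slug.isidentifier():
        # The slug becomes a Python module name, so it must be importable.
        problems.append("not a valid Python identifier")
    return problems
```

Calling check_slug("atnt") returns an empty list, while each of the discouraged names listed above triggers at least one problem.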
Module Organization¶
Most people find it beneficial to share a common module organization so that it is easy for everyone to maintain different databases. This is our typical directory organization, which you should try to adhere to:
bob.db.<dataset-slug>
+-- bob
    +-- __init__.py  # namespace init for "bob"
    +-- db
        +-- __init__.py  # namespace init for "bob.db"
        +-- <dataset-slug>
            +-- __init__.py
            +-- models.py
            +-- query.py
            +-- driver.py
            +-- test.py
The file models.py includes all classes and functions that represent the
task described by the database. If, for example, the database is composed of
image samples and associated tags, possible classes in models.py may
include Sample and Tag. See an example at bob.db.base.tests.sample.Sample.
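As a rough illustration of what models.py could hold for a dataset of tagged image samples, consider the sketch below. Both classes, their attributes and the has_tag() helper are hypothetical and written in plain Python; they are not the classes shipped in bob.db.base.

```python
class Tag(object):
    """A label attached to a sample (hypothetical example class)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Tag(%r)" % self.name


class Sample(object):
    """One image sample and its tags (hypothetical example class)."""

    def __init__(self, path, tags=()):
        # Assumption: ``path`` is relative to the dataset root directory.
        self.path = path
        self.tags = [Tag(name) for name in tags]

    def has_tag(self, name):
        """Return True if a tag called ``name`` is attached to this sample."""
        return any(tag.name == name for tag in self.tags)
```

A sample created as Sample("images/001", tags=("face", "frontal")) would then answer has_tag("face") with True.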
The file query.py
normally only contains the definition of the Database
class. Finally, the file driver.py
contains the connector allowing the
command-line program bob_dbmanage.py
to interface with your python API for
specific tasks.
By convention, all elements inside models.py and query.py are
imported into the package’s __init__.py so that users do not need to
understand implementation specifics.
The driver.py should declare a database with a name matching
<dataset-slug> and implement at least the following methods from
bob.db.base.driver.Interface:
- bob.db.base.driver.Interface.name(): Here, insert the <dataset-slug>. See an example in the source code for bob.db.base.tests.sample.driver.Interface.name()
- bob.db.base.driver.Interface.version(): Here, insert the db package version. This is typically done by using pkg_resources. See an example in the source code for bob.db.base.tests.sample.driver.Interface.version()
- bob.db.base.driver.Interface.files(): Here, insert the metafiles this package contains (see Metafiles Not Shipped with the Database)
- bob.db.base.driver.Interface.type(): Returns the type of the backend implementation. The return value of this function on your driver implementation allows bob_dbmanage.py to provide specialized actions for SQL-backend db package implementations. The value returned should be either builtin or sqlite
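The four methods listed above might be filled in roughly as follows. To keep the sketch runnable without Bob installed, it defines a local stand-in base class named Interface that models only these four methods; in a real package you would subclass bob.db.base.driver.Interface instead, which may require more than what is shown here. The class name SampleInterface, the slug "sample" and the hard-coded version string are illustrative assumptions.

```python
import abc


class Interface(abc.ABC):
    """Local stand-in for bob.db.base.driver.Interface (assumption: only
    the four methods discussed above are modelled)."""

    @abc.abstractmethod
    def name(self):
        """Return the <dataset-slug>."""

    @abc.abstractmethod
    def version(self):
        """Return the db package version."""

    @abc.abstractmethod
    def files(self):
        """Return the list of metafiles of this package."""

    @abc.abstractmethod
    def type(self):
        """Return the backend type, either 'builtin' or 'sqlite'."""


class SampleInterface(Interface):
    """Hypothetical driver for a file-list based package named 'sample'."""

    def name(self):
        return "sample"

    def version(self):
        # Hard-coded for the sketch; normally read from package metadata.
        return "0.0.1"

    def files(self):
        # This example has no metafiles outside version control.
        return []

    def type(self):
        # No SQL backend is used in this example.
        return "builtin"
```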
The file test.py should contain basic unit tests for all functionality
shipped with the database. This should contain, at least:
- Tests for reading out samples,
- Tests for sub-selection of samples using parameters of the objects() method.
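A minimal test.py covering these two points could look like the sketch below. The Database class here is a toy stand-in defined inline so the example runs on its own, and the group parameter of objects() is a hypothetical sub-selection criterion; a real db package would import its own Database and test its own parameters.

```python
class Database(object):
    """Toy stand-in for a db package's Database class (hypothetical)."""

    def __init__(self):
        # (path, group) pairs; a real package would read these from file
        # lists or from another backend.
        self._samples = [("s1", "train"), ("s2", "train"), ("s3", "test")]

    def objects(self, group=None):
        """Return sample paths, optionally restricted to one group."""
        return [
            path
            for path, sample_group in self._samples
            if group is None or sample_group == group
        ]


def test_read_all_samples():
    # Reading out samples: no filter returns everything.
    assert len(Database().objects()) == 3


def test_subselection_by_group():
    # Sub-selection through a parameter of objects().
    db = Database()
    assert db.objects(group="train") == ["s1", "s2"]
    assert db.objects(group="test") == ["s3"]
```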
Documentation should accompany the package and indicate how to use the db package Python API and its command-line interface, with examples and appropriate doctests.
File-based Databases¶
If the raw dataset in question is composed of files, it may be beneficial to re-use (through inheritance) some classes available in this package. Read the documentation of bob.db.base.File and bob.db.base.Database.
In this case, make sure your Sample objects inherit from File, as is,
for example, the case for bob.db.base.tests.sample.Sample.
Equivalently, you may also benefit from some database constructions if your
Database inherits from bob.db.base.Database.
SQL-Backend for File Databases¶
So far, we have shown the implementation and organization of a simple db package, for which the dataset contained only a few raw image samples and a single evaluation protocol. For very complex problems, in which datasets contain many hundreds of raw samples and multiple evaluation protocols, a more complex modelling of the internals of the Bob db package may be required. In such cases, we recommend that prospective developers consider alternative techniques (as opposed to simple file lists) for implementing the internals of their db packages. In this guide, we introduce how to handle database backends using SQLite through SQLAlchemy, for which support is built into this package.
Whether a db package uses a simple or a complex backend to store and retrieve iterables must be completely transparent to its users. Users of your Bob db package are primarily interested in iterating over (sub-selected) samples and executing their pipelines. The choice of backend for a given db package must therefore remain hidden from them.
Warning
Understanding SQL, database structuring, normalization and how to deploy an ORM (such as SQLAlchemy) is beyond the scope of this guide. Search for guides and tutorials on the net to familiarize yourself with these subjects before trying to build a backend based on SQLAlchemy.
Module Organization for SQLite¶
The module organization of typical SQL-backend db packages is very similar to simple file-based ones:
bob.db.<dataset-slug>
+-- bob
    +-- __init__.py  # namespace init for "bob"
    +-- db
        +-- __init__.py  # namespace init for "bob.db"
        +-- <dataset-slug>
            +-- __init__.py
            +-- models.py
            +-- query.py
            +-- create.py
            +-- driver.py
            +-- db.sql3
            +-- test.py
The file models.py will contain the definition of the SQL tables for every
component in the database. One possible table in the SQL database will be that
of Samples. If you’re designing a db package for a dataset with a
one-file-per-sample storage model, ensure your Sample class also inherits
from bob.db.base.File to provide a uniform experience to users
already used to bob.db interfaces. Other tables and relationships are optional
and should map your problem, along with its constraints, to a proper database
schema.
The file query.py will contain the definition of the Database class. We
recommend you consider, in this case, inheriting from
bob.db.base.SQLiteDatabase, which provides a number of utilities to
handle file-based datasets with an SQLite backend. You’ll find examples among
different Bob db packages for this. Typically, elements returned by the
Database class objects() in this case are ORM objects from SQLAlchemy,
representing a row in a table of your internal database. However, these objects
behave exactly the same as non-SQL Sample objects and allow the user to
transparently load file contents and metadata using simple API calls.
The file driver.py will be very similar to other databases, with a few
exceptions:
- The bob.db.base.driver.Interface.files() implementation will return at least the path to the db.sql3 file, which will contain the db package backend information. This will allow bob_dbmanage.py to download this file in installations where it is missing, or upload updated versions of it to our central server.
- SQL-backend db packages normally install a create command at the driver bob.db.base.driver.Interface, allowing developers to create the db.sql3 file from scratch. It is important to have a create command so that the database can be re-created in case of changes.
The file create.py
typically contains routines for the creation of
db.sql3
from scratch and is not required per se. Functions and modules
implemented in create.py
are imported into driver.py
for the
instantiation of the create
command. See examples in db packages for
mobio or verafinger.
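To give a feel for the split between a creation routine and a query routine, here is a dependency-free sketch. Note that it uses Python’s built-in sqlite3 module in place of SQLAlchemy, so it does not show the ORM classes a real Bob db package would define. The table layout, the sample rows and the function names create() and objects() are all illustrative assumptions.

```python
import sqlite3


def create(connection):
    """Re-create the sample table from scratch (the role of create.py)."""
    connection.execute("DROP TABLE IF EXISTS sample")
    connection.execute(
        "CREATE TABLE sample ("
        "id INTEGER PRIMARY KEY, path TEXT UNIQUE, grp TEXT)"
    )
    rows = [("s1", "train"), ("s2", "train"), ("s3", "test")]
    connection.executemany(
        "INSERT INTO sample (path, grp) VALUES (?, ?)", rows
    )
    connection.commit()


def objects(connection, group=None):
    """Return sample paths, optionally for one group (the role of query.py)."""
    if group is None:
        cursor = connection.execute("SELECT path FROM sample ORDER BY id")
    else:
        cursor = connection.execute(
            "SELECT path FROM sample WHERE grp = ? ORDER BY id", (group,)
        )
    return [row[0] for row in cursor]


# An in-memory database keeps the sketch self-contained; a real package
# would open its db.sql3 file instead.
connection = sqlite3.connect(":memory:")
create(connection)
```

Because create() drops the table before rebuilding it, running it again after a change to the raw dataset regenerates the backend from scratch, which is the reason the guide asks for a create command.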
Finally, the file test.py
should contain the usual set of tests, as for
simpler databases.
The package documentation should contain all information regarding the Python API, command-line interface and, if possible, the SQL-backend database design (showing tables, column types and constraints where adequate).
Metafiles Not Shipped with the Database¶
Very often, Bob db packages require the use of support files which should exist inside the package structure, but are not kept under version control. Reasons for this may be that these meta-files are too large or can be recreated programmatically. Such files may be of different natures and each developer should be able to recognize those easily when the situation occurs. Here is a non-exhaustive list of possible use cases for such metafiles:
- Annotations
- File lists (for example, defining evaluation protocols or such)
- Auxiliary database files (for example, SQLite database files)
- Samples (in case you want to ship them with your database)
In order to mitigate issues related to management, this package provides a set
of utilities to handle such metafiles more easily. To prepare your package for
handling metafiles, you must first make sure that the Interface class in
driver.py returns a non-empty list as a result of the files() method.
Each entry in the list returned by files() should represent the full
path of the file, considering the current installation location. The method
type() of bob.db.base.driver.Interface should return sqlite.
A typical implementation for SQL-backend db packages is like this:
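def files(self):
    # Resolve metafile names relative to this package's install location.
    from pkg_resources import resource_filename
    raw_files = ('db.sql3',)
    return [resource_filename(__name__, k) for k in raw_files]

def type(self):
    # Tells bob_dbmanage.py that this package uses an SQLite backend.
    return 'sqlite'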
Metafiles Shipped with the Python Package¶
If you’d like the file db.sql3
to be shipped to PyPI when you publish
your package, make sure to include db.sql3
in the package’s MANIFEST.in
file. Otherwise, you are not required to add this file to the package manifest.
Download Missing Files for Large Databases¶
If you declared extra metafiles with your driver’s files()
implementation,
it is possible to both store and retrieve metafiles from a central file server
running at Idiap (see http://www.idiap.ch/software/bob/databases/latest). All
metafiles of a package are wrapped into a single tar-ball and copied to the
server upon uploading. The reverse process takes place when downloading.
This mechanism allows third-parties to download sources from the version control repository and retrieve the metafiles.
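The idea of wrapping all metafiles into one tarball and unwrapping them on download can be sketched with the standard tarfile module. This is not the code used by bob_dbmanage.py: the function names pack_metafiles and unpack_metafiles, the gzip compression and the flat archive layout are all assumptions made for illustration.

```python
import os
import tarfile
import tempfile


def pack_metafiles(paths, archive_path):
    """Wrap the given metafiles into a single gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            # Store files flat, by base name only (an assumption).
            archive.add(path, arcname=os.path.basename(path))


def unpack_metafiles(archive_path, destination):
    """Write every regular file of the tarball into ``destination``."""
    names = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            name = os.path.basename(member.name)
            with open(os.path.join(destination, name), "wb") as handle:
                handle.write(archive.extractfile(member).read())
            names.append(name)
    return names


# Round trip with a placeholder metafile in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "db.sql3")
    with open(source, "wb") as handle:
        handle.write(b"placeholder")
    archive_path = os.path.join(workdir, "metafiles.tar.gz")
    pack_metafiles([source], archive_path)
    restored = os.path.join(workdir, "restored")
    os.mkdir(restored)
    names = unpack_metafiles(archive_path, restored)
```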
To download and install metafiles for a package, do:
$ bob_dbmanage.py <database-name> download
For example, you can use the special database name all, together with the
flag --missing, to download the missing metafiles of all installed databases
like this:
$ bob_dbmanage.py all download --missing
Low and High-Level Interfaces¶
Bob database interfaces come in two flavours:
Low-level interfaces allow developers to create programmatic APIs to access samples and metadata available with databases as they are distributed by their controllers. Examples of this are the Samples database in this package or the APIs provided in any other db package. The main objective of a low-level database interface is to provide access to all information provided with the database, regardless of the specific task it was originally conceived for. The reasoning behind this design choice is that databases very often find second lives in tasks other than the one originally intended. By providing access to all information available from the raw dataset, a developer makes such (re-)use cases possible.
High-level interfaces allow developers to create programmatic APIs to bind low-level interfaces to frameworks that perform a specific function. Because each low-level database should be created to export all available information, in some cases it is possible to re-use an existing db package as input to a different task than it was originally conceived for. Here are some examples:
- Re-use a database for emotion recognition to perform remote photo-plethysmography (see bob.db.hci_tagging)
- Re-use a face recognition database to train a face detector
- Re-use a speaker recognition database to do speech recognition
High-level database interfaces are, therefore, very task specific and normally sit together with frameworks doing high-level experimental research. Examples of such frameworks are bob.bio.base (biometric recognition) and bob.pad.base (presentation attack vulnerability and detection). Check out their user guides for more information on specific high-level implementations required by those tasks.
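The relation between the two flavours can be pictured with the following sketch, in which a task-specific wrapper re-uses a low-level database that was built with a different task in mind. Both classes, their methods and the sample records are hypothetical; they do not reproduce the interfaces required by bob.bio.base or bob.pad.base.

```python
class LowLevelDatabase(object):
    """Hypothetical low-level interface: exposes every annotation that
    comes with the raw dataset, regardless of the task."""

    def objects(self):
        return [
            {"path": "s1", "identity": "alice", "emotion": "happy"},
            {"path": "s2", "identity": "bob", "emotion": "sad"},
            {"path": "s3", "identity": "alice", "emotion": "sad"},
        ]


class IdentityRecognitionDatabase(object):
    """Hypothetical high-level interface: keeps only what one task needs."""

    def __init__(self, low_level):
        self._low_level = low_level

    def samples_by_identity(self):
        """Group sample paths by identity, ignoring other annotations."""
        grouped = {}
        for sample in self._low_level.objects():
            grouped.setdefault(sample["identity"], []).append(sample["path"])
        return grouped
```

Since the low-level class does not discard the identity field, a dataset collected for emotion recognition can feed an identity-based task without being modified.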