.. Andre Anjos <andre.anjos@idiap.ch>
.. vim: set fileencoding=utf-8 :
.. Mon 11 Apr 2011 10:07:37 CEST

===============
 Score Toolkit
===============

The Toolkit is conceived for these purposes:

1. Plot the DET curve for a particular system
2. Check the consistency between score files w.r.t. the filenames scores refer
   to

.. _input-section:

Installation
------------

To install from the command line on a machine where you have write access to
the Python installation tree (e.g., on a Windows machine):

.. code-block:: sh

  $ easy_install trstk

  # or

  $ pip install trstk

If you don't have administrative rights on the Python installation directory,
you can create an isolated virtual environment using ``virtualenv``. Follow
the virtualenv instructions to download it and create a virtual environment,
and then either ``easy_install`` or ``pip install`` this package.

Our `PyPI`_ page also contains a link to a Windows graphical installer.
Unfortunately, unlike the command line installers, it does not install the
package dependencies; you have to install them yourself. The dependencies are:

* `NumPy`_
* `Matplotlib`_

Visit those webpages for more information.
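
If you used the graphical installer, a quick way to confirm that the
dependencies are importable is sketched below. This is only an illustration
written for Python 3, not part of the Toolkit: the helper name
``missing_dependencies`` is hypothetical, and ``numpy`` and ``matplotlib`` are
the import names of the two packages listed above.

.. code-block:: python

  import importlib.util


  def missing_dependencies(names):
      """Return the module names from ``names`` that cannot be found."""
      return [name for name in names
              if importlib.util.find_spec(name) is None]


  if __name__ == "__main__":
      # Import names of the two dependencies listed above.
      missing = missing_dependencies(["numpy", "matplotlib"])
      if missing:
          print("Missing dependencies: " + ", ".join(missing))
      else:
          print("All dependencies are installed.")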

Input
-----

Tools in this package accept score files in a single textual format. Each
line in the file refers to one sample in the database being analyzed and is
composed of 5 fields separated by spaces, in this order:

1. Claimed identity: a string that defines the claimed identity of the subject
   being analyzed
2. Model label: contains a label/reference to the data used to make the model
   (filename <id>d<capture_number> used to make the model)
3. Real identity: a string that defines the real identity of the subject
   being analyzed (i.e. the output of the classification)
4. Test label: contains a label/reference to the data used to do the
   testing (filename <id>d<capture_number> of the test file)
5. Score: a floating-point value representing the score

None of the fields above **may contain spaces**. Failing to comply will make
the tools emit syntax errors pointing to the location in the file where the
problem seems to occur.

Here is a valid example score file:

.. code-block:: text

  02463 02463d547 02463 02463d653 0.623265
  02463 02463d547 02463 02463d655 0.920861
  02463 02463d547 02463 02463d657 0.938942
  02463 02463d547 02463 02463d659 0.743715
  02463 02463d547 02463 02463d661 0.397660
  02463 02463d547 02463 02463d663 0.615722
  02463 02463d547 02463 02463d665 0.613291
  02463 02463d547 02463 02463d667 0.543184
  02463 02463d547 02463 02463d669 0.829777
  02463 02463d547 02463 02463d671 0.869681
  02463 02463d547 02463 02463d673 0.806394
  02463 02463d547 02463 02463d675 1.007791
  02463 02463d547 04200 04200d75 0.257423

Here is an invalid example score file:

.. code-block:: text
  :linenos:

  Bob_Jones bob-file-001 Bob_Jones bob-file-004 -37.643410
  Susan Smith susan-file-001 Susan Smith susan-file-001 -33.393433
  Joe joe-file-030 Joe joe-file-001 -72.295616

In this case, line 2 above will fail because both the claimed identity and the
real identity fields contain spaces. Lines 1 and 3 conform to the proposed
scheme and will be parsed without problems.
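
The format rules above can be sketched in a few lines of Python 3. This is a
minimal illustration under stated assumptions, not the parser shipped with the
Toolkit: the names ``Score``, ``parse_score_line`` and ``parse_scores`` are
hypothetical, and the error messages are invented for the example.

.. code-block:: python

  from collections import namedtuple

  # One record per line; field names follow the list of fields above.
  Score = namedtuple(
      "Score", "claimed_id model_label real_id test_label value")


  def parse_score_line(line, lineno):
      """Parse one score-file line into a ``Score`` tuple.

      Raises ``ValueError`` mentioning ``lineno`` if the line is malformed.
      """
      fields = line.split()
      if len(fields) != 5:
          raise ValueError(
              "line %d: expected 5 fields, got %d" % (lineno, len(fields)))
      try:
          value = float(fields[4])
      except ValueError:
          raise ValueError(
              "line %d: score %r is not a number" % (lineno, fields[4]))
      return Score(fields[0], fields[1], fields[2], fields[3], value)


  def parse_scores(lines):
      """Parse an iterable of lines, skipping blank ones."""
      return [parse_score_line(line, number)
              for number, line in enumerate(lines, 1) if line.strip()]

With this sketch, a line such as ``Susan Smith susan-file-001 ...`` splits into
more than five fields and is rejected with the offending line number.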

Multi-modality Input
====================

If you have multiple modalities, build one text file per modality, each
following the format explained above. The order of the *tags* within each file
should be respected. For example, a *hypothetical face verification experiment
output*:

.. code-block:: text

  02463 02463d547 02463 02463d675 1.007791
  02463 02463d547 04200 04200d75 0.257423
  02463 02463d547 04201 04201d435 0.315074
  02463 02463d547 04201 04201d437 0.347413
  02463 02463d547 04201 04201d439 0.296383
  02463 02463d547 04201 04201d443 0.371881
  02463 02463d547 04201 04201d445 0.260964

*Hypothetical speech verification experiment output*:

.. code-block:: text

  02463 02463d547 02463 02463d675 0.9932
  02463 02463d547 04200 04200d75  0.0027
  02463 02463d547 04201 04201d435 0.0144
  02463 02463d547 04201 04201d437 0.0159
  02463 02463d547 04201 04201d439 0.1250
  02463 02463d547 04201 04201d443 0.0031
  02463 02463d547 04201 04201d445 0.0002

A set of working examples is included in the ``example`` directory of this
package.

.. _dependence-section:

Dependencies
------------

To run the software in this package properly, you must have the following
packages installed:

* `Python`_: the scripting language used for the programs
* `NumPy`_: required by the package (see the Installation section)
* `Matplotlib`_: used for plotting
* `Sphinx`_: only if you need to *recompile* the documentation

.. _usage-section:

Usage
-----

We describe a few scenarios for using the Toolkit in specific cases. Section
:ref:`api-section` shows how to create your own scripts that re-use the
readout functionality available in the kit.

Example 1: Plotting a DET Curve
===============================

The following command will plot a single DET curve for a given input score file:

.. code-block:: sh

  $ plotDET.py test.scores

This command should produce a single plot in a PDF file named ``det.pdf``,
calculated using the contents of the input score file ``test.scores``. The plot
title will be empty. You can change the output filename and its type (besides
PDF, we support ``.png`` and ``.jpg`` files) or add a plot title like this:

.. code-block:: sh
  
  $ plotDET.py --title="My Test DET" --output=test.png test.scores

You can plot a series of overlaid DET curves in the following manner:

.. code-block:: sh

  $ plotDET.py --title="My Test DET" --output=overlayed.pdf \
      --label=devel development.scores --label=test test.scores

This command will produce a single plot in a PDF file, with the overlaid DET
curves generated from each of the score files given as input parameters. A
legend will be drawn at a convenient location in the plot, using the label you
gave for each curve. By default the program generates black-and-white plots,
but it can be instructed to produce coloured plots using the ``--colour``
option (see the output of ``plotDET.py --help``).
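
For readers unfamiliar with DET curves, the points on such a curve are pairs
of false acceptance rate (FAR) and false rejection rate (FRR), one pair per
decision threshold, conventionally drawn on normal-deviate axes. The Python 3
sketch below shows how such pairs could be computed. It is not the code used
by ``plotDET.py``; it assumes that a trial is genuine when the claimed and
real identities match, that higher scores favour acceptance, and the function
names are hypothetical.

.. code-block:: python

  def error_rates(genuine, impostor, threshold):
      """Return ``(FAR, FRR)`` when scores >= ``threshold`` are accepted.

      ``genuine`` holds scores of trials where claimed and real identity
      match (an assumption of this sketch); ``impostor`` holds the rest.
      """
      far = sum(1 for s in impostor if s >= threshold) / float(len(impostor))
      frr = sum(1 for s in genuine if s < threshold) / float(len(genuine))
      return far, frr


  def det_points(genuine, impostor):
      """Return one ``(FAR, FRR)`` pair per distinct score, used as threshold."""
      thresholds = sorted(set(genuine) | set(impostor))
      return [error_rates(genuine, impostor, t) for t in thresholds]

Sweeping the threshold from the lowest to the highest score moves the
operating point from accepting every trial (FAR of 1) towards rejecting most
of them (FAR of 0).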

Example 2: Checking score set consistency
=========================================

You can check the consistency between two (or more) score sets that are
supposed to provide scores for multiple biometric modalities using the
``checkModalities.py`` script. This tool compares the input files and stops at
the first error it finds:

.. code-block:: sh

  $ checkModalities.py faceverif.scores speechverif.scores

If you sort all files before calling the program, huge score files can be
checked much faster, because the sorting step within the program is skipped.
You can use the ``sort`` and ``uniq`` Unix utilities to sort all score files
before calling ``checkModalities.py``, like this:

.. code-block:: sh

  $ sort my-scores.txt | uniq > sorted-scores.txt
  $ sort other-scores.txt | uniq > other-sorted-scores.txt
  $ checkModalities.py --sorted sorted-scores.txt other-sorted-scores.txt
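
To illustrate why pre-sorted input helps, the Python 3 sketch below compares
two listings in a single pass and stops at the first difference. It is a
simplified illustration, not the implementation of ``checkModalities.py``: it
assumes that consistency means the first four fields of corresponding lines
agree, that both listings are already sorted the same way, and the function
names are hypothetical.

.. code-block:: python

  import itertools


  def keys(lines):
      """Yield the first four fields of each non-blank line (no score)."""
      for line in lines:
          fields = line.split()
          if fields:
              yield tuple(fields[:4])


  def first_mismatch(lines_a, lines_b):
      """Compare two already-sorted score listings key by key.

      Returns ``None`` when they agree, otherwise ``(position, key_a, key_b)``
      for the first difference; a key is ``None`` if that listing ended early.
      """
      pairs = itertools.zip_longest(keys(lines_a), keys(lines_b))
      for position, (key_a, key_b) in enumerate(pairs, 1):
          if key_a != key_b:
              return position, key_a, key_b
      return None

Because both inputs are consumed lazily and in order, the comparison needs no
sorting of its own and never holds a whole file in memory.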

.. Place your references here:
.. _Python: http://www.python.org
.. _Matplotlib: http://matplotlib.sourceforge.net
.. _Sphinx: http://sphinx.pocoo.org
.. _PyPI: http://pypi.python.org/pypi
.. _NumPy: http://numpy.scipy.org/