Running Experiments using Specialized Scripts

Testing Configurations of an Algorithm

Sometimes, configurations of algorithms are highly dependent on the database or even on the employed protocol. Additionally, configuration parameters may depend on each other. The FaceRecLib provides a relatively simple setup that allows you to test different configurations in the same task. For this, the ./bin/parameter_test.py script can be employed. This script executes a configurable series of experiments, reusing data as far as possible.

The Configuration File

The most important parameter of ./bin/parameter_test.py is the --configuration-file. This configuration file specifies which parameters of which parts of the algorithm will be tested. An example configuration file can be found in the test scripts: facereclib/tests/scripts/parameter_Test.py. The configuration file is a regular Python file that can contain certain variables:

  1. preprocessor =
  2. feature_extractor =
  3. tool =
  4. replace =
  5. requirement =
  6. imports =

The variables 1. to 3. usually contain constructor calls for classes of Preprocessors, Feature Extractors and Recognition Algorithms, but registered Resources can be used as well. Any parameter of these classes can be replaced by a placeholder. By default, placeholders start with a # character, followed by a digit or a character. The variables 1. to 3. can also be overridden by the command line options --preprocessing, --features and --tool of the ./bin/parameter_test.py script.
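
As an illustration, the first three variables might look like the following sketch. The class names and parameters are hypothetical and only show where the placeholders go; they are not actual FaceRecLib classes:

# hypothetical sketch: constructor calls given as strings, with '#' placeholders
# marking the values that ./bin/parameter_test.py will substitute
preprocessor      = "facereclib.preprocessing.SomePreprocessor(size = #a)"
feature_extractor = "facereclib.features.SomeExtractor(block_size = #b)"
tool              = "facereclib.tools.SomeTool(distance_function = #c)"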

The replace variable has to be set as a dictionary. In it, you define which values should be filled in for which placeholder key, and in which step of the tool chain execution this should happen. The steps are 'preprocessing', 'extraction', 'projection', 'enrollment' and 'scoring'. For each step, you can define which placeholder should be replaced by which values. To be able to differentiate the results later on, each replacement value is bound to a directory name. The final structure looks somewhat like this:

replace = {
    'step1' : {
        '#a' : {
            'Dir_a1' : 'Value_a1',
            'Dir_a2' : 'Value_a2'
        },

        '#b' : {
            'Dir_b1' : 'Value_b1',
            'Dir_b2' : 'Value_b2'
        }
    },

    'step2' : {
        '#c' : {
            'Dir_c1' : 'Value_c1',
            'Dir_c2' : 'Value_c2'
        }
    }
}

Of course, more than two values can be selected for each placeholder. Additionally, tuples of placeholders can be defined, in which case the full tuple is always replaced in one shot. Continuing the above example, it is possible to add:

...
    'step3' : {
        ('#d','#e') : {
            'Dir_de1' : ('Value_d1', 'Value_e1'),
            'Dir_de2' : ('Value_d2', 'Value_e2')
        }
    }

Note that all possible combinations of the configuration parameters are tested, which might result in a huge number of executed experiments; in the example above, the two values for each of #a, #b, #c and the (#d, #e) tuple already yield 2 * 2 * 2 * 2 = 16 experiments. Some combinations of parameters might not make any sense. In this case, a set of requirements on the parameters can be defined, using the requirement variable. A requirement is any string containing placeholders that can be evaluated using Python's eval function:

requirement = ['#a > #b', '2*#c != #a', ...]

Finally, if any of the classes or variables requires a certain Python module to be imported (other than the facereclib itself), it needs to be declared in the imports variable. If you, e.g., test which scipy distance function works best for your features, please add the required imports (and don't forget the facereclib in case you use its tools):

imports = ['scipy', 'facereclib']

A complete working example, where the image resolution and LGBPHS distance function are tested, can be found in facereclib/tests/scripts/parameter_Test.py.
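
As a rough sketch of how the pieces fit together, and continuing the hypothetical example from above, a configuration file could look like the following. All class names, directory names and values are made up for illustration; only the real example in facereclib/tests/scripts/parameter_Test.py is guaranteed to work:

# hypothetical configuration file for ./bin/parameter_test.py
# class names, parameters and values are illustrative only

preprocessor      = "facereclib.preprocessing.SomePreprocessor(size = #a)"
feature_extractor = "facereclib.features.SomeExtractor(block_size = #b)"
tool              = "facereclib.tools.SomeTool(distance_function = #c)"

replace = {
    # the image size #a takes effect during preprocessing
    'preprocessing' : {
        '#a' : {'S64' : '64', 'S80' : '80'}
    },
    # the block size #b takes effect during feature extraction
    'extraction' : {
        '#b' : {'B32' : '32', 'B72' : '72'}
    },
    # the distance function #c takes effect during scoring
    'scoring' : {
        '#c' : {'EUC' : 'scipy.spatial.distance.euclidean',
                'CIT' : 'scipy.spatial.distance.cityblock'}
    }
}

# skip combinations where the block size exceeds the image size
requirement = ['#b < #a']

# modules required to evaluate the constructor calls after replacement
imports = ['scipy', 'facereclib']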

Further Command Line Options

The ./bin/parameter_test.py script has a further set of command line options; an example invocation is given after the list.

  • The --database and the --protocol define which database and (optionally) which protocol should be used.
  • The --sub-directory is similar to the one in the ./bin/faceverify.py, see Required Command Line Arguments.
  • The --preprocessing, --features and --tool can be used to override the preprocessor, feature_extractor and tool fields in the configuration file (in which case the configuration file does not need to contain these variables).
  • The --grid option can select the SGE configuration (if not selected, all experiments will be run sequentially on the local machine).
  • The --preprocessed-data-directory can be used to select a directory of previously preprocessed data. This should not be used in combination with testing different preprocessing parameters.
  • The --grid-database-directory can be used to select another directory where the submitted.sql3 files will be stored.
  • With --write-commands, a directory can be selected into which the executed commands are written (this is useful in case some experiments fail and need to be rerun).
  • The --dry-run flag should always be used before the final execution to see if the experiment definition works as expected.
  • The --skip-when-existent flag will only execute the experiments that have not yet finished (i.e., where the resulting score files have not been produced yet).
  • Finally, additional options can be passed on to the ./bin/faceverify.py script directly. These options have to be put after a -- separator.
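
For example, a first call could combine the options above with the --dry-run flag; the bracketed names are placeholders:

$ ./bin/parameter_test.py --configuration-file [configuration-file] --database [database] --sub-directory [sub-directory] --dry-run

Once the dry run shows that the experiment definition works as expected, drop the --dry-run flag (and possibly add a --grid configuration) to actually execute the experiments.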

Evaluation of Results

To evaluate a series of experiments, a special script iterates through all the results and computes EER on the development set and HTER on the evaluation set, for both the nonorm and the ztnorm directories. Simply call:

$ ./bin/collect_results.py --directory [result-base-directory] --sort

This will iterate through all result files found in [result-base-directory] and sort the results according to the EER on the development set (the sorting criterion can be modified using the --criterion keyword).

Databases with Special Evaluation Protocols

Some databases provide special evaluation protocols which require a more complicated experiment design. For these databases, different scripts are provided. These databases are:

The LFW Database

For the Labeled Faces in the Wild (LFW) database, there is another script to run the experiments, strictly following the LFW protocols by computing the classification performance on view1 and/or view2. The final result of the LFW experiment is, hence, a text file (--result-file) containing the individual results for view1 and the 10 folds fold1 ... fold10 of view2, as well as the final average and standard deviation over all folds. In principle, ./bin/faceverify.py could be used as well, but it would not compute the classification performance.

The parameters of the ./bin/faceverify_lfw.py script are mostly similar to those of the ./bin/faceverify.py script, as explained in Running non-baseline Experiments. A few exceptions are that the default database is lfw and that the parts belonging to the ZT score normalization are missing. Additionally, instead of the --protocol option, the --views option is available, which by default executes only view1.
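
A possible call might therefore look like the line below; the exact syntax of the --views argument (here given as a space-separated list) and the bracketed file name are assumptions, so check ./bin/faceverify_lfw.py --help for the accepted values:

$ ./bin/faceverify_lfw.py --views view1 view2 --result-file [result-file]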

The GBU Database

There is another script, ./bin/faceverify_gbu.py, that executes experiments on the Good, Bad, and Ugly (GBU) database. In principle, most of the parameters from above can be used. One difference is that the --models-directories option is replaced by a single --model-directory.

When running experiments on the GBU database, the default GBU protocol (as provided by NIST) is used. Hence, training is performed on the special Training set, and experiments are executed using the Target set as models (using a single image for model enrollment) and the Query set as probes.

The GBU protocol does not specify T-Norm-models or Z-Norm-probes, nor does it split off development and test sets. Hence, only a single score file is generated, which might later on be converted into an ROC curve using Bob functions.
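
As a sketch of that last step, the score file could be turned into ROC points roughly as follows; this assumes the usual four-column score file format and that your Bob version provides bob.measure.load.split_four_column and bob.measure.roc (function names and locations may differ between Bob releases):

import bob.measure

# path of the single score file produced by ./bin/faceverify_gbu.py
score_file = '[score-file]'

# split the four-column score file into impostor (negative) and client (positive) scores
negatives, positives = bob.measure.load.split_four_column(score_file)

# compute 100 (FAR, FRR) points of the ROC curve
roc_points = bob.measure.roc(negatives, positives, 100)
print(roc_points)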