Boosting Strong Classifiers

Several tasks can be achieved by a boosted classifier:

  1. A univariate classification task assigns each sample \vec x one of two possible classes: {+1, -1}. In this implementation, class +1 is assigned when the (real-valued) outcome of the classifier is positive, and -1 otherwise.
  2. A multivariate classification task assigns each sample \vec x one of N possible classes: {C_1, C_2, \dots, C_N}.
     In this implementation, an N-dimensional output vector \vec y = [y_1, y_2, \dots, y_N] is computed for each sample, and the class with the highest outcome is assigned: C_n with n = \arg\max_n y_n. To train the multivariate classifier, target values for each training sample are set to +1 for the correct class, and to -1 for all other classes (see the small sketch after this list).
  3. A (multivariate) regression task tries to learn a function f(\vec x) = \vec y based on several training examples.
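
As a small illustration of the multivariate target encoding and the \arg\max decision, here is a sketch with made-up numbers; it is not part of the library interface:

    import numpy

    # hypothetical 3-class example: a sample of class C_2 gets the target vector [-1, +1, -1]
    target = numpy.array([-1., +1., -1.])

    # assume the strong classifier returned this output vector \vec y for the sample
    y = numpy.array([0.3, 1.2, -0.7])

    # the predicted class is the one with the highest outcome: n = argmax_n y_n
    predicted_class = numpy.argmax(y)   # -> 1, i.e., the second class C_2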

To achieve this goal, a strong classifier S is built as a weighted sum of I weak classifiers W_i:

S(\vec x) = \sum_{i=1}^I w_i \cdot W_i(\vec x)

Note

For the univariate case, both w_i and the weak classifier result W_i are floating point values. In the multivariate case, w_i is a vector of weights – one for each output dimension – and the weak classifier W_i returns a vector of floating point values as well.
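
The weighted sum above can be sketched in a few lines of Python. This is only an illustration of the formula for the univariate case, assuming each weak classifier is a callable that returns a floating point value; it is not the bob.learn.boosting.BoostedMachine interface:

    def strong_classify(x, weights, weak_classifiers):
        """Evaluates S(x) = sum_i w_i * W_i(x) for the univariate case."""
        return sum(w * W(x) for w, W in zip(weights, weak_classifiers))

    def decide(x, weights, weak_classifiers):
        """Assigns class +1 for a positive outcome, and -1 otherwise."""
        return +1 if strong_classify(x, weights, weak_classifiers) > 0 else -1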

Weak Classifiers

Currently, two types of weak classifiers are implemented in this boosting framework.

Stump classifier

The first classifier, which can only handle univariate classification tasks, is the bob.learn.boosting.StumpMachine. For a given input vector \vec x, the classifier bases its decision on a single element x_m of the input vector:

W(\vec x) = \left\{ \begin{array}{r@{\text{ if }}l} +1 & (x_m - \theta) \cdot \phi \geq 0 \\ -1 & (x_m - \theta) \cdot \phi < 0 \end{array}\right.

Threshold \theta, polarity \phi and index m are parameters of the classifier, which are trained using the bob.learn.boosting.StumpTrainer. For a given training set \{\vec x_p \mid p=1,\dots,P\} and corresponding target values \{t_p \mid p=1,\dots,P\}, a threshold \theta_m is computed for each input index m such that the classification error is minimized, and the index m with the lowest training classification error is selected. The polarity \phi is set to -1 if values lower than the threshold should be considered as positive examples, and to +1 otherwise.

To compute the classification error for a given \theta_m, the gradient of a loss function is taken into account. For the stump trainer, the bob.learn.boosting.ExponentialLoss is usually used as the loss function.
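
The decision rule of the stump can be written down directly from the formula above. The following sketch only illustrates the decision; the search for \theta, \phi and m is what bob.learn.boosting.StumpTrainer performs:

    def stump_decision(x, index, threshold, polarity):
        """Returns +1 if (x[index] - threshold) * polarity >= 0, and -1 otherwise."""
        return +1. if (x[index] - threshold) * polarity >= 0 else -1.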

Look-Up-Table classifier

The second classifier, which can handle univariate and multivariate classification and regression tasks, is the bob.learn.boosting.LUTMachine. This classifier is designed to handle input vectors with discrete values only. Again, the decision of the weak classifier is based on a single element of the input vector \vec x.

In the univariate case, for each of the possible discrete values of x_m, a decision {+1, -1} is selected:

W(\vec x) = LUT[x_m]

This look-up-table LUT and the feature index m are trained by the bob.learn.boosting.LUTTrainer.

In the multivariate case, each output W^o is handled independently, i.e., a separate look-up-table LUT^o and a separate feature index m^o are assigned for each output dimension o:

W^o(\vec x) = LUT^o[x_{m^o}]

Note

As a variant, the feature index m^o can be selected to be shared for all outputs, see bob.learn.boosting.LUTTrainer for details.

A weak look-up-table classifier is learned using the bob.learn.boosting.LUTTrainer.
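
The look-up itself is a plain indexing operation. The sketch below assumes that the discrete feature values are non-negative integers that can be used as indices; it mirrors the formulas above, but is not the bob.learn.boosting.LUTMachine implementation:

    import numpy

    def lut_decision_univariate(x, lut, index):
        """Univariate case: W(x) = LUT[x_m]."""
        return lut[x[index]]

    def lut_decision_multivariate(x, luts, indices):
        """Multivariate case: W^o(x) = LUT^o[x_{m^o}], one output per dimension o."""
        return numpy.array([lut[x[m]] for lut, m in zip(luts, indices)])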

Strong classifier

The strong classifier, which is of type bob.learn.boosting.BoostedMachine, is a weighted combination of weak classifiers, which are usually of the same type. It can be trained with the bob.learn.boosting.Boosting trainer, which takes a list of training samples and a list of univariate or multivariate target values. In several rounds, the trainer computes the following steps (only the univariate case is described here; the multivariate case is analogous, with scores replaced by score vectors; see the sketch after this list):

  1. The classification results (the so-called scores) for the current strong classifier:

    s_p = S(\vec x_p)

  2. The derivative L' of the loss function, based on the current scores and the target values:

    \nabla_p = L'(t_p, s_p)

  3. This loss gradient is used to select a new weak machine W_i using a weak trainer (see above).

    W_i = trainer.train([\vec x_p], [\nabla_p])
    
  4. The scores of the weak machine are computed:

    r_p = W_i(\vec x_p)

  5. The weight for the new machine is optimized using scipy.optimize.fmin_l_bfgs_b. This call will use both the loss L and its derivative L' to compute the optimal weight for the new classifier:

    w_i = scipy.optimize.fmin_l_bfgs_b(...)
    
  6. The new weak machine is added to the strong classifier.
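
The training loop above can be sketched as follows. The names weak_trainer, loss.value and loss.gradient are illustrative placeholders, not the exact bob.learn.boosting interface; the weight of each new machine is found by minimizing the summed loss of the combined scores with scipy.optimize.fmin_l_bfgs_b:

    import numpy
    import scipy.optimize

    def boost(X, targets, weak_trainer, loss, rounds):
        """Sketch of univariate boosting: X has shape (P, D), targets has shape (P,) with values in {+1, -1}."""
        weights, machines = [], []
        scores = numpy.zeros(len(targets))              # s_p of the current strong classifier

        for _ in range(rounds):
            gradients = loss.gradient(targets, scores)  # \nabla_p = L'(t_p, s_p)
            weak = weak_trainer.train(X, gradients)     # select a new weak machine W_i
            r = numpy.array([weak(x) for x in X])       # r_p = W_i(x_p)

            # find the weight w_i minimizing sum_p L(t_p, s_p + w * r_p),
            # returning both the loss value and its derivative for L-BFGS-B
            def objective(w):
                s = scores + w[0] * r
                value = loss.value(targets, s).sum()
                grad = numpy.array([(loss.gradient(targets, s) * r).sum()])
                return value, grad

            w_opt, _, _ = scipy.optimize.fmin_l_bfgs_b(objective, numpy.zeros(1))

            weights.append(w_opt[0])
            machines.append(weak)
            scores += w_opt[0] * r                      # update the strong classifier scores

        return weights, machines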

Loss functions

As shown above, the loss functions define how well the currently predicted scores s_p fit the target values t_p. Depending on the desired task and on the type of classifier, different loss functions might be used:

  1. The bob.learn.boosting.ExponentialLoss can be used for the binary classification task, i.e., when target values are in {+1, -1}
  2. The bob.learn.boosting.LogitLoss can be used for the multivariate classification task, i.e., when target vectors have entries from {+1, 0}.
  3. The bob.learn.boosting.JesorskyLoss can be used for the particular multivariate regression task of learning the locations of facial features.

Other loss functions, e.g., using the Euclidean distance for regression, should be easily implementable.
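
As an illustration, the exponential loss for binary targets and its derivative with respect to the score can be written as follows; this is a plain numpy sketch of the textbook formulas, not the bob.learn.boosting.ExponentialLoss class itself:

    import numpy

    def exponential_loss(targets, scores):
        """L(t, s) = exp(-t * s), element-wise over the training samples."""
        return numpy.exp(-targets * scores)

    def exponential_loss_gradient(targets, scores):
        """dL/ds = -t * exp(-t * s), used as the loss gradient in the boosting rounds."""
        return -targets * numpy.exp(-targets * scores)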