Support Vector Machines and Trainers

A support vector machine (SVM) [1] is a popular supervised learning technique. Bob provides a bridge to LIBSVM that allows you to train such a machine and use it for classification. This section is a tutorial on Bob's Pythonic bindings to LIBSVM. It first introduces the support vector machine class, bob.learn.libsvm.Machine, and then covers trainer usage.

Machines

The functionality of this bridge includes loading and saving SVM data files and machine models, which you can produce or download following the instructions found on LIBSVM's home page. Bob's bindings to LIBSVM do not allow you to explicitly set the machine's internal values. You must use a trainer to create a machine first, as explained further down. Once you have followed those instructions, you can come back to this point and follow the remaining examples here.

Note

Our current svm object was trained with the file called heart_scale, distributed with LIBSVM and available here. This dataset poses a binary classification problem (i.e., 2 classes of features to be discriminated). The number of features is 13.

Our extensions to LIBSVM also allow you to feed data through a bob.learn.libsvm.Machine using numpy.ndarray objects and collect results in that format. For the following lines, we assume you have a bob.learn.libsvm.Machine named svm available. (For this example, the variable svm was generated from the heart_scale dataset using the application svm-train with default parameters.) The shape attribute indicates how many features a machine from this module takes as input and how many values it outputs (typically, just 1):

>>> svm.shape
(13, 1)

To run one or more examples through the SVM, use the () operator. A 1D array is treated as a single sample, and a 2D array as one sample per row:

>>> svm(numpy.ones((13,), 'float64'))
1
>>> svm(numpy.ones((10,13), 'float64'))
(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

Visit the documentation for bob.learn.libsvm.Machine to find more information about these bindings and the methods you can call on such a machine. Visit the documentation for bob.learn.libsvm.File for information on loading LIBSVM data files directly into Python and producing numpy.ndarray objects.

Below is a quick example. Suppose the variable f contains an object of type bob.learn.libsvm.File. You could then read the data (and labels) from the file like this:

>>> labels, data = f.read_all()
>>> data = numpy.vstack(data) #creates a single 2D array

You can then pass the data to the svm machine you trained earlier like this:

>>> predicted_labels = svm(data)
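Once predictions are available, a natural next step is to compare them with the labels read from the file. The following is a minimal sketch using only NumPy. The helper name accuracy and the stand-in label values are assumptions made for illustration and are not part of the bindings; in the tutorial's setting you would pass labels and predicted_labels instead.

```python
import numpy

def accuracy(true_labels, predicted_labels):
    """Return the fraction of samples whose predicted label matches the true one."""
    true_labels = numpy.asarray(true_labels).ravel()
    predicted_labels = numpy.asarray(predicted_labels).ravel()
    if true_labels.shape != predicted_labels.shape:
        raise ValueError("label arrays must have the same length")
    return float(numpy.mean(true_labels == predicted_labels))

# Stand-in values (hypothetical); replace with `labels` and `svm(data)`.
example_labels = [1, -1, 1, 1]
example_predictions = [1, -1, -1, 1]
example_accuracy = accuracy(example_labels, example_predictions)
```

Note that measuring accuracy on the same data used for training tends to give an optimistic estimate, so a held-out set is usually preferable.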

Training

The training set for SVMs consists of a list of 2D NumPy arrays, one for each class. The first dimension of each 2D NumPy array is the number of training samples for the given class, and the second dimension is the dimensionality of the feature vector. For instance, let's consider the following training set for a two-class problem:

>>> pos = numpy.array([[1,-1,1], [0.5,-0.5,0.5], [0.75,-0.75,0.8]], 'float64')
>>> neg = numpy.array([[-1,1,-0.75], [-0.25,0.5,-0.8]], 'float64')
>>> data = [pos,neg]
>>> print(data) 
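Data often arrives as one sample matrix plus a flat label vector rather than as one array per class. The following sketch, using only NumPy, regroups such data into the per-class list layout described above. The helper name split_by_class is hypothetical and not part of the bindings; the classes are returned in ascending label order, which is a choice made here for illustration.

```python
import numpy

def split_by_class(samples, labels):
    """Group rows of `samples` into one 2D float64 array per distinct label."""
    samples = numpy.asarray(samples, dtype='float64')
    labels = numpy.asarray(labels)
    if samples.ndim != 2 or labels.shape != (samples.shape[0],):
        raise ValueError("expected a 2D sample array and one label per row")
    classes = numpy.unique(labels)  # sorted in ascending order
    groups = [samples[labels == c] for c in classes]
    return classes, groups

# Same values as the tutorial's training set, but interleaved.
flat_samples = [[1, -1, 1], [-1, 1, -0.75], [0.5, -0.5, 0.5],
                [-0.25, 0.5, -0.8], [0.75, -0.75, 0.8]]
flat_labels = [1, -1, 1, -1, 1]
classes, groups = split_by_class(flat_samples, flat_labels)
```

Here groups[0] holds the two samples labelled -1 and groups[1] the three samples labelled +1. Reorder the list if you want the positive class first, as in the example above.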

Note

Please note that in the above training set, the data is pre-scaled so that features remain in the range between -1 and +1. LIBSVM recommends doing this for all features. Our bindings to LIBSVM do not include scaling, so you need to implement it yourself if your data requires it.
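Since scaling is left to the user, the following is one possible sketch of a per-feature linear rescaling to the range [-1, +1], using only NumPy. The function names fit_scaling and apply_scaling are hypothetical and not part of the bindings. Constant features are mapped to -1 here, which is an arbitrary choice made to avoid a division by zero.

```python
import numpy

def fit_scaling(samples):
    """Compute per-feature minimum and span from a 2D array of training samples."""
    samples = numpy.asarray(samples, dtype='float64')
    lo = samples.min(axis=0)
    hi = samples.max(axis=0)
    # Use a span of 1 for constant features so the division below is safe.
    span = numpy.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_scaling(samples, lo, span):
    """Map each feature linearly so the training minimum is -1 and maximum is +1."""
    samples = numpy.asarray(samples, dtype='float64')
    return 2.0 * (samples - lo) / span - 1.0

raw = numpy.array([[0., 10., 5.], [5., 20., 5.], [10., 40., 5.]])
lo, span = fit_scaling(raw)
scaled = apply_scaling(raw, lo, span)
```

The parameters should be computed on the training data only and then reused unchanged for any data passed to the machine later. If you have several per-class arrays, stack them with numpy.vstack before calling fit_scaling.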

Then, an SVM [1] can be trained easily using the bob.learn.libsvm.Trainer class.

>>> trainer = bob.learn.libsvm.Trainer()
>>> machine = trainer.train(data) #ordering only affects labels

This returns a bob.learn.libsvm.Machine which can later be used for classification, as explained before.

>>> predicted_label = machine(numpy.array([1.,-1.,1.]))
>>> print(predicted_label)
[1]

The training procedure allows setting several different options. For instance, the default kernel is an RBF. If we would like a linear SVM instead, this can be set before calling the bob.learn.libsvm.Trainer.train() method.

>>> trainer.kernel_type = 'LINEAR'

One Class SVM

The package also allows you to train a one-class support vector machine. To train this kind of classifier, consider the following example:

>>> oc_pos = 0.4 * numpy.random.randn(100, 2).astype(numpy.float64)
>>> oc_data = [oc_pos]
>>> print(oc_data) 

As in the example above, an SVM [1] for a one-class problem can be trained easily using the bob.learn.libsvm.Trainer class and selecting the appropriate machine_type (ONE_CLASS).

>>> oc_trainer = bob.learn.libsvm.Trainer(machine_type='ONE_CLASS')
>>> oc_machine = oc_trainer.train(oc_data)

Then, as explained before, a bob.learn.libsvm.Machine can be used to classify new entries.

>>> oc_test = 0.4 * numpy.random.randn(20, 2).astype(numpy.float64)
>>> oc_outliers = numpy.random.uniform(low=-4, high=4, size=(20, 2)).astype(numpy.float64)
>>> predicted_label_oc_test = oc_machine(oc_test)
>>> predicted_label_oc_outliers = oc_machine(oc_outliers)
>>> print(predicted_label_oc_test) 
>>> print(predicted_label_oc_outliers) 
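Because the samples above are drawn at random, the printed labels differ from run to run, so a summary statistic is often easier to read than the raw output. The sketch below uses only NumPy and assumes the usual LIBSVM one-class convention, in which +1 marks a sample accepted as belonging to the training distribution and -1 marks an outlier. The helper name inlier_fraction and the stand-in predictions are hypothetical.

```python
import numpy

def inlier_fraction(predicted_labels, inlier_label=1):
    """Return the fraction of predictions equal to `inlier_label`."""
    predicted_labels = numpy.asarray(predicted_labels).ravel()
    if predicted_labels.size == 0:
        raise ValueError("no predictions given")
    return float(numpy.mean(predicted_labels == inlier_label))

# Stand-in predictions (assumption); replace with predicted_label_oc_test
# or predicted_label_oc_outliers from the example above.
example_predictions = [1, 1, -1, 1, -1]
example_fraction = inlier_fraction(example_predictions)
```

With a reasonably trained machine you would expect this fraction to be noticeably higher for oc_test than for oc_outliers, although the exact values depend on the random draw and the trainer parameters.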

Acknowledgements

As a final note, if you decide to use our LIBSVM bindings for your publication, be sure to also cite:

@article{CC01a,
 author  = {Chang, Chih-Chung and Lin, Chih-Jen},
 title   = {{LIBSVM}: A library for support vector machines},
 journal = {ACM Transactions on Intelligent Systems and Technology},
 volume  = {2},
 issue   = {3},
 year    = {2011},
 pages   = {27:1--27:27},
 note    = {Software available at \url{http://www.csie.ntu.edu.tw/~cjlin/libsvm}}
}
[1] http://en.wikipedia.org/wiki/Support_vector_machine