Support Vector Machines and Trainers
A support vector machine (SVM) [1] is a very popular supervised learning
technique. Bob provides a bridge to LIBSVM that allows you to train such a
machine and use it for classification. This section is a tutorial on Bob's
Pythonic bindings to LIBSVM. It starts by introducing the support vector
machine bob.learn.libsvm.Machine, followed by the trainer usage.
Machines
The functionality of this bridge includes loading and saving SVM data files and machine models, which you can produce or download following the instructions found on LIBSVM's home page. Bob's bindings to LIBSVM do not allow you to set the machine's internal values explicitly. You must use a trainer to create a machine first, as explained further down. Once you have followed those instructions, you can come back to this point and follow the remaining examples here.
Note

The svm object used in this section was trained on the heart_scale file
distributed with LIBSVM. This dataset poses a binary classification problem
(i.e., 2 classes of features to be discriminated) with 13 features.
Our extensions to LIBSVM also allow you to feed data through a
bob.learn.libsvm.Machine using numpy.ndarray objects and to collect results in
the same format. The following lines assume you have a
bob.learn.libsvm.Machine named svm available. (For this example, the variable
svm was generated from the heart_scale dataset using the application svm-train
with default parameters.) The shape attribute indicates how many features a
machine from this module accepts as input and how many values it outputs
(typically, just 1):
>>> svm.shape
(13, 1)
To run a single example through the SVM, just use the () operator:
>>> svm(numpy.ones((13,), 'float64'))
1
>>> svm(numpy.ones((10,13), 'float64'))
(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
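Internally, prediction evaluates the SVM decision function over the machine's support vectors. The sketch below is not part of the bindings; it is a minimal NumPy illustration with an RBF kernel and hand-picked, hypothetical model values (svs, coeffs, bias, gamma):

```python
import numpy as np

def rbf_decision(x, support_vectors, coeffs, bias, gamma=0.1):
    """Evaluate f(x) = sum_i coeff_i * K(sv_i, x) + bias,
    with the RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dists = np.sum((support_vectors - x) ** 2, axis=1)
    kernel = np.exp(-gamma * sq_dists)
    return float(np.dot(coeffs, kernel) + bias)

# Hypothetical, hand-picked model values for a 2-feature problem
svs = np.array([[1.0, 1.0], [-1.0, -1.0]])
coeffs = np.array([0.5, -0.5])   # alpha_i * y_i for each support vector
bias = 0.0

score = rbf_decision(np.array([0.9, 1.1]), svs, coeffs, bias)
label = 1 if score > 0 else -1   # the sign gives the predicted class
```

The sign of the decision value gives the predicted class; a trained bob.learn.libsvm.Machine performs this computation for you from its stored model.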
Visit the documentation for bob.learn.libsvm.Machine to find more information
about these bindings and the methods you can call on such a machine. Visit the
documentation for bob.learn.libsvm.File for information on loading LIBSVM data
files directly into Python and producing numpy.ndarray objects.
Below is a quick example: suppose the variable f contains an object of type
bob.learn.libsvm.File. Then, you could read data (and labels) from the file
like this:
>>> labels, data = f.read_all()
>>> data = numpy.vstack(data) #creates a single 2D array
Then you can throw the data into the svm
machine you trained earlier like
this:
>>> predicted_labels = svm(data)
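The LIBSVM text format itself is simple to parse if you ever need to read it without bob.learn.libsvm.File: each line holds a label followed by 1-based index:value pairs, where absent indices mean zero. A minimal sketch (the read_libsvm function and sample lines below are ours, not part of the bindings):

```python
import numpy as np

def read_libsvm(lines, n_features):
    """Parse LIBSVM-format text lines ("label idx:val idx:val ...",
    1-based indices, possibly sparse) into labels and a dense 2D array."""
    labels, rows = [], []
    for line in lines:
        parts = line.split()
        labels.append(float(parts[0]))
        row = np.zeros(n_features, dtype='float64')
        for item in parts[1:]:
            idx, val = item.split(':')
            row[int(idx) - 1] = float(val)  # convert 1-based to 0-based
        rows.append(row)
    return np.array(labels), np.vstack(rows)

# Two hypothetical sample lines; index 2 is absent in the first, so it is 0
sample = ["+1 1:0.5 3:-0.2", "-1 2:1.0"]
labels, data = read_libsvm(sample, n_features=3)
```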
Training
The training set for SVMs consists of a list of 2D NumPy arrays, one for each class. The first dimension of each 2D NumPy array is the number of training samples for the given class and the second dimension is the dimensionality of the feature. For instance, let's consider the following training set for a two-class problem:
>>> pos = numpy.array([[1,-1,1], [0.5,-0.5,0.5], [0.75,-0.75,0.8]], 'float64')
>>> neg = numpy.array([[-1,1,-0.75], [-0.25,0.5,-0.8]], 'float64')
>>> data = [pos,neg]
>>> print(data)
Note
Please note that in the above training set, the data is pre-scaled so features remain in the range between -1 and +1. LIBSVM recommends this scaling for all features. Our bindings to LIBSVM do not include scaling; if you need it, apply it yourself before training.
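LIBSVM ships an svm-scale tool for this purpose; if you prefer to stay in NumPy, a minimal per-feature rescaling sketch (the function name is ours, not part of the bindings) could look like this:

```python
import numpy as np

def scale_to_unit_range(data, lower=-1.0, upper=1.0):
    """Linearly rescale each feature (column) of a 2D array into [lower, upper]."""
    mins = data.min(axis=0)
    maxs = data.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero
    return lower + (upper - lower) * (data - mins) / span

# Hypothetical raw features with very different ranges
raw = np.array([[10., 200.], [20., 400.], [15., 300.]])
scaled = scale_to_unit_range(raw)  # every column now spans [-1, +1]
```

Remember to record the training-set minima and maxima and apply the same transformation to any test data, otherwise the two sets end up on different scales.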
Then, an SVM [1] can be trained easily using the
bob.learn.libsvm.Trainer
class.
>>> trainer = bob.learn.libsvm.Trainer()
>>> machine = trainer.train(data) #ordering only affects labels
This returns a bob.learn.libsvm.Machine
which can later be used
for classification, as explained before.
>>> predicted_label = machine(numpy.array([1.,-1.,1.]))
>>> print(predicted_label)
[1]
The training procedure allows setting several different options. For instance,
the default kernel is an RBF. If you would like a linear SVM instead, set the
kernel type before calling the bob.learn.libsvm.Trainer.train() method.
>>> trainer.kernel_type = 'LINEAR'
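With a linear kernel, K(u, v) is just the dot product, so the decision function collapses into a single weight vector that can be evaluated without iterating over support vectors. A small NumPy sketch with hypothetical support vectors and coefficients (none of these names come from the bindings):

```python
import numpy as np

# With a linear kernel, f(x) = sum_i coeff_i * (sv_i . x) + bias
# collapses into w = sum_i coeff_i * sv_i, evaluated in O(n_features).
svs = np.array([[1.0, -1.0, 1.0], [-1.0, 1.0, -0.75]])  # hypothetical SVs
coeffs = np.array([0.8, -0.8])                          # hypothetical alpha_i * y_i
bias = 0.0

w = coeffs @ svs                 # collapse support vectors into one weight vector
x = np.array([1.0, -1.0, 1.0])   # a sample resembling the positive class above
score = float(w @ x + bias)
label = 1 if score > 0 else -1
```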
One Class SVM
The package also allows you to train a one-class support vector machine. The following example prepares training data for this kind of classifier.
>>> oc_pos = 0.4 * numpy.random.randn(100, 2).astype(numpy.float64)
>>> oc_data = [oc_pos]
>>> print(oc_data)
As in the example above, a one-class SVM [1] can be trained easily using the
bob.learn.libsvm.Trainer class, selecting the appropriate machine_type
(ONE_CLASS).
>>> oc_trainer = bob.learn.libsvm.Trainer(machine_type='ONE_CLASS')
>>> oc_machine = oc_trainer.train(oc_data)
Then, as explained before, the resulting bob.learn.libsvm.Machine can be used
to classify new samples:
>>> oc_test = 0.4 * numpy.random.randn(20, 2).astype(numpy.float64)
>>> oc_outliers = numpy.random.uniform(low=-4, high=4, size=(20, 2)).astype(numpy.float64)
>>> predicted_label_oc_test = oc_machine(oc_test)
>>> predicted_label_oc_outliers = oc_machine(oc_outliers)
>>> print(predicted_label_oc_test)
>>> print(predicted_label_oc_outliers)
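A one-class machine labels each sample +1 (inlier) or -1 (outlier), so a quick sanity check on runs like the one above is the fraction of each test set flagged as inliers. A sketch with hypothetical prediction arrays (not actual output of the machine):

```python
import numpy as np

# Hypothetical one-class SVM predictions: +1 marks an inlier, -1 an outlier.
predicted_test = np.array([1, 1, 1, -1, 1, 1, 1, 1, 1, 1])       # mostly inliers
predicted_outliers = np.array([-1, -1, 1, -1, -1, -1, -1, -1, -1, -1])

inlier_rate = float(np.mean(predicted_test == 1))       # should be close to 1
outlier_rate = float(np.mean(predicted_outliers == -1)) # should be close to 1
```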
Acknowledgements
As a final note, if you decide to use our LIBSVM bindings for your publication, be sure to also cite:
@article{CC01a,
author = {Chang, Chih-Chung and Lin, Chih-Jen},
title = {{LIBSVM}: A library for support vector machines},
journal = {ACM Transactions on Intelligent Systems and Technology},
volume = {2},
issue = {3},
year = {2011},
pages = {27:1--27:27},
note = {Software available at \url{http://www.csie.ntu.edu.tw/~cjlin/libsvm}}
}
[1] http://en.wikipedia.org/wiki/Support_vector_machine