In this example, the performance of an SVM classifier is evaluated using a stratified k-fold resampling scheme.
First, import the NumPy and mlpy modules:
>>> import numpy as np
>>> import mlpy
Then load a data file (data.dat) containing 30 samples, each described by 100 features (x), together with their class labels (y):
>>> x, y = mlpy.data_fromfile('data.dat') # import data file
>>> x.shape
(30, 100)
Initialize the SVM classifier, specifying the kernel type (linear) and the regularization parameter (C):
>>> classifier = mlpy.Svm(kernel = 'linear', C = 1.0) # initialize the svm classifier
Define a stratified 10-fold resampling scheme, where idx is a list of (train, test) pairs of sample indexes, one pair per fold:
>>> idx = mlpy.kfoldS(cl = y, sets = 10)
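To make the idea of stratification concrete, the following is a minimal NumPy sketch of how stratified fold indexes could be built. It is not mlpy's implementation of kfoldS: the function name stratified_kfold_indices, its seed parameter, and the round-robin assignment are assumptions made for illustration only.

```python
import numpy as np

def stratified_kfold_indices(labels, n_folds, seed=0):
    """Return a list of (train_idx, test_idx) pairs, one per fold.

    Hypothetical helper for illustration; not part of mlpy.
    """
    labels = np.asarray(labels)
    rng = np.random.RandomState(seed)
    fold_of = np.empty(labels.shape[0], dtype=int)
    offset = 0
    for cls in np.unique(labels):
        # Shuffle the samples of this class, then deal them to the folds
        # round-robin so every fold keeps roughly the same class proportions.
        members = np.where(labels == cls)[0]
        rng.shuffle(members)
        fold_of[members] = (offset + np.arange(members.shape[0])) % n_folds
        # Continue dealing where the previous class stopped, to balance fold sizes.
        offset += members.shape[0]
    all_idx = np.arange(labels.shape[0])
    return [(all_idx[fold_of != k], all_idx[fold_of == k]) for k in range(n_folds)]

# Made-up labels: 30 samples, two balanced classes.
labels = np.array([1] * 15 + [-1] * 15)
pairs = stratified_kfold_indices(labels, n_folds=10)
```

With these made-up labels, each of the 10 test folds holds 3 samples and contains both classes, and every sample appears in exactly one test fold.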
For each fold, build the training and test data, train the model on xtr, and test it on xts. Performance is measured as the prediction error averaged over the folds:
>>> pred_err = 0.0
>>> for idxtr, idxts in idx:
...     xtr, xts = x[idxtr], x[idxts] # build training and test data
...     ytr, yts = y[idxtr], y[idxts] # build training and test labels
...     ret = classifier.compute(xtr, ytr) # compute the model
...     pred = classifier.predict(xts) # test the model on test data
...     pred_err += mlpy.err(yts, pred) # compute the prediction error
>>> av_pred_err = pred_err / len(idx) # compute the average prediction error
>>> av_pred_err
0.17499999999999999
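The averaging step can be illustrated without mlpy. The sketch below assumes that the per-fold prediction error is the fraction of misclassified test samples; the helper error_rate and the toy labels are hypothetical and are not taken from data.dat.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label.

    Hypothetical stand-in for a per-fold error measure; not part of mlpy.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Two made-up folds: one error out of three in the first, none in the second.
fold_errors = [
    error_rate([1, -1, 1], [1, 1, 1]),
    error_rate([1, -1, -1], [1, -1, -1]),
]

# Each fold contributes equally to the average, regardless of its size.
average_error = sum(fold_errors) / len(fold_errors)
```

Because the per-fold errors are averaged with equal weight, folds of different sizes would influence the result slightly differently than pooling all test predictions and computing a single error.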