Using Random Forests

If you are not familiar with random forests in general, Wikipedia is a good place to start reading. This article deals only with how to use them in milk.
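In brief, a trained random forest classifies a sample by majority vote over its decision trees. A minimal sketch of that aggregation step, assuming per-tree binary predictions are already available (illustrative names, not milk's API):

```python
import numpy as np

def forest_vote(tree_predictions):
    """Aggregate 0/1 votes from an ensemble of trees by majority.

    tree_predictions: shape (n_trees, n_samples), each entry 0 or 1.
    Returns the majority label per sample.
    """
    votes = np.asarray(tree_predictions)
    # fraction of trees voting 1, thresholded at one half
    return (votes.mean(axis=0) > 0.5).astype(int)

# three trees voting on four samples
preds = forest_vote([[1, 0, 1, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 1]])
```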

Random forests, as implemented in milk, are binary classifiers, so if you have multi-class data you need to wrap them in a transformer to obtain a multi-class learner:

from milk.supervised import randomforest
from milk.supervised.multi import one_against_one

rf_learner = randomforest.rf_learner()
learner = one_against_one(rf_learner)
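The one-vs-one reduction trains one binary model per pair of classes and classifies by majority vote among the pairwise winners. A self-contained sketch of the idea, using a toy nearest-centroid base learner for demonstration (this is an illustration of the technique, not milk's implementation):

```python
import itertools
import numpy as np

class OneAgainstOne:
    """One-vs-one reduction: one binary model per pair of classes,
    classification by majority vote among the pairwise winners."""

    def __init__(self, base_train):
        # base_train(features, labels01) -> model with .apply(feature) -> 0 or 1
        self.base_train = base_train

    def train(self, features, labels):
        features = np.asarray(features)
        labels = np.asarray(labels)
        self.classes = sorted(set(labels.tolist()))
        self.models = {}
        for a, b in itertools.combinations(self.classes, 2):
            # restrict training data to the two classes of this pair
            mask = (labels == a) | (labels == b)
            binary = (labels[mask] == b).astype(int)
            self.models[a, b] = self.base_train(features[mask], binary)
        return self

    def apply(self, feature):
        votes = {c: 0 for c in self.classes}
        for (a, b), model in self.models.items():
            votes[b if model.apply(feature) else a] += 1
        return max(self.classes, key=votes.get)

# Toy binary base learner: classify by the nearer class centroid.
class _Centroids:
    def __init__(self, c0, c1):
        self.c0, self.c1 = c0, c1
    def apply(self, feature):
        f = np.asarray(feature, dtype=float)
        return int(np.linalg.norm(f - self.c1) < np.linalg.norm(f - self.c0))

def centroid_train(features, labels01):
    return _Centroids(features[labels01 == 0].mean(axis=0),
                      features[labels01 == 1].mean(axis=0))

# three well-separated classes in the plane
X = np.array([[0., 0.], [1., 0.], [10., 0.], [10., 1.], [0., 10.], [1., 10.]])
y = np.array([0, 0, 1, 1, 2, 2])
clf = OneAgainstOne(centroid_train).train(X, y)
```

With k classes this trains k(k-1)/2 binary models, but each one only sees the examples of its two classes, so individual training sets stay small.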

This is just another learner type, which we can use to train a model. If you have milksets installed, you can try it on its wine dataset:

from milksets import wine
features, labels = wine.load()
model = learner.train(features, labels)

or to perform cross-validation:

import milk
cmat, names, preds = milk.nfoldcrossvalidation(features, labels, classifier=learner, return_predictions=1)
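Cross-validation returns a confusion matrix cmat, whose diagonal counts the correctly classified examples, so the overall accuracy is cmat.trace() divided by cmat.sum(). A small illustration with a made-up 3-class matrix:

```python
import numpy as np

# hypothetical 3-class confusion matrix (diagonal = correct predictions)
cmat = np.array([[50,  2,  1],
                 [ 3, 45,  4],
                 [ 0,  5, 48]])
accuracy = cmat.trace() / float(cmat.sum())
```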

We can finally plot the results (mapped to 2 dimensions using PCA):

from milk.supervised import randomforest
from milk.supervised.multi import one_against_one
import milk
import milk.unsupervised

import pylab
from milksets import wine

# Load 'wine' dataset
features, labels = wine.load()
# random forest learner
rf_learner = randomforest.rf_learner()
# rf is a binary learner, so we transform it into a multi-class classifier
learner = one_against_one(rf_learner)

# cross validate with this learner and return predictions on left-out elements
cmat, names, preds = milk.nfoldcrossvalidation(features, labels, classifier=learner, return_predictions=1)

print('cross-validation accuracy:', cmat.trace()/float(cmat.sum()))

# dimensionality reduction for display
x,v = milk.unsupervised.pca(features)
colors = "rgb" # predicted colour
marks = "xo" # whether the prediction was correct
for (y,x),p,r in zip(x[:,:2], preds, labels):
    c = colors[p]
    m = marks[p == r]
    pylab.plot(y,x,c+m)
pylab.show()
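milk.unsupervised.pca returns the transformed data along with the components, and the script above keeps only the first two columns. The same 2-D projection can be sketched with plain numpy via the singular value decomposition (an illustrative equivalent, not milk's implementation):

```python
import numpy as np

def pca_2d(features):
    # center the data and project onto the top two principal axes
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    # rows of Vt are the principal directions, in decreasing variance order
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X.dot(Vt[:2].T)

pts = pca_2d(np.random.RandomState(0).rand(40, 5))
```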


Colours indicate the classification output. A circle means the prediction matches the underlying label; a cross marks a misclassification.