Supervised Classification

Supervised learning takes a set of input features and their corresponding labels and produces a model. The model can then be given an unseen instance and will predict a label for it.

Typical supervised learning models are SVMs and decision trees.

Example

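import numpy as np
import milk

# 100 examples with 20 features each; the first 50 have a larger spread
features = np.random.randn(100,20)
features[:50] *= 2
labels = np.repeat((0,1), 50)

# Train a model, then classify two previously unseen examples
classifier = milk.defaultclassifier()
model = classifier.train(features, labels)
new_label = model.apply(np.random.randn(20))
new_label2 = model.apply(np.random.randn(20)*2)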

Learners

All learners have a train function which takes at least two arguments:

- features : sequence of features
- labels : sequence of labels

(They may take more parameters.)

They return a model object, which has an apply function which takes a single input and returns its label.

Note that there are always two independent objects: the learner and the model. Every time you call learner.train() you get a new model. This differs from the typical interface, where you first call train() and later apply() (or equivalent names) on the same object. The split interface is better because the type system protects you against calling apply() on the wrong object, and because it is often the case that you want to learn several models with the same learner. The only disadvantage is that the word classifier can be used for both, so the documentation always refers to learners and models.
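The learner/model split can be illustrated with a minimal sketch in plain Python. The classes CentroidLearner and CentroidModel are hypothetical and not part of milk; only the train(features, labels) and apply(x) method names follow the interface described above.

```python
class CentroidModel:
    """Result of training: holds one centroid per label and can only apply()."""

    def __init__(self, centroids):
        self.centroids = centroids  # dict: label -> list of per-feature means

    def apply(self, x):
        # Return the label whose centroid is closest (squared Euclidean distance).
        def sqdist(c):
            return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        return min(self.centroids, key=lambda label: sqdist(self.centroids[label]))


class CentroidLearner:
    """Holds no training state: every train() call returns a fresh model."""

    def train(self, features, labels):
        sums, counts = {}, {}
        for x, label in zip(features, labels):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, xi in enumerate(x):
                acc[i] += xi
            counts[label] = counts.get(label, 0) + 1
        centroids = {label: [s / counts[label] for s in acc]
                     for label, acc in sums.items()}
        return CentroidModel(centroids)


# One learner, two independent models trained on different data.
learner = CentroidLearner()
model_a = learner.train([[0.0, 0.0], [1.0, 1.0]], [0, 1])
model_b = learner.train([[0.0, 0.0], [10.0, 10.0]], [0, 1])

label_a = model_a.apply([0.9, 0.8])  # nearest centroid in model_a is [1, 1]
label_b = model_b.apply([0.9, 0.8])  # nearest centroid in model_b is [0, 0]
```

Because the learner has no apply() method and the model has no train() method, it is not possible to ask an untrained object for a label, and retraining never overwrites an earlier model.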

Both learners and models can be pickled with Python's pickle module.

Composition and Defaults

The style of milk involves many small objects, each providing one step of the pipeline. For example:

remove NaNs and Infs from features
bring features to the [-1, 1] interval
feature selection by removing linearly dependent features and then applying stepwise discriminant analysis (SDA)
one-vs-one multi-class classifier based on a grid search over the parameters of an SVM classifier

To get this you can use:

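# Import paths are assumed from milk's supervised package layout
import numpy as np
from milk.supervised import svm
from milk.supervised.classifier import ctransforms
from milk.supervised.normalise import chkfinite, interval_normalise
from milk.supervised.featureselection import (
    featureselector, linear_independent_features, sda_filter)
from milk.supervised.gridsearch import gridsearch
from milk.supervised.multi import one_against_one

classifier = ctransforms(
    chkfinite(),
    interval_normalise(),
    featureselector(linear_independent_features),
    sda_filter(),
    gridsearch(
        one_against_one(svm.svm_to_binary(svm.svm_raw())),
        params={
            'C': 2.**np.arange(-9, 5),
            'kernel': [svm.rbf_kernel(2.**i) for i in np.arange(-7, 4)],
        }))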

As you can see, this is very flexible but can be tedious. Therefore, milk provides the pipeline above as a single function call: defaultclassifier().
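The composition idea itself can be sketched in plain Python. This is not milk's implementation of ctransforms; Chain, ChainModel, FixedModel, ReplaceNonFinite, ScaleToInterval and SignOfFirstFeature are hypothetical names, used only to show how small learners can be chained so that each step is trained on the output of the steps before it.

```python
import math


class FixedModel:
    """Model wrapping a plain function; apply() calls it on one input."""

    def __init__(self, fn):
        self.fn = fn

    def apply(self, x):
        return self.fn(x)


class ReplaceNonFinite:
    """Stateless step: NaN and Inf values become 0.0."""

    def train(self, features, labels):
        return FixedModel(lambda x: [v if math.isfinite(v) else 0.0 for v in x])


class ScaleToInterval:
    """Learns per-feature min/max and maps each feature to [-1, 1]."""

    def train(self, features, labels):
        lows = [min(col) for col in zip(*features)]
        highs = [max(col) for col in zip(*features)]

        def scale(x):
            # Constant features (high == low) are mapped to 0.0.
            return [2.0 * (v - lo) / (hi - lo) - 1.0 if hi > lo else 0.0
                    for v, lo, hi in zip(x, lows, highs)]
        return FixedModel(scale)


class SignOfFirstFeature:
    """Toy final classifier: label 1 if the first feature is positive."""

    def train(self, features, labels):
        return FixedModel(lambda x: int(x[0] > 0))


class ChainModel:
    """Applies the trained steps in order to a single input."""

    def __init__(self, models):
        self.models = models

    def apply(self, x):
        for model in self.models:
            x = model.apply(x)
        return x


class Chain:
    """Trains each step on the output of the steps before it."""

    def __init__(self, *steps):
        self.steps = steps

    def train(self, features, labels):
        models = []
        for i, step in enumerate(self.steps):
            model = step.train(features, labels)
            models.append(model)
            if i < len(self.steps) - 1:
                # Transform the training data for the next step.
                features = [model.apply(x) for x in features]
        return ChainModel(models)


learner = Chain(ReplaceNonFinite(), ScaleToInterval(), SignOfFirstFeature())
features = [[0.0, 5.0], [10.0, float("nan")], [4.0, 1.0]]
labels = [0, 1, 0]
model = learner.train(features, labels)
prediction = model.apply([9.0, float("inf")])
```

The chain is itself a learner with the same train/apply contract as its parts, which is what allows a whole pipeline to be handed around as a single object.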

supervised Submodules

defaultclassifier: contains a default “good enough” classifier
svm: related to SVMs
adaboost: AdaBoost
randomforest: random forests
grouped: contains objects to transform single object learners into group learners by voting
multi: transforms binary learners into multi-class learners (1-vs-1 or 1-vs-rest)
featureselection: feature selection
knn: k-nearest neighbours
tree: decision tree learners
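
As an illustration of what the multi submodule is described as doing, the following is a minimal sketch of the 1-vs-1 strategy in plain Python: one binary model is trained per pair of classes, and the pairwise models vote on the final label. The classes OneVsOneLearner, OneVsOneModel, MeanThresholdLearner and MeanThresholdModel are hypothetical and not part of milk, whose own implementation may differ in details such as tie-breaking.

```python
from collections import Counter
from itertools import combinations


class MeanThresholdModel:
    """Binary model on 1-D inputs: compares x against a learned threshold."""

    def __init__(self, threshold, pos_above):
        self.threshold = threshold
        self.pos_above = pos_above

    def apply(self, x):
        return (x > self.threshold) == self.pos_above


class MeanThresholdLearner:
    """Toy binary learner: threshold at the midpoint of the two class means."""

    def train(self, features, labels):
        pos = [x for x, y in zip(features, labels) if y]
        neg = [x for x, y in zip(features, labels) if not y]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        return MeanThresholdModel((mean_pos + mean_neg) / 2.0, mean_pos > mean_neg)


class OneVsOneModel:
    def __init__(self, models):
        self.models = models  # dict: (class_a, class_b) -> binary model

    def apply(self, x):
        # Each pairwise model votes for one of its two classes.
        votes = Counter(a if m.apply(x) else b
                        for (a, b), m in self.models.items())
        return votes.most_common(1)[0][0]


class OneVsOneLearner:
    """Wraps a binary learner and trains one model per pair of classes."""

    def __init__(self, base):
        self.base = base

    def train(self, features, labels):
        models = {}
        for a, b in combinations(sorted(set(labels)), 2):
            # Keep only the examples of the two classes in this pair;
            # class a is the positive class for this binary problem.
            pair = [(x, y == a) for x, y in zip(features, labels) if y in (a, b)]
            xs = [x for x, _ in pair]
            ys = [y for _, y in pair]
            models[a, b] = self.base.train(xs, ys)
        return OneVsOneModel(models)


features = [0.0, 1.0, 5.0, 6.0, 10.0, 11.0]
labels = ["low", "low", "mid", "mid", "high", "high"]
model = OneVsOneLearner(MeanThresholdLearner()).train(features, labels)
predictions = [model.apply(x) for x in (0.5, 5.5, 10.5)]
```

With three classes this trains three pairwise models; in general k classes require k(k-1)/2 binary models.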
