Supervised Classification

Supervised learning takes a set of input features together with their corresponding labels and produces a model, which can then be given a previously unseen instance and will predict a label for it.

Typical supervised learning models are SVMs and decision trees.

Example

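import numpy as np
import milk

# 100 examples with 20 features each; the first 50 have twice the spread
features = np.random.randn(100, 20)
features[:50] *= 2
# label 0 for the first 50 examples, label 1 for the remaining 50
labels = np.repeat((0, 1), 50)

classifier = milk.defaultclassifier()
model = classifier.train(features, labels)
# classify new instances drawn like the label-1 and label-0 examples, respectively
new_label = model.apply(np.random.randn(20))
new_label2 = model.apply(np.random.randn(20) * 2)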

Learners

All learners have a train method which takes at least two arguments:

  • features: sequence of features
  • labels: sequence of labels

(They may take more parameters.)

train returns a model object, which has an apply method that takes a single input instance and returns its label.
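To make the protocol concrete, here is a minimal sketch of a learner/model pair that follows it. The names nearest_centroid_learner and nearest_centroid_model are hypothetical illustrations, not part of milk; only the shape of the interface comes from the text above: train(features, labels) returns a model, and model.apply(x) returns a label for one instance.

```python
import numpy as np

class nearest_centroid_model(object):
    # Hypothetical model: stores one centroid (mean feature vector) per label.
    def __init__(self, labels, centroids):
        self.labels = labels
        self.centroids = centroids

    def apply(self, features):
        # Return the label whose centroid is closest (Euclidean) to this single instance.
        dists = np.linalg.norm(self.centroids - np.asarray(features, dtype=float), axis=1)
        return self.labels[np.argmin(dists)]

class nearest_centroid_learner(object):
    # Hypothetical learner: keeps no training state itself; train() builds a new model.
    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        uniq = np.unique(labels)
        centroids = np.array([features[labels == lab].mean(axis=0) for lab in uniq])
        return nearest_centroid_model(uniq, centroids)

features = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
labels = np.array([0, 0, 1, 1])
model = nearest_centroid_learner().train(features, labels)
prediction = model.apply([5.5, 4.0])  # closest to the label-1 centroid
```

Keeping the learned state only in the model object is what makes the learner reusable: it can be trained again without affecting models it has already returned.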

Note that there are always two objects, the learner and the model, and they are independent: every time you call learner.train() you get a new model. This differs from the typical interface, where you first call train() and later apply() (or equivalently named methods) on the same object. This is a better interface for two reasons: the split into two kinds of object protects you against calling apply() on the wrong object, and it is often the case that you want to learn several models with the same learner. The only disadvantage is that the word classifier can be used for both, so in the documentation we always refer to learners and models.
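The independence described above can be illustrated with a deliberately trivial learner. The sketch below assumes a hypothetical majority_learner (not part of milk) that predicts the most frequent training label; training it twice on different labels yields two separate models, and the second call does not alter the first model.

```python
import numpy as np

class majority_model(object):
    # Hypothetical model: remembers a single label.
    def __init__(self, label):
        self.label = label

    def apply(self, features):
        # Ignores the input and returns the stored label.
        return self.label

class majority_learner(object):
    # Hypothetical learner: predicts the most frequent label seen in training.
    def train(self, features, labels):
        values, counts = np.unique(labels, return_counts=True)
        return majority_model(values[np.argmax(counts)])

learner = majority_learner()
features = np.zeros((5, 3))
model_a = learner.train(features, [0, 0, 0, 1, 1])
model_b = learner.train(features, [1, 1, 1, 0, 0])
# Each train() call returned a new object, so model_a still predicts 0.
print(model_a.apply(features[0]), model_b.apply(features[0]))
```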

Both learners and models can be pickled (serialized with Python's pickle module).
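A minimal sketch of what pickling buys you, assuming a hypothetical mean_threshold_learner for one-dimensional features and labels 0/1 (it is not a milk class): a trained model can be serialized, for example to save it to disk, and restored later with the same behaviour.

```python
import pickle
import numpy as np

class mean_threshold_model(object):
    # Hypothetical model: predicts 1 above the threshold, 0 otherwise.
    def __init__(self, threshold):
        self.threshold = threshold

    def apply(self, x):
        return int(x > self.threshold)

class mean_threshold_learner(object):
    # Hypothetical learner for 1-D features with binary labels 0/1.
    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        # Threshold halfway between the two class means.
        threshold = (features[labels == 0].mean() + features[labels == 1].mean()) / 2.0
        return mean_threshold_model(threshold)

learner = mean_threshold_learner()
model = learner.train([1., 2., 8., 9.], [0, 0, 1, 1])

# Round-trip both objects through pickle's byte representation.
learner_copy = pickle.loads(pickle.dumps(learner))
restored = pickle.loads(pickle.dumps(model))
assert restored.apply(7.0) == model.apply(7.0)
```

Pickling works here because both classes are defined at module level and hold only picklable attributes.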

Composition and Defaults

The style of milk involves many small objects, each providing one step of the pipeline. For example:

  1. remove NaNs and Infs from features
  2. bring features to the [-1, 1] interval
  3. feature selection by removing linearly dependent features and then applying SDA (stepwise discriminant analysis)
  4. one-vs-one classifier built from SVM classifiers whose parameters are chosen by grid search

To get this you can use:

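import numpy as np
# module paths follow milk's source layout; adjust if your version differs
from milk.supervised import svm
from milk.supervised.classifier import ctransforms
from milk.supervised.featureselection import featureselector, linear_independent_features, sda_filter
from milk.supervised.gridsearch import gridsearch
from milk.supervised.multi import one_against_one
from milk.supervised.normalise import chkfinite, interval_normalise

classifier = ctransforms(
                chkfinite(),                                   # 1. remove NaNs and Infs
                interval_normalise(),                          # 2. rescale to [-1, 1]
                featureselector(linear_independent_features),  # 3a. drop dependent features
                sda_filter(),                                  # 3b. SDA feature selection
                # 4. one-vs-one SVM, with C and the RBF kernel width chosen by grid search
                gridsearch(one_against_one(svm.svm_to_binary(svm.svm_raw())),
                            params={
                                'C': 2.**np.arange(-9, 5),
                                'kernel': [svm.rbf_kernel(2.**i) for i in np.arange(-7, 4)],
                            }
                            ))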

As you can see, this is very flexible, but it can be tedious to write. Therefore, milk provides the above as a single function call: defaultclassifier().
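To show why small learner/model objects compose so easily, here is a sketch of a transform chain in the spirit of ctransforms. All names (chain_learner, interval_normaliser_learner, nearest_centroid_learner) are hypothetical, and this is not milk's implementation: it assumes every stage except the last is a transform whose model maps one instance to a new feature vector, and that the last stage is a classifier.

```python
import numpy as np

class interval_normaliser_model(object):
    def __init__(self, lo, hi):
        self.lo = lo
        # Avoid division by zero for constant features.
        self.span = np.where(hi > lo, hi - lo, 1.0)

    def apply(self, x):
        # Map [lo, hi] (learned on training data) linearly onto [-1, 1].
        return 2.0 * (np.asarray(x, dtype=float) - self.lo) / self.span - 1.0

class interval_normaliser_learner(object):
    # Hypothetical transform learner: records per-feature min and max.
    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        return interval_normaliser_model(features.min(axis=0), features.max(axis=0))

class nearest_centroid_model(object):
    def __init__(self, labels, centroids):
        self.labels, self.centroids = labels, centroids

    def apply(self, x):
        dists = np.linalg.norm(self.centroids - np.asarray(x, dtype=float), axis=1)
        return self.labels[np.argmin(dists)]

class nearest_centroid_learner(object):
    # Hypothetical final-stage classifier: one centroid per label.
    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        uniq = np.unique(labels)
        centroids = np.array([features[labels == lab].mean(axis=0) for lab in uniq])
        return nearest_centroid_model(uniq, centroids)

class chain_model(object):
    def __init__(self, models):
        self.models = models

    def apply(self, x):
        # Pass one instance through every transform, then the classifier.
        for model in self.models:
            x = model.apply(x)
        return x

class chain_learner(object):
    def __init__(self, *stages):
        self.stages = stages

    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        models = []
        for stage in self.stages[:-1]:
            model = stage.train(features, labels)
            # Transform the training data before handing it to the next stage.
            features = np.array([model.apply(f) for f in features])
            models.append(model)
        models.append(self.stages[-1].train(features, labels))
        return chain_model(models)

features = np.array([[0., 100.], [1., 110.], [9., 900.], [10., 1000.]])
labels = [0, 0, 1, 1]
learner = chain_learner(interval_normaliser_learner(), nearest_centroid_learner())
model = learner.train(features, labels)
```

Because each stage only needs train() and apply(), a stage can be swapped out without touching the others, which is the flexibility the pipeline above relies on.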

Submodules of milk.supervised

  • defaultclassifier: contains a default “good enough” classifier
  • svm: support vector machines (SVM learners and kernels)
  • adaboost: Adaboost
  • randomforest: random forests
  • grouped: contains objects to transform single object learners into group learners by voting
  • multi: transforms binary learners into multi-class learners (1-vs-1 or 1-vs-rest)
  • featureselection: feature selection
  • knn: k-nearest neighbours
  • tree: decision tree learners
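The one-vs-one strategy mentioned for the multi submodule can be sketched as follows. This is a hypothetical illustration of the technique, not the code of milk.supervised.multi: it assumes a binary base learner that accepts labels 0/1, trains one model per pair of classes, and lets every pairwise model cast a vote at prediction time.

```python
import itertools
from collections import Counter
import numpy as np

class centroid_binary_model(object):
    def __init__(self, c0, c1):
        self.c0, self.c1 = c0, c1

    def apply(self, x):
        x = np.asarray(x, dtype=float)
        # 1 if the instance is closer to the class-1 centroid, else 0.
        return int(np.linalg.norm(x - self.c1) < np.linalg.norm(x - self.c0))

class centroid_binary_learner(object):
    # Hypothetical binary learner: labels must be 0 or 1.
    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        return centroid_binary_model(features[labels == 0].mean(axis=0),
                                     features[labels == 1].mean(axis=0))

class one_vs_one_model(object):
    def __init__(self, models):
        self.models = models

    def apply(self, x):
        # Each pairwise model votes for one of its two classes; most votes wins.
        votes = Counter(b if m.apply(x) else a for a, b, m in self.models)
        return votes.most_common(1)[0][0]

class one_vs_one_learner(object):
    # Hypothetical wrapper turning a binary (0/1) learner into a multi-class one.
    def __init__(self, base):
        self.base = base

    def train(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        models = []
        for a, b in itertools.combinations(np.unique(labels), 2):
            mask = (labels == a) | (labels == b)
            # Within this pair, class b is encoded as 1 and class a as 0.
            binary = (labels[mask] == b).astype(int)
            models.append((a, b, self.base.train(features[mask], binary)))
        return one_vs_one_model(models)

features = [[0., 0.], [0., 1.], [5., 0.], [5., 1.], [0., 5.], [1., 5.]]
labels = ['a', 'a', 'b', 'b', 'c', 'c']
model = one_vs_one_learner(centroid_binary_learner()).train(features, labels)
```

With k classes this trains k(k-1)/2 binary models; one-vs-rest would instead train k models, each separating one class from all the others.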