Supervised learning takes a set of input features together with their corresponding labels and produces a model. The model can then be given an unseen instance and will predict a label for it.
Typical supervised learning models are SVMs and decision trees.
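import numpy as np
import milk

# 100 examples with 20 features each; the first 50 rows have twice the spread
features = np.random.randn(100, 20)
features[:50] *= 2
labels = np.repeat((0, 1), 50)  # label 0 for the first 50 rows, 1 for the rest

classifier = milk.defaultclassifier()
model = classifier.train(features, labels)

# Label two unseen instances: one drawn like class 1, one drawn like class 0
new_label = model.apply(np.random.randn(20))
new_label2 = model.apply(np.random.randn(20) * 2)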
All learners have a train function which takes at least two arguments:
- features : sequence of features
- labels : sequence of labels
(It may take more parameters.)
train returns a model object, which has an apply function that takes a single input and returns its label.
Note that there are always two objects, the learner and the model, and they are independent. Every time you call learner.train() you get a new model. This is different from the typical interface where you first call train() and later apply() (or equivalent names) on the same object. This is a better interface because the type system protects you against calling apply() on the wrong object and because it is often the case that you want to learn several models with the same learner. The only disadvantage is that the word classifier can be used for both, so in the documentation, we always refer to learners and models.
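The learner/model split described above can be sketched in a few lines of plain Python. This is only an illustration of the interface shape, not milk code: MeanThresholdLearner and MeanThresholdModel are hypothetical names, and the toy rule assumes binary labels 0 and 1 where class 1 has the larger feature mean.

```python
class MeanThresholdModel:
    """Hypothetical model: labels an input by comparing its mean to a cut-off."""

    def __init__(self, threshold):
        self.threshold = threshold

    def apply(self, x):
        # Takes a single input (one feature vector) and returns its label
        return 1 if sum(x) / len(x) > self.threshold else 0


class MeanThresholdLearner:
    """Hypothetical learner: it only builds models and has no apply()."""

    def train(self, features, labels):
        # Collect the per-example feature mean for each class (labels 0 and 1)
        means = {0: [], 1: []}
        for row, label in zip(features, labels):
            means[label].append(sum(row) / len(row))
        centre0 = sum(means[0]) / len(means[0])
        centre1 = sum(means[1]) / len(means[1])
        # The cut-off is halfway between the two class centres
        return MeanThresholdModel((centre0 + centre1) / 2)


# One learner, two independent models trained on different data
learner = MeanThresholdLearner()
model_a = learner.train([[0.0, 1.0], [4.0, 5.0]], [0, 1])
model_b = learner.train([[10.0, 10.0], [20.0, 20.0]], [0, 1])

label_a = model_a.apply([3.0, 3.0])
label_b = model_b.apply([3.0, 3.0])
```

Because the learner keeps no trained state, calling train() a second time cannot overwrite model_a, and calling apply() on the learner fails immediately instead of silently using a stale or untrained state.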
Both learners and models are pickle()able.
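The style of milk involves many small objects, each providing one step of the pipeline. For example:
- handle non-finite feature values (chkfinite)
- normalise features to a fixed interval (interval_normalise)
- select features: keep linearly independent ones (featureselector with linear_independent_features), then apply the SDA filter (sda_filter)
- train a one-against-one multi-class SVM, choosing C and the RBF kernel parameter by grid search (gridsearch)
To get this you can use: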
# np is numpy; the remaining names are provided by milk
classifier = ctransforms(
    chkfinite(),
    interval_normalise(),
    featureselector(linear_independent_features),
    sda_filter(),
    gridsearch(one_against_one(svm.svm_to_binary(svm.svm_raw())),
               params={
                   'C': 2.**np.arange(-9, 5),
                   'kernel': [svm.rbf_kernel(2.**i) for i in np.arange(-7, 4)],
               }))
As you can see, this is very flexible, but can be tedious. Therefore, milk provides the above as a single function call: defaultclassifier()
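To make the idea of chaining small learners concrete, here is a minimal sketch in plain Python. It is not how milk implements ctransforms; ChainLearner, ChainModel, ScaleLearner, ScaleModel, SumThresholdLearner and SumThresholdModel are all hypothetical names invented for this illustration. The assumption is simply that every step follows the train()/apply() convention described earlier, and that labels are 0 and 1.

```python
class ScaleModel:
    def __init__(self, largest):
        self.largest = largest

    def apply(self, x):
        # Transform step: returns a rescaled feature vector, not a label
        return [v / self.largest for v in x]


class ScaleLearner:
    """Hypothetical step: divide by the largest absolute value seen in training."""

    def train(self, features, labels):
        largest = max(abs(v) for row in features for v in row) or 1.0
        return ScaleModel(largest)


class SumThresholdModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def apply(self, x):
        return 1 if sum(x) > self.threshold else 0


class SumThresholdLearner:
    """Hypothetical final step: cut-off halfway between the class mean sums."""

    def train(self, features, labels):
        sums = {0: [], 1: []}
        for row, label in zip(features, labels):
            sums[label].append(sum(row))
        centre0 = sum(sums[0]) / len(sums[0])
        centre1 = sum(sums[1]) / len(sums[1])
        return SumThresholdModel((centre0 + centre1) / 2)


class ChainModel:
    def __init__(self, models):
        self.models = models

    def apply(self, x):
        # Feed the input through every trained step in order
        for model in self.models:
            x = model.apply(x)
        return x


class ChainLearner:
    """Hypothetical chain: trains each step on the output of the previous ones."""

    def __init__(self, *steps):
        self.steps = steps

    def train(self, features, labels):
        models = []
        for i, step in enumerate(self.steps):
            model = step.train(features, labels)
            models.append(model)
            # Transform the training data for the next step (not after the last)
            if i < len(self.steps) - 1:
                features = [model.apply(row) for row in features]
        return ChainModel(models)


chain = ChainLearner(ScaleLearner(), SumThresholdLearner())
train_features = [[1.0, 2.0], [2.0, 1.0], [8.0, 10.0], [10.0, 8.0]]
chain_model = chain.train(train_features, [0, 0, 1, 1])

high = chain_model.apply([9.0, 9.0])
low = chain_model.apply([1.0, 1.0])
```

The chain is itself a learner, so it can be nested inside another chain or wrapped by another object, which is what makes a composition like the one above possible from small pieces.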