Cross validation is one of the better ways to evaluate the performance of supervised classification.

Cross validation consists of separating the data into folds (hence the name _n_-fold cross-validation, where _n_ is a positive integer). For the purpose of this discussion, we consider 10 folds. In the first round, we leave the first fold out: we train on the other 9 folds and then evaluate the model on the left-out fold. In the second round, we leave the second fold out. This continues until every fold has been left out exactly once.
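The rounds described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not milk's implementation; the function name kfold_indices and the round-robin assignment of samples to folds are assumptions made for the example.

```python
def kfold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    # Illustrative assumption: sample i goes to fold i % n_folds (round-robin).
    fold_of = [i % n_folds for i in range(n_samples)]
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if fold_of[i] == fold]
        train = [i for i in range(n_samples) if fold_of[i] != fold]
        yield train, test


# With 20 samples and 10 folds, each round holds out 2 samples
# and trains on the remaining 18.
for round_nr, (train, test) in enumerate(kfold_indices(20, n_folds=10)):
    print(round_nr, len(train), test)
```

Every sample appears in exactly one test set, so the per-round results can be pooled into a single estimate over the whole dataset.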

Milk supports what is often explicitly called stratified cross validation, which means that it takes the class distribution into account (so that, in 10-fold cross validation, each fold contains roughly 10% of the examples of each class).
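A minimal sketch of stratification, assuming a simple per-class round-robin assignment (milk's actual assignment may differ, and stratified_fold_ids is a hypothetical helper written for this example):

```python
from collections import Counter, defaultdict


def stratified_fold_ids(labels, n_folds=10):
    """Return one fold id per sample, spreading each class evenly over folds."""
    fold_of = [None] * len(labels)
    seen = defaultdict(int)  # number of samples of each class assigned so far
    for i, label in enumerate(labels):
        # The k-th sample of a class goes to fold k % n_folds.
        fold_of[i] = seen[label] % n_folds
        seen[label] += 1
    return fold_of


labels = ['a'] * 30 + ['b'] * 20
fold_of = stratified_fold_ids(labels, n_folds=10)
per_fold = Counter(zip(fold_of, labels))
print(per_fold[(0, 'a')], per_fold[(0, 'b')])  # 3 2
```

Each fold receives 3 of the 30 'a' examples and 2 of the 20 'b' examples, that is, 10% of each class.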

An additional feature, not normally found in machine learning packages or in machine learning theory but very useful in practice, is the origins parameter. Every datapoint can have an associated origin. This is an integer with the following meaning: all examples with the same origin will be in the same fold (so a model is never tested on an example whose origin also appeared in its training data).

This can model cases such as the following: you have collected patient data, which includes both a health measurement and an outcome of interest (for example, how the patient was doing a year after the initial exam). You wish to evaluate a supervised classification algorithm for predicting outcomes. In particular, you want an estimate of how well the system would perform on patients at any location (you know that the data collection has some site effects, perhaps because each person runs the test a little bit differently). Fortunately, you have the data to test this: the patients come from several clinics. Now, you set each patient's origin to the ID of their clinic and evaluate the per-patient accuracy. Because no clinic contributes to both training and testing in the same round, the estimate reflects performance on patients from an unseen clinic.
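The clinic example can be sketched as follows. This is a plain Python illustration under stated assumptions, not milk's code: origin_fold_ids is a hypothetical helper, the clinic IDs are made up, and origins are assigned to folds round-robin.

```python
def origin_fold_ids(origins, n_folds):
    """Assign folds so that all samples sharing an origin share a fold."""
    unique = sorted(set(origins))
    # Illustrative assumption: the k-th distinct origin goes to fold k % n_folds.
    fold_of_origin = {origin: k % n_folds for k, origin in enumerate(unique)}
    return [fold_of_origin[origin] for origin in origins]


# Hypothetical data: 8 patients from 4 clinics (IDs 0 to 3).
clinic = [0, 0, 1, 1, 2, 2, 3, 3]
fold_of = origin_fold_ids(clinic, n_folds=2)

for fold in range(2):
    train_clinics = {c for c, f in zip(clinic, fold_of) if f != fold}
    test_clinics = {c for c, f in zip(clinic, fold_of) if f == fold}
    # No clinic is ever on both sides of the split.
    assert not (train_clinics & test_clinics)
    print(fold, sorted(train_clinics), sorted(test_clinics))
```

The split is made over clinics rather than over patients, which is what keeps site effects from leaking into the accuracy estimate.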

API Documentation

milk.measures.nfoldcrossvalidation.foldgenerator(labels, nfolds=None, origins=None, folds=None, multi_label=False)
for train,test in foldgenerator(labels, nfolds=None, origins=None)

This generator breaks up the data into n folds (default 10).

If origins is given, then all elements that share the same origin will either be in testing or in training (never in both). This is useful when you have several replicates that shouldn't be mixed between training and testing but that can otherwise be treated as independent for learning.


labels : a sequence

the labels

nfolds : integer

number of folds (default 10 or size of smallest class)

origins : sequence, optional

if present, must be an array of indices of the same size as labels.

folds : sequence of int, optional

which folds to generate


an iterator over (train, test) pairs, where train and test are boolean arrays
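As an illustration of how such boolean arrays are typically consumed, the sketch below selects the training and testing labels for one round using plain Python lists. The masks are hypothetical values written by hand (and assumed to be complementary), not output from foldgenerator; with NumPy arrays the same selection is written labels[train] and labels[test].

```python
def select(items, mask):
    """Keep the items whose mask entry is True."""
    return [item for item, keep in zip(items, mask) if keep]


# Hypothetical masks for one round over six samples.
labels = [0, 0, 0, 1, 1, 1]
train = [True, True, False, True, True, False]
test = [not t for t in train]

train_labels = select(labels, train)
test_labels = select(labels, test)
print(train_labels, test_labels)  # [0, 0, 1, 1] [0, 1]
```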

milk.measures.nfoldcrossvalidation.getfold(labels, fold, nfolds=None, origins=None)

Get the training and testing sets for fold number fold out of nfolds folds

Arguments are the same as for foldgenerator


labels : ndarray of labels

fold : integer

nfolds : integer

number of folds (default 10 or size of smallest class)

origins : sequence, optional

if given, then objects with same origin are not scattered across folds

milk.measures.nfoldcrossvalidation.nfoldcrossvalidation(features, labels, nfolds=None, learner=None, origins=None, return_predictions=False, folds=None, initial_measure=0, classifier=None)

Perform n-fold cross validation

cmatrix,names = nfoldcrossvalidation(features, labels, nfolds=10, learner={defaultclassifier()}, origins=None, return_predictions=False)

cmatrix,names,predictions = nfoldcrossvalidation(features, labels, nfolds=10, learner={defaultclassifier()}, origins=None, return_predictions=True)

cmatrix will be an N x N matrix, where N is the number of classes

cmatrix[i,j] will be the number of times that an element of class i was classified as class j

names[i] will correspond to the label name of class i
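To make the meaning of cmatrix and names concrete, here is a minimal sketch that builds a confusion matrix from true and predicted labels and derives the accuracy from it. It uses nested lists rather than an ndarray, and confusion_matrix is a hypothetical helper written for this example, not a milk function.

```python
def confusion_matrix(true_labels, predicted_labels):
    """Return (cmatrix, names); cmatrix[i][j] counts class i classified as class j."""
    names = sorted(set(true_labels) | set(predicted_labels))
    index = {name: k for k, name in enumerate(names)}
    cmatrix = [[0] * len(names) for _ in names]
    for t, p in zip(true_labels, predicted_labels):
        cmatrix[index[t]][index[p]] += 1
    return cmatrix, names


true = ['cat', 'cat', 'dog', 'dog', 'dog']
pred = ['cat', 'dog', 'dog', 'dog', 'cat']
cmatrix, names = confusion_matrix(true, pred)

# Correct classifications lie on the diagonal.
correct = sum(cmatrix[i][i] for i in range(len(names)))
total = sum(sum(row) for row in cmatrix)
accuracy = correct / total
print(names, cmatrix, accuracy)  # ['cat', 'dog'] [[1, 1], [1, 2]] 0.6
```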


features : a sequence

labels : an array of labels, where label[i] is the label corresponding to features[i]

nfolds : integer, optional

Number of folds. Default: 10

learner : learner object, optional

learner should implement the train() method to return a model (something with an apply() method). The default is defaultclassifier(). This parameter used to be called classifier, and that name is still supported.

origins : sequence, optional

Origin ID (see foldgenerator)

return_predictions : bool, optional

whether to return predictions (default: False)

folds : sequence of int, optional

which folds to generate

initial_measure : any, optional

what initial value to use for the results reduction (default: 0)


cmatrix : ndarray

confusion matrix

names : sequence

sequence of labels so that cmatrix[i,j] corresponds to names[i], names[j]

predictions : sequence

predicted output for each element
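The learner parameter only needs an object with a train() method that returns something with an apply() method. A minimal sketch of such a pair is shown below. The class names are hypothetical, and the argument conventions (train(features, labels), and apply() receiving a single feature vector) are assumptions made for the example rather than something this page specifies.

```python
from collections import Counter


class MajorityModel:
    """Model that always predicts the label it was built with."""

    def __init__(self, label):
        self.label = label

    def apply(self, feature_vector):
        # Assumed convention: apply() receives one feature vector.
        return self.label


class MajorityLearner:
    """Learner whose train() returns a model predicting the most common label."""

    def train(self, features, labels):
        most_common_label, _ = Counter(labels).most_common(1)[0]
        return MajorityModel(most_common_label)


learner = MajorityLearner()
model = learner.train([[0.1], [0.4], [0.9]], ['a', 'b', 'b'])
print(model.apply([0.5]))  # b
```

A trivial learner like this is also a useful baseline: a real classifier should produce a confusion matrix that beats always predicting the majority class.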