7.2.1. mclearn.active.active_learn

mclearn.active.active_learn(training_pool, testing_pool, training_oracle, testing_oracle, total_n, initial_n, random_n, active_learning_heuristic, classifier, compute_accuracy, classes, committee=None, bag_size=None, C=None, pool_sample_size=300, verbose=False)

Conduct active learning and return a learning curve.

Parameters:
  • training_pool (array, shape = [n_samples, n_features]) – The feature matrix of all the training examples. Throughout the training phase, the active learner selects an object from this pool and queries the oracle for its label.
  • testing_pool (array, shape = [n_samples, n_features]) – The feature matrix of the test examples, which will be used to assess the accuracy rate of the active learner.
  • training_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the training examples.
  • testing_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the test examples.
  • total_n (int) – The total number of samples that the active learner will query.
  • initial_n (int) – The number of samples that the active learner will randomly select at the beginning to get the algorithm started.
  • random_n (int) – At each iteration, the active learner draws a random sample of random_n examples from the training pool. It then computes a score for each example according to the active learning rule and queries the one with the highest score. If random_n is set to 0, the entire training pool is used as the candidate set (which can be inefficient with large datasets).
  • active_learning_heuristic (function) – The function that implements the active learning rule. Given a set of training candidates and the classifier as inputs, it returns an index array of the candidate(s) with the highest score(s).
  • classifier (Classifier object) – A classifier object that will be used to train and test the data. It should have the same interface as scikit-learn classifiers.
  • compute_accuracy (function) – Given a trained classifier, a test set, and a test oracle, this function will return the accuracy rate.
  • classes (int) – The names of the classes.
  • committee (list of Classifier objects) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
  • bag_size (int) – The number of training examples used by each member in the committee.
  • C (float) – The regularisation parameter of Logistic Regression.
  • pool_sample_size (int) – The size of the sample which will be used to estimate the variance/entropy.
  • verbose (boolean) – If set to True, progress is printed to standard output after every 100 iterations.
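The two callable parameters, active_learning_heuristic and compute_accuracy, are described only by what they take and return. The sketch below shows one way to satisfy those descriptions. It assumes the heuristic is called as heuristic(candidates, classifier) and that the classifier is already fitted and exposes predict_proba; the names margin_heuristic, accuracy and FixedProbabilities, and the extra n argument, are hypothetical and not part of mclearn, whose own heuristics may use a different signature. Margin-based uncertainty sampling is used here purely as an illustrative rule.

```python
import numpy as np


def margin_heuristic(candidates, classifier, n=1):
    """Hypothetical active_learning_heuristic: margin-based uncertainty sampling.

    Assumes `classifier` is fitted and predict_proba returns at least two
    columns. Returns the indices of the n candidates whose top two class
    probabilities are closest (the most uncertain ones), most uncertain first.
    """
    proba = np.sort(classifier.predict_proba(candidates), axis=1)
    margins = proba[:, -1] - proba[:, -2]   # small margin means high uncertainty
    return np.argsort(margins)[:n]          # smallest margins come first


def accuracy(classifier, testing_pool, testing_oracle):
    """Hypothetical compute_accuracy: fraction of test labels predicted correctly."""
    predictions = classifier.predict(testing_pool)
    return float(np.mean(predictions == np.asarray(testing_oracle)))


class FixedProbabilities:
    """Stand-in classifier with fixed outputs, used only for this demonstration."""

    def predict_proba(self, X):
        return np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])

    def predict(self, X):
        return np.array([0, 1, 1])


stub = FixedProbabilities()
print(margin_heuristic(np.zeros((3, 2)), stub, n=2))  # → [1 2]
print(accuracy(stub, np.zeros((3, 2)), [0, 1, 0]))    # → 0.6666666666666666
```

The second candidate is returned first because its two class probabilities (0.55 and 0.45) are the closest, so the classifier is least sure about it.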
Returns:

learning_curve – Every time the active learner queries the oracle, it will re-train the classifier and run it on the test data to get an accuracy rate. The learning curve is simply the array containing all of these accuracy rates.

Return type:

array
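To show how total_n, initial_n and random_n interact to produce a learning curve, here is a minimal sketch of a pool-based active learning loop. It is not mclearn's implementation: the function name active_learn_sketch, the seed parameter, the NearestCentroid stand-in classifier, the least_confidence heuristic and the accuracy helper are all hypothetical. The sketch also assumes that accuracy is recorded once after the initial random batch and then after every query, and it omits committee, bag_size, C, pool_sample_size, classes and verbose.

```python
import numpy as np


def active_learn_sketch(training_pool, testing_pool, training_oracle,
                        testing_oracle, total_n, initial_n, random_n,
                        heuristic, classifier, compute_accuracy, seed=0):
    """Illustrative pool-based active learning loop (assumes total_n <= pool size)."""
    rng = np.random.default_rng(seed)
    n_pool = len(training_pool)
    # Start with initial_n examples chosen uniformly at random.
    labelled = list(rng.choice(n_pool, size=initial_n, replace=False))
    unlabelled = np.setdiff1d(np.arange(n_pool), labelled)

    classifier.fit(training_pool[labelled], training_oracle[labelled])
    learning_curve = [compute_accuracy(classifier, testing_pool, testing_oracle)]

    while len(labelled) < total_n:
        # Score a random subsample of the unlabelled pool, or all of it if random_n == 0.
        if random_n and random_n < len(unlabelled):
            candidates = rng.choice(unlabelled, size=random_n, replace=False)
        else:
            candidates = unlabelled
        best = candidates[heuristic(training_pool[candidates], classifier)[0]]
        # "Query the oracle": move the chosen example into the labelled set.
        labelled.append(best)
        unlabelled = unlabelled[unlabelled != best]
        # Retrain on the enlarged labelled set and record the test accuracy.
        classifier.fit(training_pool[labelled], training_oracle[labelled])
        learning_curve.append(compute_accuracy(classifier, testing_pool, testing_oracle))

    return np.array(learning_curve)


class NearestCentroid:
    """Hypothetical stand-in for a scikit-learn classifier (fit/predict/predict_proba)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative distances to each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        e = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return e / e.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


def least_confidence(candidates, classifier):
    # Higher score = lower top-class probability; indices returned best first.
    scores = 1.0 - classifier.predict_proba(candidates).max(axis=1)
    return np.argsort(-scores)


def accuracy(classifier, X, y):
    return float(np.mean(classifier.predict(X) == y))


# Two Gaussian blobs; even rows form the training pool, odd rows the test set.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-2, 1, (100, 2)), data_rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
curve = active_learn_sketch(X[::2], X[1::2], y[::2], y[1::2], total_n=20,
                            initial_n=4, random_n=30, heuristic=least_confidence,
                            classifier=NearestCentroid(), compute_accuracy=accuracy)
print(len(curve))  # → 17
```

Under these assumptions the curve has total_n - initial_n + 1 entries: one for the initial random batch and one per query.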