7.2.1. mclearn.active.active_learn

mclearn.active.active_learn(training_pool, testing_pool, training_oracle, testing_oracle, total_n, initial_n, random_n, active_learning_heuristic, classifier, compute_accuracy, classes, committee=None, bag_size=None, C=None, pool_sample_size=300, verbose=False)

Conduct active learning and return a learning curve.
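The procedure configured by the parameters below can be pictured with a minimal sketch. It is a simplified, NumPy-only illustration and is not mclearn's actual implementation: it omits committee, bag_size, C, pool_sample_size and verbose, and it assumes that total_n counts every labelled example (including the initial_n random ones) and that one accuracy is recorded after each (re)training. The names active_learn_sketch, ToyCentroidClassifier, least_confident and accuracy are hypothetical stand-ins.

```python
import numpy as np


def active_learn_sketch(training_pool, testing_pool, training_oracle,
                        testing_oracle, total_n, initial_n, random_n,
                        active_learning_heuristic, classifier,
                        compute_accuracy, seed=0):
    """Simplified pool-based active learning loop (hypothetical, not mclearn's code).

    Assumes total_n counts all labelled examples, including the initial_n
    random ones, and records one accuracy after every (re)training.
    """
    rng = np.random.default_rng(seed)
    # Start with initial_n training examples chosen uniformly at random.
    labelled = [int(i) for i in rng.choice(len(training_pool), size=initial_n, replace=False)]
    taken = set(labelled)
    unlabelled = [i for i in range(len(training_pool)) if i not in taken]
    learning_curve = []

    def retrain_and_score():
        classifier.fit(training_pool[labelled], training_oracle[labelled])
        learning_curve.append(
            compute_accuracy(classifier, testing_pool, testing_oracle))

    retrain_and_score()
    while len(labelled) < total_n and unlabelled:
        # Score a random subsample, or the whole pool when random_n == 0.
        if random_n == 0 or random_n >= len(unlabelled):
            candidates = np.array(unlabelled)
        else:
            candidates = rng.choice(unlabelled, size=random_n, replace=False)
        # The heuristic returns indices into `candidates`, best first.
        ranked = active_learning_heuristic(training_pool[candidates], classifier)
        best = int(candidates[ranked[0]])
        # "Query the oracle": move the chosen example into the labelled set.
        labelled.append(best)
        unlabelled.remove(best)
        retrain_and_score()
    return np.array(learning_curve)


class ToyCentroidClassifier:
    """Hypothetical stand-in with a scikit-learn-style fit/predict interface."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to the class centroids.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        e = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return e / e.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


def least_confident(candidates, classifier):
    # Rank candidates by 1 - max class probability, most uncertain first.
    scores = 1.0 - classifier.predict_proba(candidates).max(axis=1)
    return np.argsort(-scores)


def accuracy(classifier, X_test, y_test):
    # Fraction of test examples whose predicted label matches the oracle.
    return float(np.mean(classifier.predict(X_test) == y_test))


# Two well-separated Gaussian blobs, shuffled, split 150 train / 50 test.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
perm = rng.permutation(200)
X, y = X[perm], y[perm]
curve = active_learn_sketch(X[:150], X[150:], y[:150], y[150:],
                            total_n=20, initial_n=4, random_n=30,
                            active_learning_heuristic=least_confident,
                            classifier=ToyCentroidClassifier(),
                            compute_accuracy=accuracy)
print(len(curve))  # 17: one score after the initial fit plus 16 queries
```

Scoring only a random subsample of random_n candidates per iteration keeps each step cheap on large pools; setting random_n to 0 in this sketch scores every unlabelled example instead.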

Parameters:
training_pool (array, shape = [n_samples, n_features]) – The feature matrix of all the training examples. Throughout the training phase, the active learner selects objects from this pool and queries the oracle for their labels.
testing_pool (array, shape = [n_samples, n_features]) – The feature matrix of the test examples, which is used to assess the accuracy rate of the active learner.
training_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the training examples.
testing_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the test examples.
total_n (int) – The total number of samples that the active learner will query.
initial_n (int) – The number of samples that the active learner randomly selects at the beginning to get the algorithm started.
random_n (int) – At each iteration, the active learner picks a random sample of random_n examples from the training pool. It then computes a score for each example and queries the one with the highest score according to the active learning rule. If random_n is set to 0, the entire training pool is scored (which can be inefficient with large datasets).
active_learning_heuristic (function) – The function that implements the active learning rule. Given a set of training candidates and the classifier as inputs, it returns an index array of the candidate(s) with the highest score(s).
classifier (Classifier object) – A classifier object that is used to train on and test the data. It should have the same interface as scikit-learn classifiers.
compute_accuracy (function) – Given a trained classifier, a test set, and a test oracle, this function returns the accuracy rate.
classes (int) – The names of the classes.
committee (list of Classifier object) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
bag_size (int) – The number of training examples used by each member of the committee.
C (float) – The regularisation parameter of Logistic Regression.
pool_sample_size (int) – The size of the sample used to estimate the variance/entropy.
verbose (boolean) – If set to True, progress is printed to standard output after every 100 iterations.
Returns:	learning_curve – Every time the active learner queries the oracle, it will re-train the classifier and run it on the test data to get an accuracy rate. The learning curve is simply the array containing all of these accuracy rates.
Return type:	array
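To make the contracts of active_learning_heuristic and compute_accuracy concrete, here is a minimal sketch of one function of each kind. It assumes the two-argument form described above (candidates, classifier) for the heuristic and (classifier, test set, test oracle) for the accuracy function; mclearn's own heuristics may accept additional arguments such as the committee, bag_size or pool_sample_size. The names entropy_heuristic and StubClassifier are hypothetical, and plain accuracy is used here where a balanced accuracy could be swapped in for imbalanced classes.

```python
import numpy as np


def entropy_heuristic(candidates, classifier):
    """Return indices of `candidates`, sorted from highest to lowest entropy."""
    proba = classifier.predict_proba(candidates)
    # Clip to avoid log(0) for fully confident predictions.
    entropy = -np.sum(proba * np.log(np.clip(proba, 1e-12, 1.0)), axis=1)
    # argsort is ascending, so reverse it to put the most uncertain first.
    return np.argsort(entropy)[::-1]


def compute_accuracy(classifier, X_test, y_test):
    """Fraction of test examples whose predicted label matches the oracle."""
    return float(np.mean(classifier.predict(X_test) == y_test))


class StubClassifier:
    """Hypothetical stand-in that returns fixed class probabilities."""

    classes_ = np.array([0, 1])

    def __init__(self, proba):
        self.proba = np.asarray(proba)

    def predict_proba(self, X):
        return self.proba[: len(X)]

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


stub = StubClassifier([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
X = np.zeros((3, 2))
print(entropy_heuristic(X, stub))  # [1 2 0]: the 50/50 candidate ranks first
print(compute_accuracy(stub, X, np.array([0, 1, 0])))
```

Returning a full ranking rather than a single index is one way to satisfy "index array of candidate(s) with the highest score(s)": a caller that needs only one query takes the first element.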
