7.2.1. mclearn.active.active_learn
mclearn.active.active_learn(training_pool, testing_pool, training_oracle, testing_oracle, total_n, initial_n, random_n, active_learning_heuristic, classifier, compute_accuracy, classes, committee=None, bag_size=None, C=None, pool_sample_size=300, verbose=False)
Conduct active learning and return a learning curve.
Parameters: - training_pool (array, shape = [n_samples, n_features]) – The feature matrix of all the training examples. Throughout the training phase, the active learner will select an object from this pool and query the oracle for its label.
- testing_pool (array, shape = [n_samples, n_features]) – The feature matrix of the test examples, which will be used to assess the accuracy rate of the active learner.
- training_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the training examples.
- testing_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the test examples.
- total_n (int) – The total number of samples that the active learner will query.
- initial_n (int) – The number of samples that the active learner will randomly select at the beginning to get the algorithm started.
- random_n (int) – At each iteration, the active learner will pick a random sample of random_n examples. It will then compute a score for each example and query the one with the highest score according to the active learning rule. If random_n is set to 0, the entire training pool will be sampled (which can be inefficient with large datasets).
- active_learning_heuristic (function) – The function that implements the active learning rule. Given a set of training candidates and the classifier as inputs, the function will return an index array of the candidate(s) with the highest score(s).
- classifier (Classifier object) – A classifier object that will be used to train and test the data. It should have the same interface as scikit-learn classifiers.
- compute_accuracy (function) – Given a trained classifier, a test set, and a test oracle, this function will return the accuracy rate.
- classes (int) – The names of the classes.
- committee (list of Classifier objects) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
- bag_size (int) – The number of training examples used by each member in the committee.
- C (float) – The regularisation parameter of Logistic Regression.
- pool_sample_size (int) – The size of the sample which will be used to estimate the variance/entropy.
- verbose (boolean) – If set to True, progress is printed to standard output after every 100 iterations.
Returns: learning_curve – Every time the active learner queries the oracle, it will re-train the classifier and run it on the test data to get an accuracy rate. The learning curve is simply the array containing all of these accuracy rates.
Return type: array
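The procedure described by the parameters above can be illustrated with a minimal, standard-library-only sketch. It is not the mclearn implementation: CentroidClassifier, margin_heuristic, active_learn_sketch and the seed argument are hypothetical names introduced here, the heuristic returns a single position rather than an index array, and the sketch assumes that the first accuracy is recorded right after the initial_n random picks.

```python
import random


class CentroidClassifier:
    """Hypothetical toy 1-D nearest-centroid classifier with fit/predict."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row[0])
        self.centroids_ = {c: sum(v) / len(v) for c, v in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids_, key=lambda c: abs(row[0] - self.centroids_[c]))
                for row in X]


def compute_accuracy(classifier, X_test, y_test):
    """Fraction of test examples that the trained classifier labels correctly."""
    predictions = classifier.predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


def margin_heuristic(candidates, classifier):
    """Return the position of the highest-scoring candidate.

    Score = negative gap between the two nearest centroids, so points
    close to the decision boundary score highest (assumed rule).
    """
    def score(row):
        d = sorted(abs(row[0] - c) for c in classifier.centroids_.values())
        return -(d[1] - d[0]) if len(d) > 1 else 0.0
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def active_learn_sketch(X_pool, y_pool, X_test, y_test, total_n, initial_n,
                        random_n, heuristic, classifier, accuracy_fn, seed=0):
    rng = random.Random(seed)
    unlabelled = list(range(len(X_pool)))
    rng.shuffle(unlabelled)
    # Start with initial_n randomly chosen labelled examples.
    labelled = [unlabelled.pop() for _ in range(initial_n)]
    learning_curve = []

    def retrain_and_score():
        classifier.fit([X_pool[i] for i in labelled], [y_pool[i] for i in labelled])
        learning_curve.append(accuracy_fn(classifier, X_test, y_test))

    retrain_and_score()
    while len(labelled) < total_n and unlabelled:
        # random_n == 0 means: score the whole remaining pool.
        k = len(unlabelled) if random_n == 0 else min(random_n, len(unlabelled))
        candidates = rng.sample(unlabelled, k)
        best = heuristic([X_pool[i] for i in candidates], classifier)
        chosen = candidates[best]
        unlabelled.remove(chosen)
        labelled.append(chosen)  # "query the oracle": y_pool[chosen] is now known
        retrain_and_score()      # one accuracy value per query
    return learning_curve


# Two synthetic 1-D classes centred at 0 and 4 (made-up data for illustration).
data_rng = random.Random(1)
X_pool = [[data_rng.gauss(0.0, 1.0)] for _ in range(100)] + \
         [[data_rng.gauss(4.0, 1.0)] for _ in range(100)]
y_pool = [0] * 100 + [1] * 100
X_test = [[data_rng.gauss(0.0, 1.0)] for _ in range(50)] + \
         [[data_rng.gauss(4.0, 1.0)] for _ in range(50)]
y_test = [0] * 50 + [1] * 50

curve = active_learn_sketch(X_pool, y_pool, X_test, y_test, total_n=30,
                            initial_n=5, random_n=10, heuristic=margin_heuristic,
                            classifier=CentroidClassifier(),
                            accuracy_fn=compute_accuracy)
```

Under these assumptions the curve holds one accuracy value for the initial fit plus one per subsequent query. A smaller random_n makes each iteration cheaper because fewer candidates are scored, at the cost of possibly missing the most informative example in the pool.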