7.2.2. mclearn.active.run_active_learning_with_heuristic
mclearn.active.run_active_learning_with_heuristic(heursitic, classifier, training_pool, testing_pool, training_oracle, testing_oracle, balanced_pool=False, full_sample_size=60000, n_trials=10, total_n=1000, initial_n=20, random_n=60000, committee=None, bag_size=10000, classes=['Galaxy', 'Star', 'Quasar'], C=None, pool_sample_size=300, pickle_path=None)

Experiment routine for active learning with a particular heuristic.
Parameters: - heuristic (function) – The function that implements the active learning rule. Given a set of training candidates and the classifier as inputs, it returns an index array of the candidate(s) with the highest score(s).
- classifier (Classifier object) – A classifier object that will be used to train and test the data. It should have the same interface as scikit-learn classifiers.
- training_pool (array, shape = [n_samples, n_features]) – The feature matrix of all the training examples. Throughout the training phase, the active learner selects objects from this pool and queries the oracle for their labels.
- testing_pool (array, shape = [n_samples, n_features]) – The feature matrix of the test examples, which will be used to assess the accuracy rate of the active learner.
- training_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the training examples.
- testing_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the test examples.
- balanced_pool (boolean) – Whether the class distribution in the training pool should be uniform.
- full_sample_size (int) – The size of the training pool in each trial of the experiment.
- n_trials (int) – The number of times the experiment will be repeated.
- total_n (int) – The total number of samples that the active learner will query.
- initial_n (int) – The number of samples that the active learner will randomly select at the beginning to get the algorithm started.
- random_n (int) – At each iteration, the active learner picks a random sample of random_n examples from the training pool, computes a score for each example according to the active learning rule, and queries the one with the highest score. If random_n is set to 0, the entire training pool is considered (which can be inefficient with large datasets).
- committee (list of Classifier objects) – A list that contains the committee of classifiers used by the query-by-bagging heuristics.
- bag_size (int) – The number of training examples used by each member in the committee.
- classes (array) – The names of the targets.
- C (float) – The regularisation parameter of Logistic Regression.
- pickle_path (array) – List of paths where the learning curves can be saved.
Returns: learning_curves – If no pickle paths are provided, the learning curves will be returned.
Return type: array
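A heuristic passed as heuristic only has to follow the contract described above: take the training candidates and the trained classifier, and return an index array of the best-scoring candidate(s). The sketch below is a hypothetical maximum-entropy (uncertainty sampling) heuristic written against the scikit-learn predict_proba interface. The name entropy_heuristic, its argument order and the extra n argument are assumptions for illustration, not mclearn's own heuristics or signatures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def entropy_heuristic(candidates, classifier, n=1):
    """Hypothetical heuristic: return the indices of the n candidates whose
    predicted class distribution has the highest Shannon entropy."""
    # Class membership probabilities from the currently trained classifier.
    proba = classifier.predict_proba(candidates)
    # Entropy of each row; clip to avoid log(0).
    entropy = -np.sum(proba * np.log(np.clip(proba, 1e-12, None)), axis=1)
    # Sort by descending entropy and keep the top n indices.
    return np.argsort(entropy)[::-1][:n]


# Synthetic three-class data standing in for a real training pool.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X[:50], y[:50])
best = entropy_heuristic(X[50:], clf, n=5)
print(best)
```

The returned indices refer to rows of the candidate matrix, so a caller must map them back to positions in the full training pool.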
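The interplay of initial_n, random_n and total_n can be illustrated with a minimal single-trial loop. This is a hypothetical sketch assuming a scikit-learn style classifier and a heuristic that returns exactly one index per call; run_single_trial and least_confidence are illustrative names, and mclearn's own loop (including its handling of n_trials, balanced_pool and pickling) may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def least_confidence(candidates, classifier):
    # Hypothetical heuristic: the candidate whose most likely class has the
    # lowest probability, i.e. the one the classifier is least sure about.
    proba = classifier.predict_proba(candidates)
    return np.array([np.argmin(proba.max(axis=1))])


def run_single_trial(heuristic, classifier, X_pool, y_pool, X_test, y_test,
                     total_n=100, initial_n=20, random_n=300, seed=0):
    """Hypothetical sketch of one trial: return the test accuracy measured
    after each query, from initial_n up to total_n labelled examples."""
    rng = np.random.default_rng(seed)
    # Start with initial_n randomly chosen labelled examples.
    labelled = list(rng.choice(len(X_pool), size=initial_n, replace=False))
    unlabelled = np.setdiff1d(np.arange(len(X_pool)), labelled)
    curve = []
    while len(labelled) < total_n:
        classifier.fit(X_pool[labelled], y_pool[labelled])
        curve.append(classifier.score(X_test, y_test))
        # Score only a random subset of the pool (all of it if random_n is 0).
        if random_n and random_n < len(unlabelled):
            sample = rng.choice(unlabelled, size=random_n, replace=False)
        else:
            sample = unlabelled
        chosen = sample[heuristic(X_pool[sample], classifier)]
        # "Query the oracle": the label in y_pool becomes available for training.
        labelled.extend(chosen.tolist())
        unlabelled = np.setdiff1d(unlabelled, chosen)
    classifier.fit(X_pool[labelled], y_pool[labelled])
    curve.append(classifier.score(X_test, y_test))
    return np.array(curve)


X, y = make_classification(n_samples=1500, n_features=8, n_informative=4,
                           n_classes=3, random_state=1)
curve = run_single_trial(least_confidence, LogisticRegression(max_iter=1000),
                         X[:1000], y[:1000], X[1000:], y[1000:])
print(len(curve), curve[-1])
```

Repeating such a trial n_trials times and stacking the resulting curves would give one array of learning curves per experiment, analogous to the learning_curves return value. Restricting each step to random_n candidates keeps the cost of scoring independent of the pool size.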
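The committee and bag_size parameters relate to query by bagging, where each committee member is trained on a bootstrap sample of the labelled data and candidates are scored by how much the members disagree. The sketch below uses vote entropy as the disagreement measure; qbb_vote_entropy is a hypothetical name, and both the choice of vote entropy and the sampling with replacement are assumptions rather than a description of mclearn's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def qbb_vote_entropy(candidates, committee, X_labelled, y_labelled,
                     bag_size, seed=0):
    """Hypothetical query-by-bagging score: return the index of the candidate
    on which the committee members disagree most (highest vote entropy)."""
    rng = np.random.default_rng(seed)
    votes = []
    for member in committee:
        # Each member is trained on a bootstrap sample of bag_size examples.
        bag = rng.choice(len(X_labelled), size=bag_size, replace=True)
        model = clone(member).fit(X_labelled[bag], y_labelled[bag])
        votes.append(model.predict(candidates))
    votes = np.array(votes)  # shape (n_members, n_candidates)
    classes = np.unique(y_labelled)
    # Fraction of members voting for each class, per candidate.
    frac = np.array([(votes == c).mean(axis=0) for c in classes])
    # Vote entropy; classes with no votes contribute 0 (log(1) == 0).
    entropy = -np.sum(frac * np.log(np.where(frac > 0, frac, 1.0)), axis=0)
    return np.array([np.argmax(entropy)])


X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, random_state=2)
committee = [LogisticRegression(max_iter=1000, C=c) for c in (0.01, 0.1, 1.0, 10.0)]
pick = qbb_vote_entropy(X[100:], committee, X[:100], y[:100], bag_size=100)
print(pick)
```

Cloning each member before fitting leaves the committee list itself untouched, so the same committee can be reused at every iteration.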
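When balanced_pool is true, the training pool of full_sample_size examples should have a uniform class distribution. One way to draw such a pool is stratified sampling without replacement, sketched below. balanced_subsample is a hypothetical helper, not part of mclearn; it assumes every class has at least sample_size // n_classes examples (otherwise numpy raises a ValueError) and drops any remainder when sample_size is not divisible by the number of classes.

```python
import numpy as np


def balanced_subsample(y, sample_size, seed=0):
    """Hypothetical helper: indices of a subsample with an equal number of
    examples from each class, shuffled so classes are interleaved."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    per_class = sample_size // len(classes)
    # Draw per_class indices from each class without replacement.
    idx = [rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
           for c in classes]
    return rng.permutation(np.concatenate(idx))


# An imbalanced label array: 700 / 200 / 100 examples per class.
y = np.array([0] * 700 + [1] * 200 + [2] * 100)
idx = balanced_subsample(y, 240)
print(np.bincount(y[idx]))  # → [80 80 80]
```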