7.2.2. mclearn.active.run_active_learning_with_heuristic

mclearn.active.run_active_learning_with_heuristic(heursitic, classifier, training_pool, testing_pool, training_oracle, testing_oracle, balanced_pool=False, full_sample_size=60000, n_trials=10, total_n=1000, initial_n=20, random_n=60000, committee=None, bag_size=10000, classes=['Galaxy', 'Star', 'Quasar'], C=None, pool_sample_size=300, pickle_path=None)

Experiment routine with a particular classifier heuristic.

Parameters:
	heuristic (function) – The function that implements the active learning rule. Given a set of training candidates and the classifier as inputs, it returns an index array of the candidate(s) with the highest score(s).
	classifier (Classifier object) – The classifier used for training and testing. It should have the same interface as scikit-learn classifiers.
	training_pool (array, shape = [n_samples, n_features]) – The feature matrix of all the training examples. Throughout the training phase, the active learner selects an object from this pool and queries the oracle for its label.
	testing_pool (array, shape = [n_samples, n_features]) – The feature matrix of the test examples, which is used to assess the accuracy rate of the active learner.
	training_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the training examples.
	testing_oracle (array, shape = [n_samples]) – The array of class labels corresponding to the test examples.
	balanced_pool (boolean) – Whether the class distribution in the training pool should be uniform.
	full_sample_size (int) – The size of the training pool in each trial of the experiment.
	n_trials (int) – The number of trials to run in the experiment.
	total_n (int) – The total number of samples that the active learner will query.
	initial_n (int) – The number of samples that the active learner randomly selects at the beginning to get the algorithm started.
	random_n (int) – At each iteration, the active learner picks a random sample of examples from the training pool, computes a score for each of them, and queries the one with the highest score according to the active learning rule. If random_n is set to 0, the entire training pool will be sampled (which can be inefficient with large datasets).
	committee (list of Classifier objects) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
	bag_size (int) – The number of training examples used by each member of the committee.
	classes (array) – The names of the targets.
	C (float) – The regularisation parameter of Logistic Regression.
	pickle_path (array) – List of paths where the learning curves can be saved.
Returns:	learning_curves – The learning curves, returned if no pickle path is provided.
Return type:	array
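
The parameter descriptions above imply a simple loop: start from initial_n random labels, then repeatedly score a random subset of the pool and query the top candidate until total_n labels have been collected. The sketch below illustrates one such trial. It does not call mclearn and is not its implementation. `CentroidClassifier`, `least_confidence_heuristic` and `run_trial` are hypothetical names, the heuristic call signature `(candidates, classifier)` is an assumption based only on the description above, and the default sizes are shrunk so the example runs quickly.

```python
import numpy as np


class CentroidClassifier:
    """Hypothetical stand-in for a classifier with a scikit-learn style interface."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One mean feature vector per class seen so far.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Softmax over negative squared distances to each class centroid.
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        logits = -d2
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


def least_confidence_heuristic(candidates, classifier, n_best=1):
    """Assumed heuristic interface: return indices of the most uncertain candidates."""
    confidence = classifier.predict_proba(candidates).max(axis=1)
    # Lowest confidence corresponds to the highest uncertainty score.
    return np.argsort(confidence)[:n_best]


def run_trial(heuristic, classifier, X_pool, y_pool, X_test, y_test,
              total_n=30, initial_n=6, random_n=20, seed=0):
    """One trial: returns test accuracy after each retraining (a learning curve)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_pool))
    labelled = list(order[:initial_n])      # random starting set
    unlabelled = list(order[initial_n:])
    curve = []
    while True:
        classifier.fit(X_pool[labelled], y_pool[labelled])
        curve.append(float(np.mean(classifier.predict(X_test) == y_test)))
        if len(labelled) >= total_n or not unlabelled:
            break
        # random_n == 0 means every remaining pool example is scored.
        k = len(unlabelled) if random_n == 0 else min(random_n, len(unlabelled))
        candidates = rng.choice(unlabelled, size=k, replace=False)
        best = heuristic(X_pool[candidates], classifier)
        chosen = int(candidates[best[0]])
        labelled.append(chosen)             # the oracle reveals y_pool[chosen]
        unlabelled.remove(chosen)
    return curve


# Synthetic three-class data standing in for the training and testing pools.
data_rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = data_rng.integers(0, 3, size=400)
X = centres[y] + data_rng.normal(size=(400, 2))

curve = run_trial(least_confidence_heuristic, CentroidClassifier(),
                  X[:300], y[:300], X[300:], y[300:])
print(len(curve), curve[-1])
```

In this sketch the label arrays play the role of training_oracle and testing_oracle, and the returned list corresponds to a single learning curve. Running it n_trials times with different seeds would give one curve per trial.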
