7.2.3. mclearn.active.active_learning_experiment
mclearn.active.active_learning_experiment(data, feature_cols, target_col, classifier, heuristics, committee, pickle_paths, degree=1, n_trials=10, balanced_pool=False, C=None, pool_sample_size=300, random_n=60000)
Run an active learning experiment with specified heuristics.
Parameters:
- data (DataFrame) – The DataFrame containing all the features and the target.
- feature_cols (array) – The list of column names of the features.
- target_col (str) – The name of the target column in the DataFrame.
- classifier (Classifier object) – A classifier object that will be used to train on and evaluate the data. It should have the same interface as scikit-learn classifiers.
- heuristics (array) – The list of heuristics to experiment with. Each heuristic is a function that implements an active learning rule: given a set of training candidates and the classifier as inputs, it returns an index array of the candidate(s) with the highest score(s).
- committee (list of Classifier objects) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
- pickle_paths (array) – List of paths where the learning curves can be saved.
- degree (int) – If greater than 1, the data will be transformed polynomially with the given degree.
- n_trials (int) – The number of trials over which the experiment will be run.
- balanced_pool (boolean) – Whether the class distribution in the training pool should be uniform.
- C (float) – The regularisation parameter of Logistic Regression.
- random_n (int) – At each iteration, the active learner picks a random sample of examples from the training pool, computes a score for each example, and queries the one with the highest score according to the active learning rule. If random_n is set to 0, the entire training pool is used (which can be inefficient with large datasets).
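To make the heuristics interface more concrete, here is a minimal sketch of one possible rule, least-confidence uncertainty sampling. It assumes a heuristic receives the candidate feature matrix and an already-fitted classifier exposing scikit-learn's `predict_proba`, and returns the indices of the top-scoring candidates. The name `least_confidence_h`, its argument order, and the `n` argument are hypothetical; the exact call signature mclearn uses may differ.

```python
import numpy as np


def least_confidence_h(X_candidates, classifier, n=1):
    """Hypothetical heuristic: score each candidate by 1 - (max class probability).

    Assumes `classifier` is already fitted and exposes scikit-learn's
    `predict_proba`. Returns indices of the `n` highest-scoring candidates,
    most informative first.
    """
    proba = classifier.predict_proba(X_candidates)  # shape (n_candidates, n_classes)
    scores = 1.0 - proba.max(axis=1)                # low confidence -> high score
    # argsort is ascending, so take the last n and reverse for descending order
    return np.argsort(scores)[-n:][::-1]


class _FixedProbaClassifier:
    """Stand-in for a fitted classifier that returns preset probabilities."""

    def __init__(self, proba):
        self.proba = np.asarray(proba, dtype=float)

    def predict_proba(self, X):
        return self.proba


clf = _FixedProbaClassifier([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])
X = np.zeros((3, 2))
print(least_confidence_h(X, clf))  # candidate 1 is the least confident
```

Any scikit-learn classifier with `predict_proba` (for example a fitted logistic regression) could replace the stub, since the heuristic relies only on that method.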
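The committee parameter is described as serving query-by-bagging heuristics. As a rough sketch of that general technique (not necessarily mclearn's own scoring), each member is fitted on a bootstrap resample of the labelled data, and candidates are ranked by the entropy of the committee's votes. The helper names `fit_committee_on_bootstrap` and `vote_entropy_h` are hypothetical. `copy.deepcopy` is used so the sketch also works with the simple stubs; with scikit-learn estimators, `sklearn.base.clone` is the usual choice.

```python
import copy

import numpy as np


def fit_committee_on_bootstrap(committee, X, y, seed=0):
    """Hypothetical helper: fit a copy of each member on its own bootstrap resample."""
    rng = np.random.default_rng(seed)
    fitted = []
    for member in committee:
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        model = copy.deepcopy(member)
        model.fit(X[idx], y[idx])
        fitted.append(model)
    return fitted


def vote_entropy_h(X_candidates, fitted_committee, n=1):
    """Hypothetical heuristic: rank candidates by the entropy of the committee's hard votes."""
    votes = np.array([m.predict(X_candidates) for m in fitted_committee])  # (members, candidates)
    n_members = votes.shape[0]
    scores = np.zeros(votes.shape[1])
    for label in np.unique(votes):
        frac = (votes == label).sum(axis=0) / n_members  # share of members voting `label`
        nonzero = frac > 0
        scores[nonzero] -= frac[nonzero] * np.log(frac[nonzero])
    return np.argsort(scores)[-n:][::-1]


class _FixedVotes:
    """Stub classifier whose predictions are fixed in advance."""

    def __init__(self, labels):
        self.labels = np.asarray(labels)

    def fit(self, X, y):
        return self

    def predict(self, X):
        return self.labels


committee = [_FixedVotes([0, 0, 1]), _FixedVotes([0, 1, 1]), _FixedVotes([0, 0, 1])]
fitted = fit_committee_on_bootstrap(committee, np.zeros((5, 2)), np.zeros(5))
print(vote_entropy_h(np.zeros((3, 2)), fitted))  # only candidate 1 splits the committee
```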
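The degree and C parameters suggest an optional polynomial feature expansion followed by a regularised logistic regression. The sketch below shows one way to build such a model with scikit-learn's `PolynomialFeatures` and `LogisticRegression`. The helper `make_model` is hypothetical, and it assumes `C=None` means "keep scikit-learn's default". Note that in scikit-learn, `C` is the inverse of the regularisation strength, so smaller values regularise more.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def make_model(degree=1, C=None):
    """Hypothetical helper: optional polynomial expansion, then logistic regression."""
    # C=None keeps scikit-learn's default C (1.0); otherwise pass it through.
    clf = LogisticRegression() if C is None else LogisticRegression(C=C)
    if degree > 1:
        # include_bias=False because LogisticRegression already fits an intercept
        return make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), clf)
    return clf


X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 2.0]])
y = np.array([0, 0, 1, 1])
model = make_model(degree=2, C=10.0).fit(X, y)
print(model.predict(X).shape)  # (4,)
# Two features at degree 2 expand to x1, x2, x1^2, x1*x2, x2^2
print(PolynomialFeatures(degree=2, include_bias=False).fit_transform(X).shape)  # (4, 5)
```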
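One simple way to obtain the uniform class distribution that balanced_pool refers to is to downsample every class to the size of the rarest one. This is an assumption about the approach, not a description of mclearn's implementation, and the helper `balanced_pool_indices` is hypothetical.

```python
import numpy as np


def balanced_pool_indices(y, rng):
    """Hypothetical helper: indices of a class-balanced subset of the pool.

    Every class is randomly downsampled (without replacement) to the size of
    the rarest class, so each class appears equally often in the result.
    """
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    picked = [
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(picked))


rng = np.random.default_rng(0)
y = np.array(["A"] * 6 + ["B"] * 2 + ["C"] * 3)
idx = balanced_pool_indices(y, rng)
print(np.unique(y[idx], return_counts=True))  # each class appears twice
```

Downsampling discards labelled examples from the larger classes. The trade-off is a pool in which no class dominates the candidates the heuristics see.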
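The query step described under random_n can be sketched as follows, assuming random_n is the size of the random sample drawn from the pool at each iteration. The function `select_query` and the `score_fn` callback are hypothetical names. `score_fn` stands in for whatever active learning rule is in use and returns one score per row, with higher meaning more informative.

```python
import numpy as np


def select_query(pool_indices, X_pool, score_fn, random_n, rng):
    """Hypothetical helper: pick one pool example to label.

    If random_n is 0 (or not smaller than the pool), every pool example is
    scored; otherwise a random subset of random_n examples is scored.
    Returns an index into X_pool.
    """
    pool_indices = np.asarray(pool_indices)
    if random_n == 0 or random_n >= len(pool_indices):
        candidates = pool_indices
    else:
        candidates = rng.choice(pool_indices, size=random_n, replace=False)
    scores = score_fn(X_pool[candidates])
    return candidates[np.argmax(scores)]  # map the best candidate back to a pool index


def toy_score(X):
    """Toy rule for illustration: a larger first feature means more informative."""
    return X[:, 0]


rng = np.random.default_rng(42)
X_pool = np.arange(10, dtype=float).reshape(-1, 1)
print(select_query(np.arange(10), X_pool, toy_score, random_n=0, rng=rng))  # 9
```

Scoring only a random subset bounds the cost per iteration, since the classifier is evaluated on random_n rows instead of the whole pool. The price is that the globally best candidate may not be in the sample.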