7.2.3. mclearn.active.active_learning_experiment

mclearn.active.active_learning_experiment(data, feature_cols, target_col, classifier, heuristics, committee, pickle_paths, degree=1, n_trials=10, balanced_pool=False, C=None, pool_sample_size=300, random_n=60000)

Run an active learning experiment with specified heuristics.

Parameters:
  • data (DataFrame) – The DataFrame containing all the features and target.
  • feature_cols (array) – The list of column names of the features.
  • target_col (array) – The name of the target column in the DataFrame.
  • classifier (Classifier object) – A classifier object that will be trained and tested on the data. It should have the same interface as scikit-learn classifiers.
  • heuristics (array) – The list of heuristics to run the experiment with. Each heuristic is a function that implements an active learning rule: given a set of training candidates and the classifier as inputs, it returns the index array of the candidate(s) with the highest score(s).
  • committee (list of Classifier objects) – A list containing the committee of classifiers used by the query-by-bagging heuristics.
  • pickle_paths (array) – List of paths where the learning curves can be saved.
  • degree (int) – If greater than 1, the data will be transformed polynomially with the given degree.
  • n_trials (int) – The number of trials to run the experiment for.
  • balanced_pool (boolean) – Whether the class distribution in the training pool should be uniform.
  • C (float) – The regularisation parameter of Logistic Regression.
  • random_n (int) – At each iteration, the active learner picks a random sample of random_n examples from the training pool. It then computes a score for each example in the sample and queries the one with the highest score according to the active learning rule. If random_n is set to 0, the entire training pool is scored instead (which can be inefficient with large datasets).
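
The heuristics and random_n descriptions above can be illustrated with a minimal, standard-library-only sketch. It is not mclearn's implementation: ToyClassifier, entropy_heuristic, sample_candidates and their signatures are hypothetical names chosen for illustration, and the exact arguments that mclearn passes to a heuristic are an assumption here. The sketch only shows the general shape of the idea: subsample the pool, score each candidate with the classifier, and return the index of the highest-scoring one.

```python
import math
import random


class ToyClassifier:
    """Hypothetical stand-in exposing a scikit-learn-style predict_proba."""

    def predict_proba(self, X):
        # One feature per row; P(class 1) is the logistic function of it.
        probs = []
        for row in X:
            p1 = 1.0 / (1.0 + math.exp(-row[0]))
            probs.append([1.0 - p1, p1])
        return probs


def entropy_heuristic(X_pool, candidate_idx, classifier, n_best=1):
    """Return the n_best candidate indices with the highest predictive entropy.

    The signature is an assumption made for this sketch.
    """
    probs = classifier.predict_proba([X_pool[i] for i in candidate_idx])
    scores = [-sum(p * math.log(p) for p in row if p > 0) for row in probs]
    # Rank positions by score, highest first, then map back to pool indices.
    ranked = sorted(range(len(candidate_idx)), key=lambda k: scores[k], reverse=True)
    return [candidate_idx[k] for k in ranked[:n_best]]


def sample_candidates(pool_idx, random_n, rng):
    """Subsample the pool; random_n == 0 means the whole pool is scored.

    Returning the whole pool when random_n exceeds its size is an
    assumption of this sketch.
    """
    pool_idx = list(pool_idx)
    if random_n == 0 or random_n >= len(pool_idx):
        return pool_idx
    return rng.sample(pool_idx, random_n)


# Five unlabelled examples with a single feature each.
X_pool = [[-3.0], [-0.1], [0.5], [2.0], [4.0]]
rng = random.Random(0)

candidates = sample_candidates(range(len(X_pool)), random_n=0, rng=rng)
best = entropy_heuristic(X_pool, candidates, ToyClassifier())
```

With random_n=0 every example is scored, so the query is the example the toy classifier is least certain about (the one whose predicted probability is closest to 0.5). A positive random_n trades some selection quality for speed, since only that many candidates are scored per iteration.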