7.1.1. mclearn.classifier.train_classifier

mclearn.classifier.train_classifier(data, feature_names, class_name, train_size, test_size, output='', random_state=None, coords=True, recall_maps=True, classifier=None, correct_baseline=None, balanced=True, returns=['correct_boolean', 'confusion_test'], report=True, pickle_path=None)[source]

Standard classifier routine.

Parameters:
  • data (DataFrame) – The DataFrame containing all the data.
  • feature_names (array) – A list of column names in data that are used as features.
  • class_name (str) – The column name of the target.
  • train_size (int) – The size of the training set.
  • test_size (int) – The size of the test set.
  • output (str) – The name that will be attached to the path of the saved plots.
  • random_state (int) – The value of the random state (used for reproducibility).
  • coords (bool) – Whether coordinates are part of the features.
  • recall_maps (bool) – Whether to make a map of recall scores.
  • classifier (Classifier object) – An initialised scikit-learn Classifier object.
  • correct_baseline (array) – To compare the results against a baseline, supply the baseline's default predictions here.
  • balanced (bool) – Whether to make the training and test set balanced.
  • returns (array) – The list of variables to be returned by the function.
  • report (bool) – Whether to print out the classification report.
  • pickle_path (str) – If a pickle path is supplied, the classifier will be saved in the specified location.
Returns:

  • correct_boolean (array) – The boolean array indicating which test examples were correctly predicted.
  • confusion_test (array) – The confusion matrix on the test examples.
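
The two default return values can be illustrated with a minimal pure-Python sketch. This is not mclearn's implementation: the helper summarise_predictions and the class labels are hypothetical, and the confusion-matrix layout (rows are true classes, columns are predicted classes) is an assumption borrowed from the scikit-learn convention, which this page does not confirm.

```python
from collections import Counter


def summarise_predictions(y_true, y_pred, labels):
    """Hypothetical helper: build the two summaries from paired labels."""
    # One boolean per test example: True where the prediction is right.
    correct_boolean = [t == p for t, p in zip(y_true, y_pred)]
    # Count how often each (true, predicted) pair occurs.
    pair_counts = Counter(zip(y_true, y_pred))
    # Assumed layout: rows are true classes, columns are predicted classes.
    confusion = [[pair_counts[(t, p)] for p in labels] for t in labels]
    return correct_boolean, confusion


# Made-up labels and predictions, for illustration only.
labels = ["Galaxy", "Quasar", "Star"]
y_true = ["Galaxy", "Star", "Quasar", "Star", "Galaxy"]
y_pred = ["Galaxy", "Star", "Star", "Star", "Quasar"]

correct_boolean, confusion_test = summarise_predictions(y_true, y_pred, labels)
print(correct_boolean)  # [True, True, False, True, False]
print(confusion_test)   # [[1, 1, 0], [0, 0, 1], [0, 0, 2]]
```

Under that assumed layout, the diagonal of confusion_test counts the correctly predicted examples, so its sum equals the number of True entries in correct_boolean.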