7. API Reference

This section documents every function in mclearn. The content matches the docstrings: the parameters and the return type of each function. If this is your first time using mclearn, you may prefer to start with the User Guide, which explains the algorithms in depth.

As a convenience, all functions listed below can be accessed directly from the top-level module.

7.1. Classifiers

Some general standard classifier routines for astronomical data.

train_classifier Standard classifier routine.
print_classification_result Train the specified classifier and print out the results.
learning_curve Compute the learning curve of a classifier.
compute_all_learning_curves Compute the learning curves with the most popular classifiers.
grid_search A general grid search routine.
grid_search_svm_rbf Do a grid search on SVM with an RBF kernel.
grid_search_svm_sigmoid Do a grid search on SVM with a sigmoid kernel.
grid_search_svm_poly_degree Do a grid search on a Linear SVM given the specified polynomial transformation.
grid_search_svm_poly Do a grid search on SVM with polynomial transformation of the features.
grid_search_logistic_degree Do a grid search on Logistic Regression given the specified polynomial transformation.
grid_search_logistic Do a grid search on Logistic Regression.
predict_unlabelled_objects Predict the classes of unlabelled objects given a classifier.
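
To make the idea behind the grid search routines concrete, here is a minimal sketch of an exhaustive search over a parameter grid. It is not mclearn's implementation: `grid_search_sketch`, `toy_score`, and the parameter names `C` and `gamma` are hypothetical, and the scoring function is a stand-in for a cross-validated accuracy.

```python
import math
from itertools import product

def grid_search_sketch(score_fn, param_grid):
    """Score every combination in param_grid and return the best one.

    param_grid maps a parameter name to the list of values to try.
    Returns (best_params, scores), where scores is keyed by value tuples.
    """
    names = list(param_grid)
    scores = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores[values] = score_fn(**params)
    best_values = max(scores, key=scores.get)
    return dict(zip(names, best_values)), scores

def toy_score(C, gamma):
    """Hypothetical stand-in for a validation score; peaks at C=10, gamma=0.1."""
    return 1.0 - 0.1 * abs(math.log10(C) - 1) - 0.1 * abs(math.log10(gamma) + 1)

best, scores = grid_search_sketch(
    toy_score,
    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]},
)
```

In practice the scoring function would train a classifier with the given parameters and return its validation accuracy; the `scores` dictionary is the kind of table a validation heatmap could be drawn from.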

7.2. Active Learner

The main routine of all active learning algorithms.

active_learn Conduct active learning and return a learning curve.
run_active_learning_with_heuristic Experiment routine with a particular classifier heuristic.
active_learning_experiment Run an active learning experiment with specified heuristics.
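
The general shape of a pool-based active learning loop that returns a learning curve can be sketched as follows. This is an illustration under stated assumptions, not the signature or behaviour of `active_learn`: the toy one-dimensional nearest-centroid classifier, the smallest-margin query rule, and every name in the block are hypothetical.

```python
import math
import random

def fit_centroids(X, y):
    """Mean feature value per class (toy 1-D nearest-centroid classifier)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_proba(centroids, x):
    """Softmax over negative distances to each class centroid."""
    weights = {label: math.exp(-abs(x - c)) for label, c in centroids.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def predict(centroids, x):
    probs = predict_proba(centroids, x)
    return max(probs, key=probs.get)

def smallest_margin(centroids, pool_X):
    """Index of the pool candidate with the smallest top-two probability gap."""
    def margin(x):
        ranked = sorted(predict_proba(centroids, x).values(), reverse=True)
        return ranked[0] - ranked[1]
    return min(range(len(pool_X)), key=lambda i: margin(pool_X[i]))

def active_learn_sketch(train_X, train_y, pool_X, pool_y, test_X, test_y, n_queries):
    """Query n_queries labels from the pool; return test accuracy after each."""
    train_X, train_y = list(train_X), list(train_y)
    pool_X, pool_y = list(pool_X), list(pool_y)  # copies, inputs stay untouched
    curve = []
    for _ in range(n_queries):
        model = fit_centroids(train_X, train_y)
        i = smallest_margin(model, pool_X)
        train_X.append(pool_X.pop(i))
        train_y.append(pool_y.pop(i))  # the "oracle" reveals the true label
        model = fit_centroids(train_X, train_y)
        correct = sum(predict(model, x) == t for x, t in zip(test_X, test_y))
        curve.append(correct / len(test_X))
    return curve

rng = random.Random(0)

def sample(n):
    """Draw n points: class 0 centred at 0, class 1 centred at 3."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append(rng.gauss(3.0 * label, 1.0))
        y.append(label)
    return X, y

pool_X, pool_y = sample(20)
test_X, test_y = sample(20)
curve = active_learn_sketch([0.0, 3.0], [0, 1], pool_X, pool_y,
                            test_X, test_y, n_queries=5)
```

Swapping `smallest_margin` for another selection function is all it takes to compare heuristics on the same data.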

7.3. Active Learning Heuristics

Heuristics used to query the most uncertain candidate out of the unlabelled pool.

random_h Return a random candidate.
entropy_h Return the candidate whose prediction vector displays the greatest Shannon entropy.
margin_h Return the candidate with the smallest margin.
qbb_margin_h Return the candidate with the smallest average margin.
qbb_kl_h Return the candidate with the largest average KL divergence from the mean.
compute_A Compute the A matrix in the variance estimation technique.
compute_F Compute the F matrix in the variance estimation technique.
compute_pool_variance Estimate the variance of the pool.
pool_variance_h Return the candidate that will minimise the expected variance of the predictions.
compute_pool_entropy Estimate the entropy of the pool.
pool_entropy_h Return the candidate that will minimise the expected entropy of the predictions.
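
As a rough illustration of the two simplest uncertainty heuristics above, the sketch below selects a candidate by greatest Shannon entropy and by smallest margin from a small table of predicted class probabilities. The function names and the example probabilities are hypothetical, and the code does not reproduce the signatures of `entropy_h` or `margin_h`.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of one predicted probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin(probs):
    """Gap between the two largest predicted probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def pick_by_entropy(pool_probs):
    """Index of the candidate whose prediction has the greatest entropy."""
    return max(range(len(pool_probs)),
               key=lambda i: shannon_entropy(pool_probs[i]))

def pick_by_margin(pool_probs):
    """Index of the candidate with the smallest margin."""
    return min(range(len(pool_probs)),
               key=lambda i: margin(pool_probs[i]))

# One row per unlabelled candidate, one column per class.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
]

entropy_choice = pick_by_entropy(pool_probs)
margin_choice = pick_by_margin(pool_probs)
```

The two rules need not agree. Here the entropy rule picks the second candidate, whose probability mass is spread over all three classes, while the margin rule picks the third, whose top two classes are nearly tied.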

7.4. Performance Measures

Various measures that evaluate the performance of a classifier.

naive_accuracy Compute the naive accuracy rate.
get_beta_parameters Extract the beta parameters from a confusion matrix.
convolve_betas Convolve k beta distributions.
balanced_accuracy_expected Compute the expected value of the posterior balanced accuracy.
beta_sum_pdf Compute the pdf of the sum of beta distributions.
beta_avg_pdf Compute the pdf of the average of the k beta distributions.
beta_sum_cdf Compute the cdf of the sum of the k beta distributions.
beta_avg_inv_cdf Compute the inverse cdf of the average of the k beta distributions.
recall Compute the recall from a confusion matrix.
precision Compute the precision from a confusion matrix.
compute_balanced_accuracy Compute the accuracy of a classifier based on some test set.
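
The relationship between these measures is easiest to see on a small confusion matrix. The sketch below is illustrative only. It assumes that rows hold the true classes and columns the predicted classes, and that the posterior recall of each class is a beta distribution under a flat Beta(1, 1) prior; neither convention is confirmed by this reference, and all function names are hypothetical.

```python
def recall_per_class(cm):
    """Fraction of each true class (row) that was predicted correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]

def precision_per_class(cm):
    """Fraction of each predicted class (column) that is correct."""
    return [cm[i][i] / sum(row[i] for row in cm) for i in range(len(cm))]

def overall_accuracy(cm):
    """Correct predictions divided by all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def mean_recall(cm):
    """Point estimate of the balanced accuracy: the average per-class recall."""
    recalls = recall_per_class(cm)
    return sum(recalls) / len(recalls)

def expected_balanced_accuracy_flat_prior(cm):
    """Mean of the per-class Beta(correct + 1, incorrect + 1) posterior means."""
    means = []
    for i, row in enumerate(cm):
        alpha = row[i] + 1             # correct predictions plus prior count
        beta = sum(row) - row[i] + 1   # incorrect predictions plus prior count
        means.append(alpha / (alpha + beta))
    return sum(means) / len(means)

# Imbalanced two-class example: 100 objects of class 0, 10 of class 1.
cm = [[90, 10],
      [5, 5]]

acc = overall_accuracy(cm)
bal = mean_recall(cm)
post = expected_balanced_accuracy_flat_prior(cm)
```

On this imbalanced example the overall accuracy is about 0.86, while the average recall is only 0.70, which is why a balanced measure is more informative when one class dominates.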

7.5. Photometric Data

Procedures specific to photometric data.

reddening_correction_sfd98 Compute the reddening values using the SFD98 correction set.
reddening_correction_sf11 Compute the reddening values using the SF11 correction set.
reddening_correction_w14 Compute the reddening values using the W14 correction set.
correct_magnitudes Correct the values of magnitudes given a correction set.
compute_colours Compute specified combinations of colours.
fetch_sloan_data Run an SQL query on the Sloan Sky Server.
fetch_filter Get a filter from the internet.
fetch_spectrum Get a spectrum from the internet.
clean_up_subclasses Clean up the names of the subclasses in the SDSS dataset.
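
For readers new to photometry, the arithmetic behind a reddening correction and a colour is simple: a corrected magnitude is the observed magnitude minus the extinction in that band, and a colour is the difference between the magnitudes in two bands. The sketch below shows only that arithmetic. The function names, the dictionary layout, and the numbers are hypothetical and do not reflect the inputs that `correct_magnitudes` or `compute_colours` expect.

```python
def correct_magnitudes_sketch(mags, extinction):
    """Subtract the per-band extinction from the observed magnitudes."""
    return {band: mags[band] - extinction[band] for band in mags}

def colours_sketch(mags, pairs):
    """Colour for each (band_a, band_b) pair: magnitude_a minus magnitude_b."""
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in pairs}

# Made-up ugriz magnitudes and extinction values for a single object.
observed = {"u": 19.5, "g": 18.25, "r": 17.5, "i": 17.25, "z": 17.0}
extinction = {"u": 0.5, "g": 0.25, "r": 0.25, "i": 0.25, "z": 0.0}

corrected = correct_magnitudes_sketch(observed, extinction)
colours = colours_sketch(corrected, [("u", "g"), ("g", "r"), ("r", "i"), ("i", "z")])
```

Colours are computed from the corrected magnitudes here, so that they describe the object rather than the dust along the line of sight.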

7.6. Data Preprocessing

Useful general-purpose preprocessing functions.

normalise_z Normalise each feature to have zero mean and unit variance.
normalise_unit_var Normalise each feature to have unit variance.
normalise_01 Normalise each feature to the unit interval.
draw_random_sample Split the data into a train set and test set of a given size.
balanced_train_test_split Split the data into a balanced training set and test set of some given size.
csv_to_hdf Convert CSV files to an HDF5 table.
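
The three normalisation schemes differ only in which statistics they remove. A minimal sketch on a single feature column follows. The function names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, not something this reference specifies.

```python
from statistics import mean, pstdev

def z_score(column):
    """Shift to zero mean and scale to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def unit_variance(column):
    """Scale to unit variance without shifting the mean."""
    sigma = pstdev(column)
    return [x / sigma for x in column]

def min_max(column):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2

z = z_score(values)
scaled = unit_variance(values)
unit = min_max(values)
```

Each function assumes the column is not constant; a constant feature has zero spread and would cause a division by zero.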

7.7. Visualisations

Selected plots commonly used in astronomy and active learning.

plot_class_distribution Plot the distribution of the classes.
plot_scores Make a barplot of the scores of some performance measure.
plot_balanced_accuracy_violin Make a violin plot of the balanced posterior accuracy.
plot_learning_curve Plot the learning curve.
plot_average_learning_curve Plot the average learning curve from many trials.
plot_hex_map Plot the density of objects on a hex map.
plot_recall_maps Plot the recall map.
plot_filters_and_spectrum Plot ugriz filters and spectrum in the same figure.
plot_scatter_with_classes Plot a scatter plot of the classes.
reshape_grid_socres Reshape the scores to be used as input for the heatmap.
plot_validation_accuracy_heatmap Plot heatmap of the validation accuracy from a grid search.