util.stats

Functions related to computing statistics on a set of data.

glimpse.util.stats.CalculateRoc(target_labels, predicted_labels)

Calculate the points of the ROC curve from a set of labels and evaluations of a classifier.

Uses the efficient single-pass algorithm of Fawcett (2006), which sorts the instances by predicted value once and then sweeps through them. This assumes a binary classification task.

Parameters:
  • target_labels (1D ndarray of float) – Ground-truth label for each instance.
  • predicted_labels (1D ndarray of float) – Predicted label for each instance.
Returns:

Points on the ROC curve

Return type:

1D ndarray of float

See also

sklearn.metrics.roc_curve()
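
As an illustration of the single-pass idea, here is a minimal sketch, not the Glimpse implementation. The function name `roc_points_sketch`, the convention that positive instances have a target label greater than zero, and the (N, 2) layout of the result are all assumptions made for this example.

```python
import numpy as np

def roc_points_sketch(target_labels, predicted_labels):
    """Return an (N, 2) array of (false-positive rate, true-positive rate) points.

    Hypothetical helper: assumes positive instances have target label > 0.
    """
    targets = np.asarray(target_labels, dtype=float)
    scores = np.asarray(predicted_labels, dtype=float)
    positive = targets > 0
    num_pos = int(positive.sum())
    num_neg = int(positive.size - num_pos)
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one instance of each class")

    # Sort once by decreasing score, then sweep through the instances.
    order = np.argsort(-scores, kind="stable")
    points = []
    tp = fp = 0
    prev_score = None
    for idx in order:
        # Emit a point only when the score changes, so that tied scores
        # contribute a single diagonal segment.
        if prev_score is None or scores[idx] != prev_score:
            points.append((fp / num_neg, tp / num_pos))
            prev_score = scores[idx]
        if positive[idx]:
            tp += 1
        else:
            fp += 1
    points.append((fp / num_neg, tp / num_pos))  # final point is always (1, 1)
    return np.array(points)
```

The sweep costs one sort plus one linear pass, rather than re-thresholding the data once per candidate threshold.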

glimpse.util.stats.CalculateRocScore(target_labels, predicted_labels)

Calculate area under the ROC curve (AUC) from a set of target labels and predicted labels of a classifier.

Parameters:
  • target_labels (1D ndarray of float) – Ground-truth label for each instance.
  • predicted_labels (1D ndarray of float) – Predicted label for each instance.
Returns:

AUC value

Return type:

float

See also

sklearn.metrics.auc()
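
One common way to obtain the AUC is to integrate the ROC points with the trapezoidal rule. The sketch below assumes the curve is supplied as separate arrays of false-positive and true-positive rates sorted by increasing false-positive rate. The name `auc_sketch` and this input layout are hypothetical, and the Glimpse function itself takes labels rather than curve points.

```python
import numpy as np

def auc_sketch(fpr, tpr):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) arrays.

    Hypothetical helper: assumes fpr is sorted in increasing order.
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    widths = np.diff(fpr)                   # horizontal extent of each segment
    heights = (tpr[1:] + tpr[:-1]) / 2.0    # mean height of each segment
    return float(np.sum(widths * heights))
```

A classifier that ranks at chance level traces the diagonal and scores 0.5, while a perfect ranking scores 1.0.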

glimpse.util.stats.DPrime(true_positive_rate, false_positive_rate)

Calculate discriminability (d’) measure.

Parameters:
  • true_positive_rate (float) – True-positive (TP) rate, normalized to the range [0, 1].
  • false_positive_rate (float) – False-positive (FP) rate, normalized to the range [0, 1].
Return type:

float
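
In signal detection theory, d' is conventionally defined as the difference between the z-scores (inverse standard normal CDF) of the two rates. The sketch below follows that textbook definition using only the Python standard library. Whether `DPrime` uses exactly this formula, or applies any correction for rates of 0 or 1, is not stated here, so treat `d_prime_sketch` as a hypothetical illustration.

```python
from statistics import NormalDist

def d_prime_sketch(true_positive_rate, false_positive_rate):
    """Textbook d' = Z(TP rate) - Z(FP rate).

    Hypothetical helper: both rates must lie strictly between 0 and 1,
    because the inverse normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(true_positive_rate) - z(false_positive_rate)
```

Equal rates give a d' of zero (no discriminability), and the value grows as the true-positive rate rises above the false-positive rate.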

glimpse.util.stats.Pca(X)

Compute the Principal Component Analysis (PCA) transformation for a dataset.

The first k rows of the transformation correspond to a projection onto a k-dimensional linear subspace, chosen such that the L2 approximation error on the training set is minimized. This transformation is given as:

\[y = A'(x - \mu),\]

where \(\mu\) is the mean of the training data, and A has columns given by the eigenvectors of the training data’s covariance matrix. Eigenvectors are sorted by descending eigenvalue, which ensures that the first k rows of the transformation form the optimal linear transformation under L2 approximation error. The function returns the transform and the standard deviation for each axis of the training data.

Usage:

>>> T, S = Pca(X)

where X is a matrix of training data, T is the transformation matrix, and S is the array of standard deviations. To transform a data point W given as an array, use:

>>> Y = numpy.dot(T, W)

Parameters:
  • X (2D ndarray of float) – Input data, with variables given by columns and observations given by rows.

See also

This function was adapted from similar code by Jan Erik Solem. Also see sklearn.decomposition.PCA().
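
The eigenvector construction described above can be sketched with NumPy as follows. This is an illustrative version, not the Glimpse source: the name `pca_sketch`, the choice to compute S as the per-column standard deviation of the input, and the extra return value `mu` are all assumptions made so that the example is self-contained.

```python
import numpy as np

def pca_sketch(X):
    """Return (T, S, mu) for data X with observations in rows.

    Hypothetical helper: T has one principal axis per row, sorted by
    descending eigenvalue. S is assumed to be the per-column standard
    deviation of X, and mu is the per-column mean. X needs at least two
    columns.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)       # covariance of the variables
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]    # descending eigenvalue
    T = eigenvectors[:, order].T             # rows of T are the columns of A
    S = X.std(axis=0)
    return T, S, mu

# Project a single observation onto the first k principal axes,
# following y = A'(x - mu).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
T, S, mu = pca_sketch(X)
k = 2
y = np.dot(T[:k], X[0] - mu)
```

`numpy.linalg.eigh` is used because a covariance matrix is symmetric. It returns eigenvalues in ascending order, hence the explicit reordering.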