Infpy is a Python package that implements some of the algorithms I (John Reid) have used in my research. In particular, it includes a Gaussian process package that is largely based on the excellent book Gaussian Processes for Machine Learning by Rasmussen and Williams.
The Gaussian process package is the only infpy package that is extensively documented so far, but you are welcome to try out the others. The Gaussian process package has the following attributes:
Code to handle bootstrap analyses.
Calculate the p-value for the statistic’s value given the bootstrap values.
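A bootstrap p-value of this kind can be computed as the fraction of bootstrap values at least as extreme as the observed statistic. A minimal sketch (the function name and one-sided convention here are illustrative assumptions, not infpy's API):

```python
import numpy as np

def bootstrap_p_value(statistic, bootstrap_values):
    """Fraction of bootstrap values at least as extreme as the observed
    statistic (one-sided, 'greater' convention)."""
    bootstrap_values = np.asarray(bootstrap_values)
    return (bootstrap_values >= statistic).sum() / len(bootstrap_values)

# e.g. an observed statistic of 2.0 against samples from a standard normal null
rng = np.random.default_rng(0)
null_values = rng.standard_normal(10000)
p = bootstrap_p_value(2.0, null_values)
```

For a two-sided test one would instead compare absolute values on both sides.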
Code to implement convergence tests (primarily for sequences of log likelihoods).
Bases: object
Tests for convergence of a series of log likelihoods.
The log likelihoods.
If true a warning is printed if the log likelihoods don’t always increase.
If true, the absolute difference is used for the convergence test; otherwise any decrease is viewed as convergence.
Takes two numpy arrays of log likelihood components and assumes each total LL is the sum of its array. Compares the two LLs and raises an error if the new one is smaller than the old one by more than the tolerance.
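The check described above can be sketched as follows (the function name `ll_increased` is an illustrative assumption, not infpy's exact API):

```python
import numpy as np

def ll_increased(old_components, new_components, tolerance=1e-8):
    """Sum each array of log likelihood components and raise if the
    total LL fell by more than the tolerance."""
    old_ll = np.sum(old_components)
    new_ll = np.sum(new_components)
    if new_ll < old_ll - tolerance:
        raise ValueError(
            "Log likelihood decreased from %f to %f" % (old_ll, new_ll))
    return True

# OK: the total LL rose from -15.0 to -14.5
ll_increased(np.array([-10.0, -5.0]), np.array([-9.5, -5.0]))
```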
Bases: infpy.distribution.Distribution
http://en.wikipedia.org/wiki/Gamma_distribution
The mean is k * theta and the support is [0, inf).
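The stated mean k * theta can be checked empirically by sampling; this sketch uses numpy's gamma sampler rather than infpy's own distribution class:

```python
import numpy as np

# Gamma(k, theta) with shape k and scale theta; mean should be k * theta = 6.0
k, theta = 3.0, 2.0
rng = np.random.default_rng(1)
samples = rng.gamma(shape=k, scale=theta, size=200000)
empirical_mean = samples.mean()
```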
Code to implement ROC point/curve calculation and plotting.
Bases: object
Calculates specificities and sensitivities from counts of true and false positives and negatives.
Source: wikipedia - Fawcett (2004)
(TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)); see: Burset & Guigo
Number of false negatives.
Number of false positives.
FP/(TN+FP)
TP/(TP+FN)
TP/(TP+FP)
TP/(TP+FN)
Number of true negatives.
Number of true positives.
TP/(TP+FN)
TP/(TP+FN)
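The rates listed above can all be derived from the four confusion-matrix counts. A minimal sketch (the function name and dictionary layout are illustrative assumptions, not the RocCalculator's actual interface):

```python
from math import sqrt

def roc_statistics(tp, tn, fp, fn):
    """Compute common ROC statistics from counts of true/false
    positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),          # TP/(TP+FN), also recall / TPR
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (tn + fp),  # FP/(TN+FP)
        "precision": tp / (tp + fp),            # TP/(TP+FP)
        # Matthews-style correlation coefficient (Burset & Guigo)
        "correlation": (tp * tn - fn * fp) / sqrt(
            (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)),
    }

stats = roc_statistics(tp=40, tn=45, fp=5, fn=10)
```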
Takes 2 sorted lists (smallest to largest): one list is of the thresholds required to classify the positive examples as positive and the other list is of the thresholds required to classify the negative examples as positive.
Returns: Yields all the ROC points. Note that they are returned in the opposite order to some of the other methods in this module.
Returns: The area under the ROC curve given by the ROC points.
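An area-under-the-curve calculation of this kind is typically a trapezoidal sum over (FPR, TPR) points; the sketch below shows the idea and is not necessarily infpy's exact implementation:

```python
def area_under_curve(points):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) points
    sorted by increasing false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A perfect classifier's curve has area 1.0
perfect = area_under_curve([(0, 0), (0, 1), (1, 1)])
# The diagonal (random) classifier has area 0.5
random_clf = area_under_curve([(0, 0), (1, 1)])
```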
Calculate the AUC50 as in Gribskov & Robinson: ‘Use of ROC analysis to evaluate sequence pattern matching’
Return the index into rocpoints of the first rocpoint for which predicate(rocpoint) is True, with start <= index < end. Assumes rocpoints are sorted w.r.t. the predicate.
Take a list of thresholds (in sorted order) and count how many would be classified positive and negative at the given value.
Returns: (num_positive, num_negative).
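With sorted thresholds this count is a binary search. A sketch using the standard-library bisect module, assuming an example is classified positive once the given value reaches its threshold (the function name mirrors the description above but is not necessarily infpy's API):

```python
from bisect import bisect_right

def count_threshold_classifications(thresholds, value):
    """Given sorted classification thresholds, count how many examples
    would be classified positive (threshold <= value) and negative."""
    num_positive = bisect_right(thresholds, value)
    num_negative = len(thresholds) - num_positive
    return num_positive, num_negative

counts = count_threshold_classifications([0.1, 0.4, 0.6, 0.9], 0.5)
```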
Takes 2 sorted lists: one list is of the thresholds required to classify the positive examples as positive and the other list is of the thresholds required to classify the negative examples as positive.
Returns: A list of (ROC point, threshold) tuples.
Generate ROC points but sort negatives before positives at same threshold if asked to. This gives a step-function like ROC curve rather than a smoothed curve.
Takes a sequence of (parameter, roc) tuples and returns a new parameter that should be tested next.
It chooses this parameter by sorting the sequence and taking the mid-point between the parameters with the largest absolute difference between their specificities or sensitivities (depending on for_specificity parameter).
Label the x and y axes of a precision versus recall plot.
Returns: A function that calculates a ROC point given a threshold.
Tries to pick thresholds to give a smooth ROC curve.
Returns: A list of (roc point, threshold) tuples.
Takes 2 sorted lists: one list is of the thresholds required to classify the positive examples as positive and the other list is of the thresholds required to classify the negative examples as positive.
Returns: A list of ROC points.
Plots a precision-recall curve for the given ROCs.
Returns: The result of 2 pylab.plot calls as a tuple (recall, precision).
Plots precision versus recall for the ROCs in rocs. Adds points at (0,1) and (1,0).
Returns: The result of the pylab.plot call.
Draw a random classifier on a ROC plot. Black dashed line by default.
Plots TPR versus FPR for the ROCs in rocs. Adds points at (0,0) and (1,1).
Returns: The result of the pylab.plot call.
Plot a single rocpoint. Typically used to indicate where the last point for the AUC50 calculation is.
Plots TPR versus FPR for the ROCs in rocpoints.
Returns: The result of the pylab.plot call.
Reduce the positive and negative thresholds such that there are just 50 (or num_negative) negative examples. The positive thresholds are trimmed accordingly.
Yield the ROC points while the number of true negatives is less than max_tn.
Take lists of positive and negative thresholds (in sorted order) and calculate a ROC point for the given value.
Takes 2 sorted lists: one list is of the thresholds required to classify the positive examples as positive and the other list is of the thresholds required to classify the negative examples as positive.
Returns: A list of ROC points.
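Computing ROC points from the two sorted threshold lists reduces to the same binary-search counting at each cut-off. A sketch under the assumption that an example is classified positive once the cut-off reaches its threshold (names and signature are illustrative, not infpy's exact API):

```python
from bisect import bisect_right

def rocs_from_thresholds(positive_thresholds, negative_thresholds, cutoffs):
    """For each cut-off, count TP/FP/FN/TN from the sorted threshold lists
    and return the corresponding (fpr, tpr) ROC points."""
    points = []
    for c in cutoffs:
        tp = bisect_right(positive_thresholds, c)
        fp = bisect_right(negative_thresholds, c)
        fn = len(positive_thresholds) - tp
        tn = len(negative_thresholds) - fp
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

points = rocs_from_thresholds([0.2, 0.3, 0.8], [0.1, 0.5, 0.9], cutoffs=[0.4])
```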
Check that the approximation to the gradient of the function matches the supplied gradient.
f is a function; fprime is a function giving the gradient of f.
The gradient is approximated by expansion of f around x and compared with the value of fprime at x.
Raises an error and prints a message if the matrices are not close.
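A standard way to perform this check is a central finite-difference approximation in each coordinate; the sketch below illustrates the technique and is not infpy's exact implementation:

```python
import numpy as np

def check_gradient(f, fprime, x, eps=1e-6, tol=1e-4):
    """Compare fprime(x) with a central finite-difference approximation
    of f around x; raise if any component differs by more than tol."""
    x = np.asarray(x, dtype=float)
    approx = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        approx[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    analytic = np.asarray(fprime(x), dtype=float)
    if not np.allclose(approx, analytic, atol=tol):
        raise ValueError("Gradient mismatch: %s vs %s" % (approx, analytic))

# e.g. f(x) = sum(x**2) has gradient 2x, so this passes silently
check_gradient(lambda x: np.sum(x**2), lambda x: 2 * x, np.array([1.0, -2.0, 3.0]))
```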
Generates K (training, validation) pairs from the items in X.
The validation iterables are a partition of X, and each validation iterable is of length len(X)/K. Each training iterable is the complement (within X) of the validation iterable, and so each training iterable is of length (K-1)*len(X)/K.
For example:
X = [i for i in xrange(97)]
for training, validation in k_fold_cross_validation(X, K=7):
    for x in X:
        assert (x in training) ^ (x in validation), x
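A minimal modern (Python 3) sketch of such a generator, partitioning X by index modulo K; when K does not divide len(X) the fold sizes differ by at most one, and this is not necessarily infpy's exact partitioning scheme:

```python
def k_fold_cross_validation(X, K):
    """Yield K (training, validation) pairs whose validation lists
    partition X; each training list is the complement within X."""
    for k in range(K):
        validation = [x for i, x in enumerate(X) if i % K == k]
        training = [x for i, x in enumerate(X) if i % K != k]
        yield training, validation

X = list(range(97))
for training, validation in k_fold_cross_validation(X, K=7):
    for x in X:
        assert (x in training) ^ (x in validation)
```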
Simple test that A and B differ by at most eps in any position.