The Epydoc-generated documentation can be found here.
A convenience function which appropriately computes the BEDROC score (and associated curves and areas). Output is a dictionary with the relevant data appropriately labeled. For usage examples, please see the croc_bedroc script.
Create a stepped AC curve.
A class that encodes, left to right, a monotonically increasing parametric curve.
Append an (x, y) coordinate pair to this curve, with basic error checking to ensure that the resulting curve is monotonically increasing. Duplicate points are removed, as are interior points that are vertically or horizontally colinear with their neighbors.
>>> C = Curve()
>>> C.append(0, 0) #initialize first coordinate to the origin
Adding a coordinate that decreases either the x or y component raises an AssertionError.
>>> C.append(-1,0)
Traceback (most recent call last):
...
AssertionError
>>> C.append(0,-1)
Traceback (most recent call last):
...
AssertionError
Adding a duplicate coordinate will do nothing.
>>> C
Curve([(0.0, 0.0)])
>>> C.append(0,0)
>>> C
Curve([(0.0, 0.0)])
Adding a monotonically increasing coordinate will work fine.
>>> C.append(0,1)
>>> C
Curve([(0.0, 0.0), (0.0, 1.0)])
Interior points that are vertically or horizontally colinear are removed.
>>> C.append(0,2)
>>> C != [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
1
>>> C
Curve([(0.0, 0.0), (0.0, 2.0)])
On slanted lines, colinear points are not removed.
>>> C = Curve()
>>> C.append(0, 0)
>>> C.append(1, 1)
>>> C.append(2, 2)
>>> C
Curve([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
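The append rules illustrated above can be restated as a small standalone sketch. This is not the library's implementation: `append_point` is a hypothetical helper operating on a plain list of (x, y) tuples, and it assumes that only vertical or horizontal runs are collapsed, as the doctests suggest.

```python
def append_point(points, x, y):
    """Hypothetical helper: append (x, y) to a list of points in place."""
    x, y = float(x), float(y)
    if points:
        last_x, last_y = points[-1]
        # Neither coordinate may decrease.
        if x < last_x or y < last_y:
            raise AssertionError("curve must be monotonically increasing")
        # A duplicate of the last point is ignored.
        if (x, y) == (last_x, last_y):
            return
        if len(points) >= 2:
            prev_x, prev_y = points[-2]
            # If the last point would become the middle of a vertical or
            # horizontal run, overwrite it instead of growing the list.
            if prev_x == last_x == x or prev_y == last_y == y:
                points[-1] = (x, y)
                return
    points.append((x, y))


vertical = []
for px, py in [(0, 0), (0, 0), (0, 1), (0, 2)]:
    append_point(vertical, px, py)

diagonal = []
for px, py in [(0, 0), (1, 1), (2, 2)]:
    append_point(diagonal, px, py)
```

Here `vertical` ends up with only the two end points of the vertical run, while `diagonal` keeps all three points, mirroring the doctests.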
Integrate along the coordinates of a curve using the trapezoid rule.
Here are some examples:
>>> Curve( [(0,0), (1,1) ] ).area()
0.5
>>> Curve( [(0,0), (1,0), (1,1)] ).area()
0.0
>>> Curve( [(0,0), (0,1), (1,1)] ).area()
1.0
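For readers who want to see the arithmetic, the trapezoid rule can be sketched on a plain list of points. `trapezoid_area` is a hypothetical standalone function, not the method's actual source.

```python
def trapezoid_area(points):
    """Integrate y over x along a list of (x, y) pairs with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes its width times the mean of its end heights;
        # vertical segments have zero width and so contribute nothing.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


diagonal_area = trapezoid_area([(0, 0), (1, 1)])
lower_step_area = trapezoid_area([(0, 0), (1, 0), (1, 1)])
upper_step_area = trapezoid_area([(0, 0), (0, 1), (1, 1)])
```

The three results reproduce the values in the examples above: 0.5, 0.0 and 1.0.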
A static function which vertically averages a list of curves. For example:
>>> C1 = Curve([(0,0), (1,1)])
>>> C2 = Curve([(0,0), (0,1), (1,1)])
>>> Curve.average([C1,C2])
Curve([(0.0, 0.0), (0.0, 0.5), (1.0, 1.0)])
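One way to reproduce the result above is to average, at every x value, both the lowest and the highest y that each curve takes there, so that vertical segments are preserved. The sketch below is an assumption about how such averaging could work, not the library's code. `y_range_at` and `vertical_average` are hypothetical names, and all curves are assumed to span the same x range.

```python
def y_range_at(points, x):
    """Return (lowest y, highest y) of a piecewise-linear curve at x."""
    ys = [py for px, py in points if px == x]
    if ys:
        return min(ys), max(ys)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 < x < x1:
            # Linear interpolation inside a segment.
            y = y0 + (y1 - y0) * (x - x0) / float(x1 - x0)
            return y, y
    raise ValueError("x lies outside the curve")


def vertical_average(curves):
    """Average a list of curves (lists of (x, y) pairs) along the y axis."""
    xs = sorted(set(px for points in curves for px, _ in points))
    count = float(len(curves))
    result = []
    for x in xs:
        ranges = [y_range_at(points, x) for points in curves]
        low = sum(r[0] for r in ranges) / count
        high = sum(r[1] for r in ranges) / count
        # Emit the bottom and top of any vertical jump, skipping repeats.
        for point in ((float(x), low), (float(x), high)):
            if not result or result[-1] != point:
                result.append(point)
    return result


averaged = vertical_average([[(0, 0), (1, 1)], [(0, 0), (0, 1), (1, 1)]])
```

With the two curves from the example, `averaged` holds the points (0.0, 0.0), (0.0, 0.5) and (1.0, 1.0).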
Read a curve from an input file.
>>> from StringIO import StringIO
>>> file = StringIO('0 0 \n 0 1 \n 1 1')
>>> Curve.read_from_file(file)
Curve([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
A static function which vertically sums a list of curves. For example:
>>> C1 = Curve([(0,0), (1,1)])
>>> C2 = Curve([(0,0), (0,1), (1,1)])
>>> Curve.sum([C1,C2])
Curve([(0.0, 0.0), (0.0, 1.0), (1.0, 2.0)])
Return a new curve with the x or y coordinates transformed.
Return a new Curve object that has y values shrunk or expanded by a given scale factor.
An example:
>>> c = Curve( [(0,0), (1,1) ] )
>>> c.vertical_scale(2)
Curve([(0.0, 0.0), (1.0, 2.0)])
This class encodes the exponential transform computed as: f(x) = (1 - exp(-alpha * x)) / (1 - exp(-alpha))
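As a rough illustration, the formula above can be evaluated directly. `exponential_transform` is a hypothetical function name rather than the class's API, and the sketch assumes alpha is nonzero so that the denominator does not vanish.

```python
import math


def exponential_transform(x, alpha):
    """Evaluate f(x) = (1 - exp(-alpha * x)) / (1 - exp(-alpha))."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


start = exponential_transform(0.0, 7.0)   # left end point
end = exponential_transform(1.0, 7.0)     # right end point
early = exponential_transform(0.1, 7.0)   # a point near the start of the axis
```

The end points map to 0 and 1, and for positive alpha the early part of the x-axis is stretched: `early` is larger than 0.1.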
Create a stepped AC curve.
This class encodes the logarithmic transform computed as: f(x) = log(1 + x * alpha)/log(1 + alpha)
This class encodes the power transform computed as: f(x) = x ^ (1/(1 + alpha))
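The logarithmic formula and the formula f(x) = x ^ (1/(1 + alpha)) can both be checked numerically with a few lines. The function names below are hypothetical, and the sketch assumes alpha > 0 so that both expressions are well defined.

```python
import math


def logarithmic_transform(x, alpha):
    """Evaluate f(x) = log(1 + x * alpha) / log(1 + alpha)."""
    return math.log1p(x * alpha) / math.log1p(alpha)


def power_transform(x, alpha):
    """Evaluate f(x) = x ^ (1 / (1 + alpha))."""
    return x ** (1.0 / (1.0 + alpha))


log_ends = (logarithmic_transform(0.0, 7.0), logarithmic_transform(1.0, 7.0))
power_ends = (power_transform(0.0, 7.0), power_transform(1.0, 7.0))
quarter = power_transform(0.25, 1.0)  # alpha = 1 gives a square root
```

Both functions send 0 to 0 and 1 to 1, and with alpha = 1 the second reduces to a square root, so `quarter` is 0.5.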
Create a ROC curve.
>>> SD = ScoredData.from_ranks1([2,4],4)
>>> ROC(SD.sweep_threshold())
Curve([(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (1.0, 0.5), (1.0, 1.0)])
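To make the example concrete, the same points can be derived by hand from the ranks. The helper below is a hypothetical sketch that bypasses ScoredData entirely. It assumes no ties and at least one positive and one negative instance.

```python
def roc_points_from_ranks1(positive_ranks, n):
    """Hypothetical helper: ROC points from 1-indexed ranks of the positives."""
    positives = set(positive_ranks)
    n_pos = len(positives)
    n_neg = n - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Walk down the ranked list, lowering the threshold one instance at a time.
    for rank in range(1, n + 1):
        if rank in positives:
            tp += 1
        else:
            fp += 1
        # x is the false positive rate, y is the true positive rate.
        points.append((fp / float(n_neg), tp / float(n_pos)))
    return points


roc_points = roc_points_from_ranks1([2, 4], 4)
```

For positives at ranks 2 and 4 out of 4 instances, `roc_points` matches the coordinates shown in the doctest above.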
A convenience function which samples random curves and returns (1) a vertically averaged curve of all samples, (2) the average of all the areas of these samples, and (3) the unbiased standard deviation of the samples. For input, the function requires “sample” to be a callable object which returns randomly sampled Curve objects.
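The area statistics described above can be sketched as follows. This covers only items (2) and (3), and it assumes that the standard deviation is taken over the sampled areas with an n - 1 denominator. `area_statistics` and `trapezoid_area` are hypothetical names, curves are plain lists of points, and the `statistics` module requires Python 3.

```python
import itertools
import statistics


def trapezoid_area(points):
    """Area under a list of (x, y) pairs by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def area_statistics(sample, n_samples):
    """Return (mean area, sample standard deviation) over n_samples draws."""
    areas = [trapezoid_area(sample()) for _ in range(n_samples)]
    # statistics.stdev uses the n - 1 denominator.
    return statistics.mean(areas), statistics.stdev(areas)


# A deterministic stand-in for a random sampler, used only for illustration:
# it alternates between a curve of area 0.5 and a curve of area 1.0.
_curves = itertools.cycle([[(0, 0), (1, 1)], [(0, 0), (0, 1), (1, 1)]])
mean_area, sd_area = area_statistics(lambda: next(_curves), 4)
```

In real use, `sample` would return a freshly randomized curve on every call. The cycling sampler here only keeps the example reproducible.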
The preferred method for adding score-label pairs to an instance.
An alternate constructor which takes as input the 0-indexed ranks of all the positive instances. Scores are fixed at the negative of the 0-indexed rank of each instance. Ties are not allowed in this constructor. N is the total number of negative and positive instances.
An alternate constructor which takes as input the 1-indexed ranks of all the positive instances. Scores are fixed at the negative of the 0-indexed rank of each instance. Ties are not allowed in this constructor. N is the total number of negative and positive instances.
Returns the number of scores associated with instances with different labels.
An alternate constructor which reads data from a file. The file is whitespace-delimited text, with the score in the first column and the label in the second.
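A file in this layout could be parsed along the following lines. `read_score_label_pairs` is a hypothetical helper, not the constructor itself. It assumes scores are numeric and leaves labels as strings, since the accepted label values are not specified here.

```python
from io import StringIO


def read_score_label_pairs(handle):
    """Hypothetical helper: parse 'score label' lines into (float, str) pairs."""
    pairs = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pairs.append((float(fields[0]), fields[1]))
    return pairs


example = StringIO(u"0.9 1\n0.4 0\n\n0.1 1\n")
pairs = read_score_label_pairs(example)
```

Each non-blank line yields one pair, so `pairs` holds three entries for the sample input.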
An alternate constructor which reads white space delimited ranks (0-indexed) from a file. The first integer should be N, the total number of positive and negative instances. The rest of the integers should be the ranks of the positive instances.
An alternate constructor which reads white space delimited ranks (1-indexed) from a file. The first integer should be N, the total number of positive and negative instances. The rest of the integers should be the ranks of the positive instances.
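The rank file layout used by these two constructors can be read with a short sketch such as the one below. `read_ranks` and its `one_indexed` flag are hypothetical. The function returns N together with the positive ranks converted to 0-indexed form.

```python
from io import StringIO


def read_ranks(handle, one_indexed=False):
    """Hypothetical helper: return (N, 0-indexed ranks of the positives)."""
    numbers = [int(token) for token in handle.read().split()]
    n, ranks = numbers[0], numbers[1:]
    if one_indexed:
        # Shift 1-indexed ranks down so both formats share one representation.
        ranks = [rank - 1 for rank in ranks]
    return n, ranks


n_total, positive_ranks = read_ranks(StringIO(u"4\n2 4\n"), one_indexed=True)
```

For the sample input, N is 4 and the positives at 1-indexed ranks 2 and 4 become 0-indexed ranks 1 and 3.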
An iterator that yields TP, TN, FP, FN, starting with the threshold at infinity and gradually sweeping it down to negative infinity. Ties can be handled in several ways:
Equivalent to the sweep_threshold method, but assumes all the positives are ranked at the top of the list.
Equivalent to the sweep_threshold method, but randomly shuffles the positives throughout the list.
Equivalent to the sweep_threshold method, but assumes all the positives are ranked at the bottom of the list.
Create a slanted AC curve.
>>> SD = ScoredData.from_ranks1([2,4],4)
>>> SlantedAC(SD.sweep_threshold())
Curve([(0.0, 0.0), (0.25, 0.0), (0.5, 0.5), (0.75, 0.5), (1.0, 1.0)])
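The points in this example are consistent with plotting the fraction of instances examined on the x-axis against the fraction of positives found on the y-axis. The sketch below rests on that reading. `ac_points_from_ranks1` is a hypothetical helper that assumes no ties and at least one positive instance.

```python
def ac_points_from_ranks1(positive_ranks, n):
    """Hypothetical helper: accumulation-curve points from 1-indexed ranks."""
    positives = set(positive_ranks)
    n_pos = len(positives)
    found = 0
    points = [(0.0, 0.0)]
    for rank in range(1, n + 1):
        if rank in positives:
            found += 1
        # Every instance advances x by 1/N; a positive also raises y,
        # which produces a slanted rather than a vertical segment.
        points.append((rank / float(n), found / float(n_pos)))
    return points


ac_points = ac_points_from_ranks1([2, 4], 4)
```

With positives at ranks 2 and 4 out of 4, `ac_points` reproduces the coordinates in the doctest above.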
The interface which all x-axis transforms should implement. The __call__ method should map inputs in the range [0,1] to outputs in the range [0,1], with f(0) = 0 and f(1) = 1.
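A minimal sketch of what such an interface might look like is given below. The class names `Transform` and `Identity` and the helper `respects_endpoints` are hypothetical, chosen only to illustrate the contract on `__call__`.

```python
class Transform(object):
    """Hypothetical base class: subclasses map [0, 1] onto [0, 1]."""

    def __call__(self, x):
        raise NotImplementedError("subclasses must implement __call__")


class Identity(Transform):
    """The simplest transform satisfying the contract: f(x) = x."""

    def __call__(self, x):
        return float(x)


def respects_endpoints(transform, tol=1e-12):
    """Check the end point conditions f(0) = 0 and f(1) = 1."""
    return abs(transform(0.0)) <= tol and abs(transform(1.0) - 1.0) <= tol


identity_ok = respects_endpoints(Identity())
shifted_ok = respects_endpoints(lambda x: x + 0.5)  # violates f(0) = 0
```

A check like `respects_endpoints` is a cheap sanity test for a custom transform. It verifies only the two end points, not monotonicity over the interior.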