The Croc API

The Epydoc-generated docs can be found here.

croc.BEDROC(scoreddata, alpha)

A convenience function that computes the BEDROC score along with the associated curves and areas. The output is a dictionary in which each result is labeled. For usage examples, see the croc_bedroc script.

croc.CeilingAC(sweep)

Create a stepped AC curve.

class croc.Curve(coords=[])

A class that encodes, left to right, a monotonically increasing parametric curve.

append(x, y)

Append an (x, y) coordinate pair to this curve. Basic error checking ensures that the resulting curve is monotonically increasing, and that duplicate points and vertically or horizontally colinear points are removed.

>>> C = Curve()
>>> C.append(0, 0) #initialize first coordinate to the origin

Adding a coordinate that decreases either the x or y component will throw an exception.

>>> C.append(-1,0)
Traceback (most recent call last):
    ...
AssertionError
>>> C.append(0,-1)
Traceback (most recent call last):
    ...
AssertionError

Adding a duplicate coordinate will do nothing.

>>> C
Curve([(0.0, 0.0)])
>>> C.append(0,0)
>>> C
Curve([(0.0, 0.0)])

Adding a monotonically increasing coordinate will work fine.

>>> C.append(0,1)
>>> C
Curve([(0.0, 0.0), (0.0, 1.0)])

Internal points that are vertically or horizontally colinear are removed.

>>> C.append(0,2)
>>> C != [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
1
>>> C
Curve([(0.0, 0.0), (0.0, 2.0)])

On slanted lines, colinear points are not removed.

>>> C = Curve()
>>> C.append(0, 0)
>>> C.append(1, 1)
>>> C.append(2, 2)
>>> C
Curve([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
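The rules illustrated by the doctests above can be sketched in a few lines. This is a minimal reconstruction inferred from the documented behavior, not the library's own code; the class name SketchCurve and its coords attribute are hypothetical.

```python
class SketchCurve:
    """Hypothetical stand-in for croc.Curve, inferred from its doctests."""

    def __init__(self):
        self.coords = []

    def append(self, x, y):
        point = (float(x), float(y))
        if not self.coords:
            self.coords.append(point)
            return
        last = self.coords[-1]
        # Reject any point that moves left or down.
        assert point[0] >= last[0] and point[1] >= last[1]
        if point == last:
            return  # duplicate point: nothing to do
        if len(self.coords) >= 2:
            prev = self.coords[-2]
            vertical = prev[0] == last[0] == point[0]
            horizontal = prev[1] == last[1] == point[1]
            if vertical or horizontal:
                # The middle point is redundant, so overwrite it.
                self.coords[-1] = point
                return
        self.coords.append(point)


c = SketchCurve()
for x, y in [(0, 0), (0, 0), (0, 1), (0, 2)]:
    c.append(x, y)
print(c.coords)  # [(0.0, 0.0), (0.0, 2.0)]
```

Only axis-aligned runs are collapsed, which is why the points on the slanted line in the last doctest are all kept.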
area()

Integrate along the coordinates of a curve using the trapezoid rule.

Here are some examples:

>>> Curve( [(0,0), (1,1) ] ).area()
0.5
>>> Curve( [(0,0), (1,0), (1,1)] ).area()
0.0
>>> Curve( [(0,0), (0,1), (1,1)] ).area()
1.0
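For reference, the trapezoid rule over a list of coordinates can be written as below. This is an independent sketch, not the library's implementation; the function name trapezoid_area is hypothetical. It reproduces the three results shown above.

```python
def trapezoid_area(coords):
    """Sum the trapezoid under each segment between consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        # Width of the segment times the mean of its two heights.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


print(trapezoid_area([(0, 0), (1, 1)]))          # 0.5
print(trapezoid_area([(0, 0), (1, 0), (1, 1)]))  # 0.0
print(trapezoid_area([(0, 0), (0, 1), (1, 1)]))  # 1.0
```

Vertical segments have zero width and so contribute nothing, which explains the difference between the second and third results.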
static average(curves)

A static function which vertically averages a list of curves. For example:

>>> C1 = Curve([(0,0), (1,1)])
>>> C2 = Curve([(0,0), (0,1), (1,1)])
>>> Curve.average([C1,C2])
Curve([(0.0, 0.0), (0.0, 0.5), (1.0, 1.0)])

static read_from_file(file)

Read a curve from an input file.

>>> from StringIO import StringIO
>>> file = StringIO('0 0 \n 0 1 \n 1 1')
>>> Curve.read_from_file(file)
Curve([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])

static sum(curves)

A static function which vertically sums a list of curves. For example:

>>> C1 = Curve([(0,0), (1,1)])
>>> C2 = Curve([(0,0), (0,1), (1,1)])
>>> Curve.sum([C1,C2])
Curve([(0.0, 0.0), (0.0, 1.0), (1.0, 2.0)])

transform(transform, axis='x')

Return a new curve with the x or y coordinates transformed, as selected by the axis argument.

vertical_scale(scale)

Return a new Curve object that has y values shrunk or expanded by a given scale factor.

An example:

>>> c = Curve( [(0,0), (1,1) ] )
>>> c.vertical_scale(2)
Curve([(0.0, 0.0), (1.0, 2.0)])

write_to_file(file)

class croc.Exponential(alpha)

This class encodes the exponential transform computed as: f(x) = (1 - exp(-alpha * x)) / (1 - exp(-alpha))
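The formula can be checked numerically with a plain function. This is a sketch of the stated formula only, not the class itself; the function name exponential_transform and the value alpha = 7 are chosen for illustration, and alpha must be nonzero to avoid dividing by zero.

```python
import math


def exponential_transform(x, alpha):
    """f(x) = (1 - exp(-alpha * x)) / (1 - exp(-alpha)), for alpha != 0."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


# The end points of [0, 1] are fixed, and small x values are stretched.
print(exponential_transform(0.0, 7))  # 0.0
print(exponential_transform(1.0, 7))  # 1.0
print(exponential_transform(0.1, 7))
```

With a positive alpha the transform magnifies the left-hand end of the x axis, so the first tenth of the axis maps to roughly the first half of the output range when alpha is 7.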

croc.FloorAC(sweep)

Create a stepped AC curve.

class croc.Linear

class croc.Logarithm(alpha)

This class encodes the logarithmic transform computed as: f(x) = log(1 + x * alpha)/log(1 + alpha)

class croc.Power(alpha)

This class encodes the power transform computed as: f(x) = x ^ (1/(1 + alpha))
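The logarithmic and power formulas can be written as plain functions to confirm that both fix the end points of [0, 1]. This is a sketch of the stated formulas only; the function names are hypothetical and alpha is assumed to be positive.

```python
import math


def logarithm_transform(x, alpha):
    """f(x) = log(1 + x * alpha) / log(1 + alpha), for alpha > 0."""
    return math.log(1.0 + x * alpha) / math.log(1.0 + alpha)


def power_transform(x, alpha):
    """f(x) = x ^ (1 / (1 + alpha))."""
    return x ** (1.0 / (1.0 + alpha))


for f in (logarithm_transform, power_transform):
    print(f(0.0, 3), f(1.0, 3))  # 0.0 1.0 for both transforms

print(power_transform(0.25, 1))  # 0.5, the square root of 0.25
```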

croc.ROC(sweep)

Create a ROC curve.

>>> SD = ScoredData.from_ranks1([2,4],4)
>>> ROC(SD.sweep_threshold()) 
Curve([(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (1.0, 0.5), (1.0, 1.0)])
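To see where the coordinates above come from, the sketch below walks down a ranked list and records the false positive rate (x) and true positive rate (y) after each instance. It is an illustration that assumes no ties and at least one positive and one negative; the function name roc_points is hypothetical and the code is independent of the library.

```python
def roc_points(positive_ranks, n):
    """Return (FPR, TPR) pairs for 1-indexed positive ranks out of n."""
    positives = set(positive_ranks)
    n_pos = len(positives)
    n_neg = n - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for rank in range(1, n + 1):
        if rank in positives:
            tp += 1  # a positive moves the curve up
        else:
            fp += 1  # a negative moves the curve right
        points.append((fp / float(n_neg), tp / float(n_pos)))
    return points


print(roc_points([2, 4], 4))
# [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (1.0, 0.5), (1.0, 1.0)]
```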
croc.SampleCurves(sample, N=500)

A convenience function which samples random curves and returns (1) a vertically averaged curve of all samples, (2) the average of all the areas of these samples, and (3) the unbiased standard deviation of the samples. For input, the function requires “sample” to be a callable object which returns randomly sampled Curve objects.
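The area statistics described above can be approximated as follows. This is a rough Python 3 sketch that covers only the mean and standard deviation of the areas, not the vertically averaged curve. It assumes the standard deviation is taken over the sampled areas, and it represents curves as plain coordinate lists; summarize_areas, trapezoid_area, and random_step_curve are hypothetical names.

```python
import random
import statistics


def trapezoid_area(coords):
    """Area under a piecewise-linear curve given as (x, y) pairs."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(coords, coords[1:]))


def summarize_areas(sample, n=500):
    """Call `sample` n times; return the mean area and the sample
    standard deviation (n - 1 in the denominator)."""
    areas = [trapezoid_area(sample()) for _ in range(n)]
    return statistics.mean(areas), statistics.stdev(areas)


rng = random.Random(0)  # fixed seed so runs are repeatable


def random_step_curve():
    """Hypothetical sampler: a single step up at a random x position."""
    x = rng.random()
    return [(0.0, 0.0), (x, 0.0), (x, 1.0), (1.0, 1.0)]


mean_area, std_area = summarize_areas(random_step_curve)
print(mean_area, std_area)
```

The sampler is passed as a callable, mirroring the requirement that "sample" return a new random curve on every call.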

class croc.ScoredData(scored_labels=[])

add(score, label)

The preferred method for adding score-label pairs to an instance.

static from_ranks0(positive_ranks, N)

An alternate constructor which takes as input the 0-indexed ranks of all the positive instances. Scores are fixed at the negative of the 0-indexed rank of each instance. Ties are not allowed in this constructor. N is the total number of negative and positive instances.

static from_ranks1(positive_ranks, N)

An alternate constructor which takes as input the 1-indexed ranks of all the positive instances. Scores are fixed at the negative of the 0-indexed rank of each instance. Ties are not allowed in this constructor. N is the total number of negative and positive instances.
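The scoring rule described above can be illustrated with a small helper that turns 1-indexed positive ranks into (score, label) pairs. This is a sketch, not the constructor itself; the function name is hypothetical, and the use of 1 and 0 as positive and negative labels is an assumption.

```python
def scored_labels_from_ranks1(positive_ranks, n):
    """Build (score, label) pairs from 1-indexed positive ranks."""
    # Convert to 0-indexed ranks so they line up with range(n).
    positives = set(rank - 1 for rank in positive_ranks)
    # Each score is the negative of the 0-indexed rank, so the top-ranked
    # instance gets the highest score.
    return [(-rank0, 1 if rank0 in positives else 0) for rank0 in range(n)]


print(scored_labels_from_ranks1([2, 4], 4))
# [(0, 0), (-1, 1), (-2, 0), (-3, 1)]
```

Because every rank maps to a distinct score, no ties can arise from this construction.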

mixed_tie_count()

Returns the number of tied scores that are shared by instances with different labels.

static read_from_file(file)

An alternate constructor which reads data from a file. The file format is whitespace-delimited text, with the score in the first column and the label in the second.
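A file in this format could be parsed as sketched below. The helper is hypothetical and independent of the library; it assumes scores are floats, labels are the integers 1 and 0, and blank lines can be skipped.

```python
import io


def parse_scored_labels(handle):
    """Read 'score label' lines into a list of (float, int) pairs."""
    pairs = []
    for line in handle:
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # skip blank lines
        pairs.append((float(fields[0]), int(fields[1])))
    return pairs


data = io.StringIO(u"0.9 1\n0.4\t0\n\n0.1 1\n")
print(parse_scored_labels(data))
# [(0.9, 1), (0.4, 0), (0.1, 1)]
```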

static read_from_file_ranks0(file)

An alternate constructor which reads white space delimited ranks (0-indexed) from a file. The first integer should be N, the total number of positive and negative instances. The rest of the integers should be the ranks of the positive instances.

static read_from_file_ranks1(file)

An alternate constructor which reads white space delimited ranks (1-indexed) from a file. The first integer should be N, the total number of positive and negative instances. The rest of the integers should be the ranks of the positive instances.

sweep_threshold(tie_mode='smooth')

An iterator that yields TP, TN, FP, FN, starting with the threshold at infinity and gradually sweeping it down to negative infinity. Ties can be handled in several ways:

  • smooth (preferred) - construct a smooth slanted line which interpolates the TP, TN, FP, and FN appropriately
  • ignore - output the instances in the order they were presented to the ScoredData instance.
  • sample - randomly shuffle the ties.
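The sweep itself can be pictured with the sketch below, which lowers the threshold one instance at a time and yields the four counts in the order TP, TN, FP, FN. It is an illustration, not the library's iterator. It assumes labels of 1 and 0 and that the state before any instance is counted is also yielded, and it leaves tied scores in input order rather than interpolating them. The function name sweep_counts is hypothetical.

```python
def sweep_counts(scored_labels):
    """Yield (TP, TN, FP, FN) as the threshold drops past each score."""
    # Highest score first; Python's stable sort keeps ties in input order.
    ordered = sorted(scored_labels, key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, label in ordered if label == 1)
    total_neg = len(ordered) - total_pos
    tp = fp = 0
    # Threshold above every score: nothing is predicted positive yet.
    yield tp, total_neg - fp, fp, total_pos - tp
    for _, label in ordered:
        if label == 1:
            tp += 1
        else:
            fp += 1
        yield tp, total_neg - fp, fp, total_pos - tp


for counts in sweep_counts([(0, 0), (-1, 1), (-2, 0), (-3, 1)]):
    print(counts)
# (0, 2, 0, 2)
# (0, 1, 1, 2)
# (1, 1, 1, 1)
# (1, 0, 2, 1)
# (2, 0, 2, 0)
```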
sweep_threshold_best()

Equivalent to the sweep_threshold method, but assumes all the positives are ranked at the top of the list.

sweep_threshold_random()

Equivalent to the sweep_threshold method, but randomly shuffles the positives throughout the list.

sweep_threshold_worst()

Equivalent to the sweep_threshold method, but assumes all the positives are ranked at the bottom of the list.

croc.SlantedAC(sweep)

Create a slanted AC curve.

>>> SD = ScoredData.from_ranks1([2,4],4) 
>>> SlantedAC(SD.sweep_threshold()) 
Curve([(0.0, 0.0), (0.25, 0.0), (0.5, 0.5), (0.75, 0.5), (1.0, 1.0)])
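The coordinates above are consistent with an accumulation curve in which x is the fraction of all instances examined and y is the fraction of positives found so far. The sketch below reproduces them under that reading, which is inferred from the example output rather than stated by the source. It assumes no ties and at least one positive, and the function name slanted_ac_points is hypothetical.

```python
def slanted_ac_points(positive_ranks, n):
    """Return (fraction examined, fraction of positives found) pairs."""
    positives = set(positive_ranks)
    n_pos = len(positives)
    tp = 0
    points = [(0.0, 0.0)]
    for rank in range(1, n + 1):
        if rank in positives:
            tp += 1
        # x always advances by 1/n; y rises only when a positive is found,
        # so a positive produces a slanted segment rather than a step.
        points.append((rank / float(n), tp / float(n_pos)))
    return points


print(slanted_ac_points([2, 4], 4))
# [(0.0, 0.0), (0.25, 0.0), (0.5, 0.5), (0.75, 0.5), (1.0, 1.0)]
```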
class croc.Transform(alpha)

The interface which all x-axis transforms should implement. The __call__ method should map an input in the range [0,1] to an output in the range [0,1], with f(0) = 0 and f(1) = 1.

croc.main()
