absolute_error(y, yhat)
Returns the maximal absolute error between y and yhat.
Parameters:
    y – true function values
    yhat – predicted function values
Lower is better.
>>> absolute_error([0,1,2,3], [0,0,1,1])
2.0
accuracy(y, yhat)
Returns the accuracy. Higher is better.
Parameters:
    y – true labels
    yhat – predicted labels
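As a minimal sketch of the computation (a hypothetical helper, not the library's implementation), accuracy is the fraction of predictions that match the true labels; the error rate documented below is simply one minus this value:

def accuracy_sketch(y, yhat):
    # fraction of positions where the prediction equals the true label
    return sum(1.0 for yi, yhi in zip(y, yhat) if yi == yhi) / len(y)

>>> accuracy_sketch([0, 0, 1, 1], [0, 0, 0, 1])
0.75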
auc(curve)
Computes the area under the specified curve.
Parameters:
    curve – a curve, specified as a list of (x, y) tuples
See also
optunity.score_functions.compute_curve()
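A curve given as (x, y) tuples is typically integrated with the trapezoidal rule; the sketch below (an assumed implementation, for illustration only) sums the trapezoid areas between consecutive points:

def auc_sketch(curve):
    # trapezoidal rule over consecutive (x, y) pairs
    area = 0.0
    for (x1, y1), (x2, y2) in zip(curve[:-1], curve[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area

>>> auc_sketch([(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)])
0.75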
brier(y, yhat, positive=True)
Returns the Brier score between y and yhat.
Parameters:
    y – true labels
    yhat – predicted probabilities
    positive – the positive label
Returns:
\[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]
yhat must be a vector of probabilities, i.e. elements in [0, 1].
Lower is better.
Note
This loss function should only be used for probabilistic models.
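A sketch of the computation (hypothetical helper, not the library's code): average the squared gap between each predicted probability and the binary outcome.

def brier_sketch(y, yhat):
    # mean squared difference between predicted probabilities and outcomes
    return sum((p - t) ** 2 for t, p in zip(y, yhat)) / len(y)

>>> round(brier_sketch([1, 0], [0.8, 0.3]), 3)
0.065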
compute_curve(ys, decision_values, xfun, yfun, positive=True)
Computes a curve based on contingency tables at different decision values.
Parameters:
    ys – true labels
    decision_values – decision values
    xfun – function mapping a contingency table to the x-coordinate
    yfun – function mapping a contingency table to the y-coordinate
    positive – the positive label
Returns: the resulting curve, as a list of (x, y)-tuples
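A sketch of the mechanism (assuming the contingency_tables function documented below; xfun and yfun are the coordinate functions named in the signature): every per-threshold contingency table is mapped to one (x, y) point.

def compute_curve_sketch(ys, decision_values, xfun, yfun, positive=True):
    # one contingency table per threshold, each mapped to a curve point
    tables, _ = contingency_tables(ys, decision_values, positive)
    return [(xfun(t), yfun(t)) for t in tables]

# e.g. an ROC curve uses the false positive rate on x and the true
# positive rate on y, both computed from a (TP, FP, TN, FN) table:
fpr = lambda t: t[1] / (t[1] + t[2])  # FP / (FP + TN)
tpr = lambda t: t[0] / (t[0] + t[3])  # TP / (TP + FN)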
contingency_table(ys, yhats, positive=True)
Computes a contingency table for given predictions.
Parameters:
    ys – true labels
    yhats – predicted labels
    positive – the positive label
Returns: TP, FP, TN, FN
>>> ys = [True, True, True, True, True, False]
>>> yhats = [True, True, False, False, False, True]
>>> tab = contingency_table(ys, yhats, 1)
>>> print(tab)
(2, 1, 0, 3)
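The counting behind this table can be sketched as follows (hypothetical helper, equivalent in spirit to the function documented above):

def contingency_table_sketch(ys, yhats, positive=True):
    # tally the four prediction outcomes in a single pass
    TP = sum(1 for y, yh in zip(ys, yhats) if y == positive and yh == positive)
    FP = sum(1 for y, yh in zip(ys, yhats) if y != positive and yh == positive)
    TN = sum(1 for y, yh in zip(ys, yhats) if y != positive and yh != positive)
    FN = sum(1 for y, yh in zip(ys, yhats) if y == positive and yh != positive)
    return TP, FP, TN, FN

>>> contingency_table_sketch([True, True, True, True, True, False],
...                          [True, True, False, False, False, True], 1)
(2, 1, 0, 3)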
contingency_tables(ys, decision_values, positive=True)
Computes contingency tables for every unique decision value.
Parameters:
    ys – true labels
    decision_values – decision values
    positive – the positive label
Returns: a list of contingency tables (TP, FP, TN, FN) and the corresponding thresholds.
Contingency tables are built using the decision rule \(decision\_value \geq threshold\).
The first contingency table corresponds to a (potentially unseen) threshold that yields all negative predictions.
>>> y = [0, 0, 0, 0, 1, 1, 1, 1]
>>> d = [2, 2, 1, 1, 1, 2, 3, 3]
>>> tables, thresholds = contingency_tables(y, d, 1)
>>> print(tables)
[(0, 0, 4, 4), (2, 0, 4, 2), (3, 2, 2, 1), (4, 4, 0, 0)]
>>> print(thresholds)
[None, 3, 2, 1]
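The thresholds can be sketched as the unique decision values in descending order, with a leading None for the all-negative table (an illustrative reconstruction reusing contingency_table_sketch from above, not the library's code):

def contingency_tables_sketch(ys, decision_values, positive=True):
    # highest threshold first; None marks the all-negative table
    thresholds = [None] + sorted(set(decision_values), reverse=True)
    tables = []
    for t in thresholds:
        # None: predict everything negative; else: predict d >= t as positive
        yhats = [False] * len(ys) if t is None else [d >= t for d in decision_values]
        tables.append(contingency_table_sketch(ys, yhats, positive))
    return tables, thresholds

>>> contingency_tables_sketch([0, 0, 0, 0, 1, 1, 1, 1], [2, 2, 1, 1, 1, 2, 3, 3], 1)
([(0, 0, 4, 4), (2, 0, 4, 2), (3, 2, 2, 1), (4, 4, 0, 0)], [None, 3, 2, 1])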
error_rate(y, yhat)
Returns the error rate (lower is better).
Parameters:
    y – true labels
    yhat – predicted labels
>>> error_rate([0,0,1,1], [0,0,0,1])
0.25
fbeta(y, yhat, beta, positive=True)
Returns the \(F_\beta\)-score. Higher is better.
Parameters:
    y – true labels
    yhat – predicted labels
    beta – the value of \(\beta\) to use (\(\beta > 0\))
    positive – the positive label
Returns:
\[(1 + \beta^2)\frac{precision \cdot recall}{\beta^2 \cdot precision + recall}\]
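A sketch of the formula in code (hypothetical helper; precision and recall as defined later in this module):

def fbeta_sketch(precision, recall, beta):
    # weighted harmonic mean: beta > 1 favors recall, beta < 1 favors precision
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

>>> round(fbeta_sketch(0.5, 1.0, 1.0), 2)  # F1 score
0.67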
logloss(y, yhat)
Returns the log loss between labels and predictions.
Parameters:
    y – true labels
    yhat – predicted probabilities
Returns:
\[-\frac{1}{n}\sum_{i=1}^n\big[y_i \log \hat{y}_i+(1-y_i) \log (1-\hat{y}_i)\big]\]
y must be a binary vector, i.e. elements in {True, False}.
yhat must be a vector of probabilities, i.e. elements in [0, 1].
Lower is better.
Note
This loss function should only be used for probabilistic models.
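A sketch of the computation (the clipping guard against log(0) is an assumption of this sketch, a common safeguard, not necessarily the library's behavior):

import math

def logloss_sketch(y, yhat, eps=1e-15):
    # clip probabilities away from 0 and 1 so log() stays finite
    total = 0.0
    for t, p in zip(y, yhat):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y)

>>> round(logloss_sketch([True, False], [0.9, 0.1]), 3)
0.105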
mse(y, yhat)
Returns the mean squared error between y and yhat.
Parameters:
    y – true function values
    yhat – predicted function values
Returns:
\[\frac{1}{n} \sum_{i=1}^n \big[(\hat{y}_i-y_i)^2\big]\]
Lower is better.
>>> mse([0, 0], [2, 3])
6.5
npv(y, yhat, positive=True)
Returns the negative predictive value (higher is better).
Parameters:
    y – true labels
    yhat – predicted labels
    positive – the positive label
Returns: number of true negative predictions / number of negative predictions
pr_auc(ys, decision_values, positive=True)
Computes the area under the precision-recall curve (higher is better).
Parameters:
    ys – true labels
    decision_values – decision values
    positive – the positive label
>>> pr_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> round(pr_auc([0,0,1,1], [0,1,1,2], 1), 2)
0.92
Note
Precision is undefined at recall = 0. In this case, we set precision equal to the precision that was obtained at the lowest non-zero recall.
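The note above can be made concrete with a sketch that builds the PR points from contingency tables and patches the undefined point (reusing the contingency_tables_sketch and auc_sketch helpers introduced earlier; illustrative only):

def pr_auc_sketch(ys, decision_values, positive=True):
    tables, _ = contingency_tables_sketch(ys, decision_values, positive)
    points = []
    for TP, FP, TN, FN in tables:
        rec = TP / (TP + FN)
        # precision is undefined when nothing is predicted positive
        prec = TP / (TP + FP) if TP + FP > 0 else None
        points.append((rec, prec))
    # substitute the precision at the lowest non-zero recall, per the note
    first_defined = next(p for _, p in points if p is not None)
    points = [(r, first_defined if p is None else p) for r, p in points]
    return auc_sketch(points)

>>> round(pr_auc_sketch([0, 0, 1, 1], [0, 1, 1, 2], 1), 2)
0.92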
precision(y, yhat, positive=True)
Returns the precision (higher is better).
Parameters:
    y – true labels
    yhat – predicted labels
    positive – the positive label
Returns: number of true positive predictions / number of positive predictions
pu_score(y, yhat)
Returns a score used for PU learning as introduced in [LEE2003].
Parameters:
    y – true labels
    yhat – predicted labels
Returns:
\[\frac{P(\hat{y}=1 \mid y=1)^2}{P(\hat{y}=1)}\]
y and yhat must be boolean vectors.
Higher is better.
[LEE2003] Wee Sun Lee and Bing Liu. Learning with positive and unlabeled examples using weighted logistic regression. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
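A sketch of the estimator (hypothetical helper): the numerator estimates recall on the known positives, and the denominator is the overall fraction predicted positive.

def pu_score_sketch(y, yhat):
    # P(yhat = 1 | y = 1), estimated on the labeled positives
    rec = sum(1.0 for t, p in zip(y, yhat) if t and p) / sum(1.0 for t in y if t)
    # P(yhat = 1), the overall fraction predicted positive
    prob_pos = sum(1.0 for p in yhat if p) / len(yhat)
    return rec ** 2 / prob_pos

>>> round(pu_score_sketch([True, True, False, False], [True, True, True, False]), 2)
1.33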
r_squared(y, yhat)
Returns the R squared statistic, also known as the coefficient of determination (higher is better).
Parameters:
    y – true function values
    yhat – predicted function values
Returns:
\[R^2 = 1-\frac{SS_{res}}{SS_{tot}} = 1-\frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}\]
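In code, the statistic can be sketched as follows (hypothetical helper): one minus the residual sum of squares over the total sum of squares.

def r_squared_sketch(y, yhat):
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, yhat))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

>>> r_squared_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
1.0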
recall(y, yhat, positive=True)
Returns the recall (higher is better).
Parameters:
    y – true labels
    yhat – predicted labels
    positive – the positive label
Returns: number of true positive predictions / number of actual positives
roc_auc(ys, decision_values, positive=True)
Computes the area under the receiver operating characteristic (ROC) curve (higher is better).
Parameters:
    ys – true labels
    decision_values – decision values
    positive – the positive label
>>> roc_auc([0, 0, 1, 1], [0, 0, 1, 1], 1)
1.0
>>> roc_auc([0,0,1,1], [0,1,1,2], 1)
0.875
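Combining the earlier sketches shows the whole pipeline (contingency_tables_sketch and auc_sketch as introduced above; illustrative only): the curve plots the false positive rate against the true positive rate, and the trapezoidal rule integrates it.

def roc_auc_sketch(ys, decision_values, positive=True):
    tables, _ = contingency_tables_sketch(ys, decision_values, positive)
    # x: FP / (FP + TN), y: TP / (TP + FN)
    curve = [(FP / (FP + TN), TP / (TP + FN)) for TP, FP, TN, FN in tables]
    return auc_sketch(sorted(curve))

>>> roc_auc_sketch([0, 0, 1, 1], [0, 1, 1, 2], 1)
0.875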