Reliability assessment estimates the reliability of individual predictions. Most of the implemented algorithms for regression are described in [Bosnic2008]; the algorithms for classification are described in [Pevec2011].
We can use reliability estimation with any prediction method. The following example:
- constructs reliability estimators (implemented in this module),
- uses the Learner wrapper to combine a prediction method (learner), here a kNNLearner, with the reliability estimators,
- obtains prediction probabilities, which carry an additional attribute, reliability_estimate, containing a list of Orange.evaluation.reliability.Estimate.
import Orange

housing = Orange.data.Table("housing.tab")

knn = Orange.classification.knn.kNNLearner()
estimators = [Orange.evaluation.reliability.Mahalanobis(k=3),
              Orange.evaluation.reliability.LocalCrossValidation(k=10)]
reliability = Orange.evaluation.reliability.Learner(knn, estimators=estimators)
restimator = reliability(housing)
instance = housing[0]

value, probability = restimator(instance, result_type=Orange.classification.Classifier.GetBoth)

for estimate in probability.reliability_estimate:
    print estimate.method_name, estimate.estimate
The next example prints reliability estimates for the first 10 instances, obtained with cross-validation:
import Orange

housing = Orange.data.Table("housing.tab")

knn = Orange.classification.knn.kNNLearner()
reliability = Orange.evaluation.reliability.Learner(knn)
results = Orange.evaluation.testing.cross_validation([reliability], housing)

for i, instance in enumerate(results.results[:10]):
    print "Instance", i
    for estimate in instance.probabilities[0].reliability_estimate:
        print "  ", estimate.method_name, estimate.estimate
Adds reliability estimation to any prediction method. This class can be used like any other Orange learner, but returns the classifier wrapped into an instance of Orange.evaluation.reliability.Classifier.
A reliability estimation wrapper for classifiers. The returned probabilities contain an additional attribute reliability_estimate, which is a list of Estimate (see __call__).
Classify and estimate reliability for a new instance. When result_type is set to Orange.classification.Classifier.GetBoth or Orange.classification.Classifier.GetProbabilities, an additional attribute reliability_estimate (a list of Estimate) is added to the distribution object.
Parameters: instance (Orange.data.Instance) – the instance to classify; result_type – one of Orange.classification.Classifier.GetValue, GetProbabilities or GetBoth.
Return type: Orange.data.Value, Orange.statistics.Distribution or a tuple with both
All measures except the reference estimate $O_{ref}$ work with regression. Classification is supported by BAGV, LCV, CNK, DENS and $O_{ref}$.
Parameters: e (list of floats) – values of the sensitivity parameter $\varepsilon$.
Return type: Orange.evaluation.reliability.SensitivityAnalysisClassifier
The learning set is extended with that instance, with its label changed to $K + \varepsilon (l_{max} - l_{min})$, where $K$ is the initial prediction, $\varepsilon$ a sensitivity parameter, and $l_{min}$ and $l_{max}$ the lower and upper bounds of labels on the training data. Results for multiple values of $\varepsilon$ are combined into SAvar and SAbias. SAbias can be used in a signed or absolute form.
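To make the procedure concrete, here is a minimal numpy sketch of SAvar and SAbias. It assumes a scikit-learn-style regressor with fit()/predict(); the function name and the default list of epsilons are illustrative, not Orange's API.

# Minimal sketch of sensitivity analysis (SA); `learner` is assumed to be a
# scikit-learn-style regressor with fit()/predict(). Not Orange's internals.
import numpy as np

def sa_estimates(learner, X, y, x_new, epsilons=(0.01, 0.1, 0.5, 1.0, 2.0)):
    l_min, l_max = y.min(), y.max()              # label bounds on training data
    K = learner.fit(X, y).predict([x_new])[0]    # initial prediction
    pos, neg = [], []
    for eps in epsilons:
        for sign, store in ((+1.0, pos), (-1.0, neg)):
            # extend the learning set with x_new, relabelled K + eps*(l_max - l_min)
            X_ext = np.vstack([X, x_new])
            y_ext = np.append(y, K + sign * eps * (l_max - l_min))
            store.append(learner.fit(X_ext, y_ext).predict([x_new])[0])
    pos, neg = np.array(pos), np.array(neg)
    sa_var = np.mean(pos - neg)                     # SAvar
    sa_bias = np.mean((pos - K) + (neg - K)) / 2.0  # SAbias (signed form)
    return sa_var, sa_bias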
Parameters: m (int) – number of bagged models.
Return type: Orange.evaluation.reliability.BaggingVarianceClassifier
For regression, BAGV is the variance of predictions: $BAGV = \frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2$, where $K = \frac{1}{m} \sum_{i=1}^{m} K_i$ and $K_i$ are the predictions of individual models.
For classification, BAGV is 1 minus the average Euclidean distance between the class probability distribution predicted by the model and the distributions predicted by the individual bagged models; a greater value implies a better prediction.
This reliability measure can run out of memory if the individual classifiers themselves use a lot of memory; it needs $m$ times the memory required for a single classifier.
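As a rough sketch of the regression formula above (again assuming a scikit-learn-style learner; the helper name is illustrative):

# Minimal sketch of BAGV for regression: variance of m bagged predictions.
import numpy as np

def bagv(learner, X, y, x_new, m=50, rng=np.random):
    preds = []
    for _ in range(m):
        idx = rng.randint(0, len(X), len(X))     # bootstrap sample with replacement
        model = learner.fit(X[idx], y[idx])
        preds.append(model.predict([x_new])[0])
    preds = np.array(preds)
    return np.mean((preds - preds.mean()) ** 2)  # variance of individual predictions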
Parameters: k (int) – number of nearest neighbours used in the LCV estimate.
Return type: Orange.evaluation.reliability.LocalCrossValidationClassifier
Leave-one-out validation is performed on the $k$ nearest neighbours of the given instance. For regression, the reliability estimate is the distance-weighted absolute prediction error. For classification, it is 1 minus the average distance between the predicted class probability distribution and the (trivial) probability distributions of the nearest neighbours.
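A minimal numpy sketch of the regression variant, under the same scikit-learn-style learner assumption (helper name illustrative):

# Minimal sketch of LCV for regression: leave-one-out over the k nearest
# neighbours, errors weighted by inverse distance to the query instance.
import numpy as np

def lcv(learner, X, y, x_new, k=10):
    dist = np.sqrt(((X - x_new) ** 2).sum(axis=1))   # distances to x_new
    nn = np.argsort(dist)[:k]                        # the k nearest neighbours
    errors, weights = [], []
    for i in nn:
        rest = nn[nn != i]                           # neighbourhood without i
        model = learner.fit(X[rest], y[rest])        # train on the remaining k-1
        errors.append(abs(model.predict([X[i]])[0] - y[i]))
        weights.append(1.0 / (dist[i] + 1e-12))      # closer neighbours count more
    return np.average(errors, weights=weights)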
Parameters: k (int) – number of nearest neighbours used in the CNK estimate.
Return type: Orange.evaluation.reliability.CNeighboursClassifier
For regression, CNK is the difference between the average label of the $k$ nearest neighbours and the prediction. CNK can be used in a signed or absolute form; a greater value implies a greater prediction error.
For classification, CNK is equal to 1 minus the average distance between the predicted class distribution and the (trivial) class distributions of the $k$ nearest neighbours from the learning set. A greater value implies a better prediction.
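The regression variant is a short computation; the sketch below (hypothetical helper, numpy only) shows the signed form:

# Minimal sketch of CNK for regression; `prediction` is the model's output
# for x_new. Signed form: positive when neighbours' labels exceed the prediction.
import numpy as np

def cnk(X, y, x_new, prediction, k=5, signed=True):
    dist = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    nn = np.argsort(dist)[:k]            # k nearest neighbours in the learning set
    value = y[nn].mean() - prediction    # average neighbour label minus prediction
    return value if signed else abs(value)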
Return type: Orange.evaluation.reliability.BaggingVarianceCNeighboursClassifier
BVCK is the average of the bagging variance (BAGV) and the local modeling of prediction error (CNK).
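Since BVCK merely averages the two scores, a sketch (using values produced by the hypothetical bagv() and cnk() helpers above) is a single expression:

# BVCK combines the two estimates sketched above (both helpers are hypothetical).
def bvck(bagv_value, cnk_abs_value):
    return (bagv_value + cnk_abs_value) / 2.0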
Parameters: k (int) – number of nearest neighbours used in the Mahalanobis estimate.
Return type: Orange.evaluation.reliability.MahalanobisClassifier
The Mahalanobis distance reliability estimate is defined as the Mahalanobis distance to the $k$ nearest neighbours of the evaluated instance.
Return type: Orange.evaluation.reliability.MahalanobisToCenterClassifier
Mahalanobis distance to center reliability estimate is defined as a Mahalanobis distance between the predicted instance and the centroid of the data.
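Both estimates reduce to distance computations. A minimal numpy sketch (hypothetical helper; whether the neighbour distances are summed or averaged is an implementation detail not specified here):

# Minimal sketch of both Mahalanobis-based estimates.
import numpy as np

def mahalanobis_estimates(X, x_new, k=3):
    VI = np.linalg.pinv(np.cov(X, rowvar=False))   # (pseudo-)inverse covariance
    def mdist(a, b):
        d = a - b
        return np.sqrt(d.dot(VI).dot(d))
    dists = np.array([mdist(row, x_new) for row in X])
    to_neighbours = np.sort(dists)[:k].sum()       # distance to k nearest neighbours
    to_center = mdist(x_new, X.mean(axis=0))       # distance to the data centroid
    return to_neighbours, to_center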
Return type: Orange.evaluation.reliability.ParzenWindowDensityBasedClassifier
Returns a value that estimates the density of the problem space around the instance being predicted.
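As an illustration, a Parzen-window density around x_new can be sketched with a Gaussian kernel (the kernel choice and the bandwidth h are assumptions, not Orange's defaults):

# Minimal sketch of a Parzen-window density estimate around x_new.
import numpy as np

def parzen_density(X, x_new, h=1.0):
    sq_dist = ((X - x_new) ** 2).sum(axis=1)           # squared distances to x_new
    return np.mean(np.exp(-sq_dist / (2.0 * h ** 2)))  # average kernel response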
Selects the best reliability estimator for the given data using internal cross-validation [Bosnic2010].
This method builds a model that integrates reliability estimates from all available estimation techniques (see [Wolpert1992] and [Dzeroski2004]). It performs internal cross-validation and therefore takes roughly the same time as ICV.
Return type: Orange.evaluation.reliability.ReferenceExpectedErrorClassifier
Reference estimate for classification: $O_{ref} = 2 \hat{y} (1 - \hat{y})$, where $\hat{y}$ is the estimated probability of the predicted class [Pevec2011]. A greater estimate means a greater expected error.
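This is a one-line computation from the predicted class probability (sketch; the function name is illustrative):

# Sketch of the reference expected error for classification.
def reference_expected_error(p_hat):
    # p_hat: estimated probability of the predicted class; the estimate is
    # largest at p_hat = 0.5 and zero when the classifier is certain
    return 2.0 * p_hat * (1.0 - p_hat)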
These constants distinguish signed and absolute reliability estimation measures.
A dictionary that maps reliability estimation method IDs (integers) to method names (strings).
Describes a reliability estimate.
The value of the reliability estimate.
Determines whether the method returned a signed or absolute result. Has a value of either SIGNED or ABSOLUTE.
An integer ID of the reliability estimation method used.
Name (string) of the reliability estimation method used.
Parameters: res (Orange.evaluation.testing.ExperimentResults) – evaluation results with reliability_estimate.
Returns Pearson’s correlation coefficients between the prediction error and the reliability estimates, together with p-values.
Parameters: res (Orange.evaluation.testing.ExperimentResults) – evaluation results with reliability_estimate.
Returns Pearson’s correlation coefficients between the prediction error and the reliability estimates, averaged over all cross-validation folds.
Parameters: res (Orange.evaluation.testing.ExperimentResults) – evaluation results with reliability_estimate.
Returns Spearman’s correlation coefficients between the prediction error and the reliability estimates, together with p-values.
The following script prints Pearson’s correlation coefficient (r) between reliability estimates and actual prediction errors, and a corresponding p-value, for default reliability estimation measures.
import Orange

prostate = Orange.data.Table("prostate.tab")

knn = Orange.classification.knn.kNNLearner()
reliability = Orange.evaluation.reliability.Learner(knn)
res = Orange.evaluation.testing.cross_validation([reliability], prostate)
reliability_res = Orange.evaluation.reliability.get_pearson_r(res)

print
print "Estimate               r       p"
for estimate in reliability_res:
    print "%-21s%7.3f %7.3f" % (Orange.evaluation.reliability.METHOD_NAME[estimate[3]],
                                estimate[0], estimate[1])
Results:

Estimate               r       p
SAvar absolute        -0.077   0.454
SAbias signed         -0.165   0.105
SAbias absolute        0.095   0.352
LCV absolute           0.069   0.504
BVCK absolute          0.060   0.562
BAGV absolute          0.078   0.448
CNK signed             0.233   0.021
CNK absolute           0.058   0.574
Mahalanobis absolute   0.091   0.375
Mahalanobis to center  0.096   0.349
[Bosnic2007] Bosnić, Z., Kononenko, I. (2007) Estimation of individual prediction reliability using local sensitivity analysis. Applied Intelligence 29(3), pp. 187-203.
[Bosnic2008] Bosnić, Z., Kononenko, I. (2008) Comparison of approaches for estimating reliability of individual regression predictions. Data & Knowledge Engineering 67(3), pp. 504-516.
[Bosnic2010] Bosnić, Z., Kononenko, I. (2010) Automatic selection of reliability estimates for individual regression predictions. The Knowledge Engineering Review 25(1), pp. 27-47.
[Pevec2011] Pevec, D., Štrumbelj, E., Kononenko, I. (2011) Evaluating Reliability of Single Classifications of Neural Networks. Adaptive and Natural Computing Algorithms, pp. 22-30.
[Wolpert1992] Wolpert, D. H. (1992) Stacked generalization. Neural Networks 5, pp. 241-259.
[Dzeroski2004] Dzeroski, S., Zenko, B. (2004) Is combining classifiers with stacking better than selecting the best one? Machine Learning 54, pp. 255-273.