Evaluations

MSAF includes the standard evaluation metrics used in MIREX. Here we describe how to evaluate the algorithms’ results and discuss each of these metrics, grouped by the subtask they assess.

These metrics are computed using mir_eval, an excellent external framework.

How To Evaluate Results

The module eval.py evaluates the estimated results of the Segmentation dataset against the ground truth (human-annotated data). It provides the following process function, which can be called once the desired algorithms have been run on a single file or a dataset:

process(in_path[, boundaries_id, labels_id, ...])    Main process to evaluate algorithms’ results.

The return value of this function is a dictionary (or a list of dictionaries, in collection mode) containing all of the available metrics for the evaluated subtask(s). The keys of this dictionary, along with a description of each metric, are listed below.
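Because the return value can be either a single dictionary or a list of dictionaries, code that consumes it may want to treat both cases the same way. The following is a minimal sketch of one way to do that. The helper names as_result_list and mean_metric are hypothetical (they are not part of MSAF), and the metric values are placeholders made up for illustration, standing in for what process would return.

```python
def as_result_list(results):
    """Wrap a single-file result in a list so both modes look the same."""
    return [results] if isinstance(results, dict) else list(results)


def mean_metric(results, key):
    """Average one metric over all evaluated files; None if it is absent."""
    values = [r[key] for r in as_result_list(results) if key in r]
    return sum(values) / len(values) if values else None


# Placeholder values (made up), shaped like the documented return value
single_file = {"HitRate_3F": 0.5, "PWF": 0.25}
collection = [
    {"HitRate_3F": 0.5, "PWF": 0.25},
    {"HitRate_3F": 0.75, "PWF": 0.5},
]

single_mean = mean_metric(single_file, "HitRate_3F")
collection_mean = mean_metric(collection, "HitRate_3F")
missing = mean_metric(collection, "Sf")  # key not present in this example
```

Averaging is only one possible aggregation; the same normalization step works for any per-file summary.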

Boundary Metrics

Boundary Metric    Description
D                  Information Gain
DevE2R             Median Deviation from Estimation to Reference
DevR2E             Median Deviation from Reference to Estimation
DevtE2R            Median Deviation from Estimation to Reference without first and last boundaries (trimmed)
DevtR2E            Median Deviation from Reference to Estimation without first and last boundaries (trimmed)
HitRate_0.5F       Hit Rate F-measure using a 0.5-second window
HitRate_0.5P       Hit Rate Precision using a 0.5-second window
HitRate_0.5R       Hit Rate Recall using a 0.5-second window
HitRate_3F         Hit Rate F-measure using a 3-second window
HitRate_3P         Hit Rate Precision using a 3-second window
HitRate_3R         Hit Rate Recall using a 3-second window
HitRate_t0.5F      Hit Rate F-measure using a 0.5-second window without first and last boundaries (trimmed)
HitRate_t0.5P      Hit Rate Precision using a 0.5-second window without first and last boundaries (trimmed)
HitRate_t0.5R      Hit Rate Recall using a 0.5-second window without first and last boundaries (trimmed)
HitRate_t3F        Hit Rate F-measure using a 3-second window without first and last boundaries (trimmed)
HitRate_t3P        Hit Rate Precision using a 3-second window without first and last boundaries (trimmed)
HitRate_t3R        Hit Rate Recall using a 3-second window without first and last boundaries (trimmed)
t_measure10        T-Measures F-measure using a 10-second window
t_precision10      T-Measures Precision using a 10-second window
t_recall10         T-Measures Recall using a 10-second window
t_measure15        T-Measures F-measure using a 15-second window
t_precision15      T-Measures Precision using a 15-second window
t_recall15         T-Measures Recall using a 15-second window
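To make the hit rate and median deviation entries more concrete, here is a simplified pure-Python sketch. It is not the mir_eval implementation that MSAF relies on: the function names hit_rate and median_deviation are hypothetical, the one-to-one matching is a plain greedy pass over sorted boundaries, and the boundary times are made up for illustration.

```python
from statistics import median


def hit_rate(reference, estimated, window=0.5, trim=False):
    """Return (precision, recall, f_measure) for boundary times in seconds."""
    ref, est = sorted(reference), sorted(estimated)
    if trim:
        # Trimmed variants ignore the first and last boundaries
        ref, est = ref[1:-1], est[1:-1]
    i = j = hits = 0
    # Greedy one-to-one matching (an assumption of this sketch):
    # each boundary can be used for at most one hit.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= window:
            hits += 1
            i += 1
            j += 1
        elif ref[i] < est[j]:
            i += 1
        else:
            j += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


def median_deviation(source, target):
    """Median distance from each boundary in source to its nearest in target."""
    return median(min(abs(s - t) for t in target) for s in source)


# Made-up boundary times, in seconds
reference = [0.0, 10.0, 20.0, 30.0]
estimated = [0.2, 12.0, 20.4, 29.0, 35.0]

narrow = hit_rate(reference, estimated, window=0.5)   # like HitRate_0.5P/R/F
wide = hit_rate(reference, estimated, window=3)       # like HitRate_3P/R/F
trimmed = hit_rate(reference, estimated, window=0.5, trim=True)
dev_e2r = median_deviation(estimated, reference)      # like DevE2R
dev_r2e = median_deviation(reference, estimated)      # like DevR2E
```

A wider window counts more estimated boundaries as hits, which is why the 3-second scores are never lower than the 0.5-second ones for the same pair of boundary lists.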

Label Metrics

Label Metric    Description
PWF             Pairwise Frame Clustering F-measure
PWP             Pairwise Frame Clustering Precision
PWR             Pairwise Frame Clustering Recall
Sf              Normalized Entropy Scores F-measure
So              Normalized Entropy Scores Precision (over-segmentation score)
Su              Normalized Entropy Scores Recall (under-segmentation score)
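The pairwise frame clustering scores can be illustrated with a small pure-Python sketch. It assumes both annotations have already been sampled into per-frame label lists of equal length, and it is a simplified stand-in rather than the mir_eval code that MSAF uses. The function names and the example labels are hypothetical.

```python
from itertools import combinations


def same_label_pairs(frame_labels):
    """Set of frame index pairs (i, j), with i < j, that share a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(frame_labels)), 2)
        if frame_labels[i] == frame_labels[j]
    }


def pairwise_scores(reference_labels, estimated_labels):
    """Return (precision, recall, f_measure) over same-label frame pairs."""
    if len(reference_labels) != len(estimated_labels):
        raise ValueError("Both label sequences must cover the same frames")
    ref_pairs = same_label_pairs(reference_labels)
    est_pairs = same_label_pairs(estimated_labels)
    # Pairs grouped together in both the reference and the estimation
    matches = len(ref_pairs & est_pairs)
    precision = matches / len(est_pairs) if est_pairs else 0.0
    recall = matches / len(ref_pairs) if ref_pairs else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


# Made-up per-frame labels; label names need not match across annotations
reference_labels = ["A", "A", "B", "B"]
estimated_labels = ["x", "x", "x", "y"]

pwp, pwr, pwf = pairwise_scores(reference_labels, estimated_labels)
```

Only the grouping of frames matters here, so relabeling the estimated segments (for example, swapping "x" and "y") leaves the three scores unchanged.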