Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation

2020 arXiv (Cornell University) 1,514 citations

Abstract

Commonly used evaluation measures, including Recall, Precision, F-Measure and Rand Accuracy, are biased and should not be used without a clear understanding of the biases and a corresponding identification of the chance or base-case levels of the statistic. Under any of these measures, a system that performs worse in the objective sense of Informedness can appear to perform better. We discuss several concepts and measures that reflect the probability that a prediction is informed versus chance, namely Informedness, and introduce Markedness as a dual measure for the probability that a prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance, as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
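
The abstract's central claim can be illustrated with a small sketch. It assumes the usual dichotomous definitions (Informedness = Recall + Inverse Recall - 1, Markedness = Precision + Inverse Precision - 1), which are not quoted in the abstract itself. The two confusion matrices are hypothetical counts chosen for illustration, and the names `Confusion`, `ratio` and `scores` are illustrative only.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Confusion:
    """Counts for a two-class problem (hypothetical container)."""
    tp: int  # real positives predicted positive
    fn: int  # real positives predicted negative
    fp: int  # real negatives predicted positive
    tn: int  # real negatives predicted negative


def ratio(num: float, den: float) -> float:
    # Convention assumed here: an undefined ratio (zero denominator) is 0.0.
    return num / den if den else 0.0


def scores(c: Confusion) -> dict:
    recall = ratio(c.tp, c.tp + c.fn)             # true positive rate
    inverse_recall = ratio(c.tn, c.tn + c.fp)     # true negative rate
    precision = ratio(c.tp, c.tp + c.fp)          # positive predictive value
    inverse_precision = ratio(c.tn, c.tn + c.fn)  # negative predictive value
    total = c.tp + c.fn + c.fp + c.tn
    mcc_den = sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "recall": recall,
        "precision": precision,
        "f1": ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "accuracy": ratio(c.tp + c.tn, total),
        "informedness": recall + inverse_recall - 1,
        "markedness": precision + inverse_precision - 1,
        "mcc": ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


# Hypothetical data: 90 real positives, 10 real negatives.
# System A labels almost everything positive; system B actually discriminates.
system_a = Confusion(tp=89, fn=1, fp=10, tn=0)
system_b = Confusion(tp=70, fn=20, fp=1, tn=9)

a, b = scores(system_a), scores(system_b)
for name in ("recall", "f1", "accuracy", "informedness", "markedness", "mcc"):
    print(f"{name:>12}  A={a[name]:+.3f}  B={b[name]:+.3f}")
```

On these counts, system A scores higher on Recall, F-Measure and Accuracy simply by exploiting the 90% positive base rate, while its Informedness is close to zero. The last row also reflects the connection to correlation: in the two-class case the squared Matthews correlation equals the product of Informedness and Markedness.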

Keywords

Markedness, Measure (data warehouse), Recall, Statistic, Correlation, Computer science, Natural language processing, Artificial intelligence, Statistics, Psychology, Mathematics, Cognitive psychology, Linguistics, Data mining

Publication Info

Year: 2020
Type: preprint
Citations: 1514
Access: Closed

Cite This

David Powers (2020). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2010.16061

Identifiers

DOI: 10.48550/arxiv.2010.16061
