Abstract

ROC analysis is increasingly being recognised as an important tool for evaluation and comparison of classifiers when the operating characteristics (i.e. class distribution and cost parameters) are not known at training time. Usually, each classifier is characterised by its estimated true and false positive rates and is represented by a single point in the ROC diagram. In this paper, we show how a single decision tree can represent a set of classifiers by choosing different labellings of its leaves, or equivalently, an ordering on the leaves. In this setting, rather than estimating the accuracy of a single tree, it makes more sense to use the area under the ROC curve (AUC) as a quality metric. We also propose a novel splitting criterion which chooses the split with the highest local AUC. To the best of our knowledge, this is the first probabilistic splitting criterion that is not based on weighted average impurity. We present experiments suggesting that the AUC splitting criterion leads to trees with equal or better AUC value, without sacrificing accuracy if a single labelling is chosen.
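The core idea above, that a tree's leaves can be ordered and scored by AUC rather than given a single labelling, can be illustrated with a short sketch. This is not the paper's implementation; it assumes each leaf is summarised by its (positive, negative) training counts, sorts leaves by decreasing positive proportion (which yields the optimal sequence of labellings), and accumulates the area under the resulting piecewise-linear ROC curve with the trapezoidal rule:

```python
def tree_auc(leaves):
    """AUC of a decision tree whose leaves are (pos, neg) count pairs.

    Sorting leaves from most- to least-positive traces the convex ROC
    curve obtained by flipping leaf labellings one at a time; the area
    under that curve is the tree's AUC.
    """
    total_pos = sum(p for p, _ in leaves)
    total_neg = sum(n for _, n in leaves)
    # Order leaves by decreasing proportion of positives.
    ordered = sorted(leaves, key=lambda pn: pn[0] / (pn[0] + pn[1]),
                     reverse=True)
    auc = 0.0
    tp = fp = 0
    tpr_prev = fpr_prev = 0.0
    for p, n in ordered:
        tp += p
        fp += n
        tpr, fpr = tp / total_pos, fp / total_neg
        # Trapezoid between consecutive ROC points.
        auc += (fpr - fpr_prev) * (tpr + tpr_prev) / 2
        tpr_prev, fpr_prev = tpr, fpr
    return auc

# Three hypothetical leaves with (pos, neg) training counts.
print(tree_auc([(8, 2), (3, 3), (1, 6)]))
```

In the same spirit, a local-AUC splitting criterion can be sketched by treating the children of each candidate split as leaves and preferring the split whose child counts maximise this quantity; the details of the paper's criterion are in the full text.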

Keywords

Receiver operating characteristic, Decision tree, Classifier (UML), Artificial intelligence, Mathematics, Probabilistic logic, Metric (unit), Pattern recognition (psychology), Cut-point, Decision tree learning, Computer science, Performance metric, Statistics, Machine learning, Data mining, Algorithm


Publication Info

Year: 2002
Type: article
Pages: 139–146
Citations: 266
Access: Closed

Cite This

Cèsar Ferri, Peter Flach, José Hernández‐Orallo (2002). Learning Decision Trees Using the Area Under the ROC Curve. 139–146.