Abstract
ROC analysis is increasingly being recognised as an important tool for evaluation and comparison of classifiers when the operating characteristics (i.e., class distribution and cost parameters) are not known at training time. Usually, each classifier is characterised by its estimated true and false positive rates and is represented by a single point in the ROC diagram. In this paper, we show how a single decision tree can represent a set of classifiers by choosing different labellings of its leaves, or equivalently, an ordering on the leaves. In this setting, rather than estimating the accuracy of a single tree, it makes more sense to use the area under the ROC curve (AUC) as a quality metric. We also propose a novel splitting criterion which chooses the split with the highest local AUC. To the best of our knowledge, this is the first probabilistic splitting criterion that is not based on weighted average impurity. We present experiments suggesting that the AUC splitting criterion leads to trees with equal or better AUC values, without sacrificing accuracy if a single labelling is chosen.
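The idea of ordering the leaves and scoring a split by its local AUC can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`roc_points`, `auc`, `best_split`), the representation of a leaf as a `(positives, negatives)` count pair, and the choice to order leaves by their local proportion of positives are all assumptions made for illustration. The sketch also assumes a two-class problem in which both classes are present among the examples being split.

```python
from typing import List, Sequence, Tuple

Leaf = Tuple[int, int]  # (positive count, negative count); illustrative representation


def roc_points(leaves: Sequence[Leaf]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR) obtained by ordering the leaves.

    Assumption: leaves are ranked by their local proportion of positives,
    and each prefix of that ranking is labelled positive in turn.
    """
    leaves = [(p, n) for p, n in leaves if p + n > 0]  # ignore empty leaves
    total_pos = sum(p for p, _ in leaves)
    total_neg = sum(n for _, n in leaves)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present")

    ordered = sorted(leaves, key=lambda c: c[0] / (c[0] + c[1]), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for p, n in ordered:
        tp += p  # labelling this leaf positive adds its positives as true positives
        fp += n  # and its negatives as false positives
        points.append((fp / total_neg, tp / total_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def best_split(candidates: Sequence[Sequence[Leaf]]) -> Sequence[Leaf]:
    """Pick the candidate split whose children give the highest local AUC."""
    return max(candidates, key=lambda leaves: auc(roc_points(leaves)))


if __name__ == "__main__":
    split_a = [(40, 10), (10, 40)]  # hypothetical class counts per child
    split_b = [(30, 20), (20, 30)]
    print(auc(roc_points(split_a)), auc(roc_points(split_b)))
    print(best_split([split_a, split_b]))
```

In this sketch, each prefix of the leaf ordering corresponds to one labelling of the tree, so a tree with k non-empty leaves contributes k + 1 points to the ROC diagram rather than one.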
Publication Info
- Year: 2002
- Type: article
- Pages: 139-146
- Citations: 266
- Access: Closed