Abstract

Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. Unbiased procedures have been suggested for some special cases, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, confirming the need for unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored, and multivariate response variables, and to arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node-positive breast cancer survival, and mammography experience are re-analyzed.
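The core idea described in the abstract — unbiased variable selection via association tests, with a multiplicity-adjusted stopping rule in place of pruning — can be sketched in a few lines. The following is a minimal Python illustration, not the paper's method: it uses an absolute-correlation permutation statistic as a stand-in for the conditional inference linear statistics, and all function names are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Permutation p-value for association between a numeric covariate x
    and a numeric response y, using |Pearson correlation| as the test
    statistic (a simplified stand-in for the paper's linear statistics)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    def stat(a, b):
        return abs(np.corrcoef(a, b)[0, 1])
    observed = stat(x, y)
    exceed = sum(stat(rng.permutation(x), y) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

def select_split_variable(X, y, alpha=0.05):
    """Unbiased variable selection with an early-stopping rule:
    test each covariate's association with y, Bonferroni-adjust the
    p-values, and stop (return None) if none falls below alpha.
    Otherwise return the index of the most significant covariate."""
    m = X.shape[1]
    pvals = np.array([permutation_pvalue(X[:, j], y) for j in range(m)])
    adjusted = np.minimum(pvals * m, 1.0)  # Bonferroni adjustment
    best = int(np.argmin(adjusted))
    return best if adjusted[best] < alpha else None
```

Because selection is driven by p-values rather than by the maximally achievable split criterion, covariates with many possible cutpoints gain no artificial advantage, and the adjusted significance threshold doubles as the stopping criterion, so no pruning step is needed.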

Keywords

Overfitting, Interpretability, Feature selection, Computer science, Inference, Recursive partitioning, Covariate, Mathematics, Machine learning, Artificial intelligence, Statistics


Publication Info

Year: 2006
Type: Article
Volume: 15
Issue: 3
Pages: 651-674
Citations: 3906
Access: Closed


Citation Metrics

3906 (OpenAlex)

Cite This

Torsten Hothorn, Kurt Hornik, Achim Zeileis (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651-674. https://doi.org/10.1198/106186006x133933

Identifiers

DOI
10.1198/106186006x133933