Abstract
Accurate prediction and explanation are fundamental objectives of statistical analysis, yet they seldom coincide. Boosted trees are a statistical learning method that attains both of these objectives for regression and classification analyses. They can deal with many types of response variables (numeric, categorical, and censored), loss functions (Gaussian, binomial, Poisson, and robust), and predictors (numeric, categorical). Interactions between predictors can also be quantified and visualized. The theory underpinning boosted trees is presented, together with interpretive techniques. A new form of boosted trees, namely, "aggregated boosted trees" (ABT), is proposed and, in a simulation study, is shown to reduce prediction error relative to boosted trees. A regression data set is analyzed using ABT to illustrate the technique and to compare it with other methods, including boosted trees, bagged trees, random forests, and generalized additive models. A software package for ABT analysis using the R software environment is included in the Appendices together with worked examples.
Keywords
Affiliated Institutions
Related Publications
An empirical comparison of supervised learning algorithms
A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlo...
Classification and Regression by randomForest
Recently there has been a lot of interest in “ensemble learning” — methods that generate many classifiers and aggregate their results. Two well-known methods are boosting (see, ...
CLASSIFICATION AND REGRESSION TREES: A POWERFUL YET SIMPLE TECHNIQUE FOR ECOLOGICAL DATA ANALYSIS
Classification and regression trees are ideally suited for the analysis of complex ecological data. For such data, we require flexible and robust analytical methods, which can d...
Categorical Data Analysis
Preface. 1. Introduction: Distributions and Inference for Categorical Data. 1.1 Categorical Response Data. 1.2 Distributions for Categorical Data. 1.3 Statistical Inference for ...
Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models
Summary In many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A sh...
Publication Info
- Year
- 2007
- Type
- article
- Volume
- 88
- Issue
- 1
- Pages
- 243-251
- Citations
- 1257
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1890/0012-9658(2007)88[243:btfema]2.0.co;2