Abstract

1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions.

2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion.

3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods.

4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.
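
The abstract describes the final BRT model as an additive model whose terms are simple trees, fitted in a forward, stagewise fashion. The sketch below illustrates that idea only; it is not the authors' code or tutorial. It assumes squared-error loss, a single predictor, one-split trees (stumps), and a synthetic sine-curve data set. The names fit_stump, fit_brt, predict_brt, n_trees and learning_rate are all hypothetical.

```python
import math

# Illustrative sketch only: boosting one-split regression trees (stumps)
# under squared-error loss. Every name and setting here is an assumption.


def fit_stump(xs, residuals):
    """Return the binary split on x that minimises squared error."""
    best = None
    # Dropping the largest value keeps the right-hand side non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_brt(xs, ys, n_trees=200, learning_rate=0.1):
    """Forward stagewise fit: each new stump models the current residuals."""
    intercept = sum(ys) / len(ys)
    fitted = [intercept] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - f for y, f in zip(ys, fitted)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Earlier stumps are left unchanged; only a shrunken new term is added.
        fitted = [f + learning_rate * predict_stump(stump, x)
                  for f, x in zip(fitted, xs)]
    return intercept, stumps, learning_rate


def predict_brt(model, x):
    """Additive prediction: intercept plus the shrunken sum of all stumps."""
    intercept, stumps, learning_rate = model
    return intercept + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict_brt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Synthetic nonlinear response, used purely for illustration.
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]

single_tree_mse = mean_squared_error(fit_brt(xs, ys, n_trees=1, learning_rate=1.0), xs, ys)
boosted_mse = mean_squared_error(fit_brt(xs, ys), xs, ys)
```

Comparing single_tree_mse with boosted_mse on the training data shows the ensemble tracking the curve more closely than one stump can. A real analysis would judge predictive performance on held-out data, and would typically use larger trees so that interactions between predictors can be represented.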

Keywords

Boosting (machine learning); Computer science; Regression; Decision tree; Outlier; Machine learning; Regression analysis; Tree (set theory); Statistical model; Artificial intelligence; Multivariate adaptive regression splines; Linear regression; Predictive modelling; Simple linear regression; Data mining; Statistics; Mathematics; Polynomial regression

Publication Info

Year: 2008
Type: article
Volume: 77
Issue: 4
Pages: 802-813
Citations: 6183 (OpenAlex)
Access: Closed

Cite This

Jane Elith, John R. Leathwick, Trevor Hastie (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813. https://doi.org/10.1111/j.1365-2656.2008.01390.x

Identifiers

DOI: 10.1111/j.1365-2656.2008.01390.x