Abstract
Boosting takes on various forms, with different programs using different loss functions, different base models, and different optimization schemes. The gbm package takes the approach described in [2] and [3]. Some of the terminology differs, mostly due to an effort to cast boosting terms into more standard statistical terminology (e.g. deviance). In addition, the gbm package implements boosting for models commonly used in statistics but not commonly associated with boosting. The Cox proportional hazards model, for example, is an incredibly useful model, and the boosting framework applies quite readily with only slight modification [5]. Also, some algorithms implemented in the gbm package differ from the standard implementation. The AdaBoost algorithm [1] has a particular loss function and a particular optimization algorithm associated with it. The gbm implementation of AdaBoost adopts AdaBoost's exponential loss function (its bound on the misclassification rate) but uses Friedman's gradient descent algorithm rather than the one originally proposed. So the main purpose of this document is to spell out in detail what the gbm package implements.

1 Gradient boosting

This section essentially presents the derivation of boosting described in [2]. The gbm package also adopts the stochastic gradient boosting strategy, a small but important tweak on the basic algorithm, described in [3].

1.1 Friedman's gradient boosting machine

Friedman (2001) and the companion paper Friedman (2002) extended the work of Friedman, Hastie, and Tibshirani (2000) and laid the groundwork for a new generation of boosting algorithms. Using the connection between boosting and optimization, this new work proposes the Gradient Boosting Machine.

In any function estimation problem we wish to find a regression function, f̂(x), that minimizes the expectation of some loss function, Ψ(y, f), as shown in (4).

    \hat{f}(x) = \arg\min_{f(x)} E_{y,x}\,\Psi(y, f(x))    (4)

The gradient boosting algorithm proceeds as follows.

Initialize \hat{f}(x) to be a constant, \hat{f}(x) = \arg\min_{\rho} \sum_{i=1}^{N} \Psi(y_i, \rho).
For t in 1, ..., T do

1. Compute the negative gradient as the working response

    z_i = -\left.\frac{\partial}{\partial f(x_i)} \Psi(y_i, f(x_i)) \right|_{f(x_i) = \hat{f}(x_i)}
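To make the loop above concrete, here is a minimal sketch of Friedman's gradient boosting iteration for squared-error loss, using small rpart trees as the base learner. It is only an illustration under stated assumptions: the function gradient_boost and its arguments are hypothetical names, and this is not the gbm package's internal implementation (gbm supports many loss functions and, following [3], fits each tree to a random subsample of the data).

```r
library(rpart)

## Minimal sketch of Friedman's gradient boosting loop for squared-error loss.
## For this loss the negative gradient z_i is simply the residual y_i - f_hat(x_i),
## and the line-search step size is 1, so only the shrunken update remains.
gradient_boost <- function(x, y, n_trees = 100, shrinkage = 0.1, depth = 2) {
  ## Initialize f_hat(x) to the constant minimizing sum_i Psi(y_i, rho);
  ## for squared-error loss that constant is mean(y).
  f_hat <- rep(mean(y), length(y))
  trees <- vector("list", n_trees)

  for (t in seq_len(n_trees)) {
    ## Step 1: negative gradient of squared-error loss = ordinary residuals.
    z <- y - f_hat
    dat <- data.frame(z = z, x)
    ## Step 2: fit the base learner (a small regression tree) to the working response z.
    trees[[t]] <- rpart(z ~ ., data = dat,
                        control = rpart.control(maxdepth = depth, cp = 0))
    ## Step 3: take a shrunken step in the direction of the fitted gradient.
    f_hat <- f_hat + shrinkage * predict(trees[[t]], newdata = dat)
  }
  list(init = mean(y), trees = trees, shrinkage = shrinkage)
}

## Toy usage: recover a smooth signal from noisy observations.
set.seed(1)
x <- data.frame(x1 = runif(200))
y <- sin(2 * pi * x$x1) + rnorm(200, sd = 0.2)
fit <- gradient_boost(x, y, n_trees = 200)
```

Changing the loss function changes only the working response (step 1) and the line-search step; the rest of the loop is the same for every distribution the package supports.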