Boosting

Similar to bagging, boosting grows a large number of decision trees. Unlike bagging, however, each tree depends strongly on the trees that have already been grown.

Parameters:

B: the number of trees
\(\lambda\): the shrinkage parameter (typically 0.01 or 0.001)
d: the number of splits in each tree (interaction depth)

Algorithm:

Initialize the predicted outputs and the residuals, e.g. \(\hat{f}(x) = 0\) and \(r_i = y_i\) for all i in the training set
For each tree b = 1, 2, ..., B:
- fit a tree \(\hat{f}^b(x)\) with d splits to the training data (X, r); note that the response is the current residual r, not y
- update the predicted outputs by adding a shrunken version of the new tree: \(\hat{f}(x) = \hat{f}(x) + \lambda\hat{f}^b(x)\)
- update the residuals: \(r_i = r_i - \lambda\hat{f}^b(x_i)\)
Output boosted model
- \(\hat{f}(x) = \sum_{b=1}^{B}\lambda\hat{f}^b(x)\)
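
The steps above can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not a reference implementation: it assumes a single numeric feature, uses one-split trees (d = 1, i.e. stumps), and the helper names `fit_stump`, `predict_stump`, `boost` and `predict` are hypothetical. The toy data is made up, and \(\lambda = 0.1\) is larger than the typical values listed above only so that a small B is enough on such a tiny data set.

```python
def fit_stump(xs, rs):
    """Fit a one-split regression tree (d = 1) to (xs, rs) by least squares.

    Returns (threshold, left_mean, right_mean); points with x <= threshold go left.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    mean_all = sum(rs) / n
    # Fallback if no split improves the fit: predict the overall mean everywhere.
    best = (float("inf"), mean_all, mean_all)
    best_sse = sum((r - mean_all) ** 2 for r in rs)
    for k in range(1, n):
        lo, hi = order[:k], order[k:]
        if xs[lo[-1]] == xs[hi[0]]:
            continue  # cannot split between two equal x values
        left_mean = sum(rs[i] for i in lo) / k
        right_mean = sum(rs[i] for i in hi) / (n - k)
        sse = (sum((rs[i] - left_mean) ** 2 for i in lo)
               + sum((rs[i] - right_mean) ** 2 for i in hi))
        if sse < best_sse:
            best_sse = sse
            best = ((xs[lo[-1]] + xs[hi[0]]) / 2, left_mean, right_mean)
    return best


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, B=200, lam=0.1):
    """Boost B stumps with shrinkage lam; returns the stumps and final residuals."""
    residuals = list(ys)  # r_i = y_i, because f_hat(x) = 0 initially
    stumps = []
    for _ in range(B):
        stump = fit_stump(xs, residuals)  # fit to (X, r), not (X, y)
        stumps.append(stump)
        # r_i = r_i - lam * f_hat^b(x_i)
        residuals = [r - lam * predict_stump(stump, x)
                     for x, r in zip(xs, residuals)]
    return stumps, residuals


def predict(stumps, x, lam=0.1):
    """Boosted model: f_hat(x) = sum over b of lam * f_hat^b(x)."""
    return sum(lam * predict_stump(s, x) for s in stumps)


# Made-up toy data with a jump between x = 2 and x = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.1, 0.2, 2.9, 3.1, 3.0]
stumps, residuals = boost(xs, ys, B=200, lam=0.1)
```

Because the model and the residuals are updated with the same shrunken tree at every step, the residuals always satisfy \(r_i = y_i - \hat{f}(x_i)\). This is why the loop only needs to track the residuals, and why the final model can be recovered as the sum of the shrunken trees.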