Boosting
Like bagging, boosting grows a large number of decision trees. Unlike bagging, however, the trees are grown sequentially: each tree is fit using information from the trees that have already been grown.
Parameters:
B: the number of trees
\(\lambda\): the shrinkage parameter (typically 0.01 or 0.001)
d: the number of splits in each tree (interaction depth)
Algorithm:
Initialize the model and the residuals: \(\hat{f}(x) = 0\) and \(r_i = y_i\) for every training observation
For each tree (out of B trees):
fit a tree \(\hat{f}^b(x)\) with d splits to the training data (X, r); note that the response is the current residual r, not y
update the model by adding a shrunken version of the new tree: \(\hat{f}(x) = \hat{f}(x) + \lambda\hat{f}^b(x)\)
update the residuals: \(r_i = r_i - \lambda\hat{f}^b(x_i)\)
Output the boosted model:
\(\hat{f}(x) = \sum_{b=1}^{B}\lambda\hat{f}^b(x)\)
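The algorithm above can be sketched in a few lines of plain Python. This is only an illustrative sketch under simplifying assumptions: a single numeric predictor, and one-split trees (stumps, i.e. d = 1) chosen by least squares. The names `fit_stump`, `boost` and `lam` are hypothetical helpers introduced here, not part of any library.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree (d = 1) to (x, r) by least squares."""
    best = None
    xs = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # All x values are identical: fall back to a constant prediction.
        mean = sum(r) / len(r)
        return lambda v: mean
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def boost(x, y, B=200, lam=0.1):
    """Boosted regression with stumps; returns the model and final residuals."""
    trees = []
    r = list(y)  # f_hat(x) = 0 initially, so r_i = y_i
    for _ in range(B):
        tree = fit_stump(x, r)  # fit to the residuals r, not to y
        trees.append(tree)
        # Update the residuals: r_i = r_i - lam * f_b(x_i)
        r = [ri - lam * tree(xi) for xi, ri in zip(x, r)]

    def f_hat(v):
        # Boosted model: sum over b of lam * f_b(v)
        return sum(lam * tree(v) for tree in trees)

    return f_hat, r


# Toy data (made up for illustration): y = x^2 on ten points.
x = [float(i) for i in range(10)]
y = [v * v for v in x]
f_hat, r = boost(x, y, B=200, lam=0.1)
```

Because the residuals are reduced by exactly what is added to the model at each step, the final residuals satisfy \(r_i = y_i - \hat{f}(x_i)\) up to floating-point error. The value `lam=0.1` is larger than the typical values listed above only so that the toy example fits with a small B; a smaller \(\lambda\) generally needs a larger B.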