
# 10 Boosting and Additive Trees


## 10.1 Boosting Methods

Boosting is one of the most powerful learning ideas introduced in the last twenty years. It was originally designed for classification problems, but as will be seen in this chapter, it can profitably be extended to regression as well. The motivation for boosting was a procedure that combines the outputs of many "weak" classifiers to produce a powerful "committee." From this perspective boosting bears a resemblance to bagging and other committee-based approaches (Section 8.8). However, we shall see that the connection is at best superficial and that boosting is fundamentally different.

We begin by describing the most popular boosting algorithm due to Freund and Schapire (1997) called "AdaBoost.M1." Consider a two-class problem, with the output variable coded as $Y \in \{-1, 1\}$. Given a vector of predictor variables $X$, a classifier $G(X)$ produces a prediction taking one of the two values $\{-1, 1\}$. The error rate on the training sample is

$$\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} I\big(y_i \neq G(x_i)\big),$$

and the expected error rate on future predictions is $E_{XY}\, I(Y \neq G(X))$. A weak classifier is one whose error rate is only slightly better than random guessing. The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers $G_m(x)$, $m = 1, 2, \ldots, M$.

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition, DOI: 10.1007/b94608_10.
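As a concrete check of these definitions, the training error rate can be computed directly from labels and predictions. The labels and predictions below are made-up illustrative values, not data from the book:

```python
# Training error rate err = (1/N) * sum_i I(y_i != G(x_i)), illustrated
# with hypothetical labels and weak-classifier outputs in {-1, +1}.
import numpy as np

y    = np.array([ 1, -1,  1,  1, -1, -1,  1, -1])  # true labels
pred = np.array([ 1, -1, -1,  1, -1,  1,  1, -1])  # weak classifier G(x_i)

err = np.mean(y != pred)   # fraction of misclassified observations
print(err)                 # 0.25: only slightly better than random guessing (0.5)
```

A weak learner in this sense just has to beat 0.5 consistently; boosting does the rest.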

*FIGURE 10.1. Schematic of AdaBoost. Classifiers $G_1(x), G_2(x), \ldots, G_M(x)$ are trained on successively reweighted versions of the dataset, and then combined via $G(x) = \operatorname{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$ to produce a final prediction.*

The predictions from all of them are then combined through a weighted majority vote to produce the final prediction:

$$G(x) = \operatorname{sign}\!\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right). \tag{10.1}$$

Here $\alpha_1, \alpha_2, \ldots, \alpha_M$ are computed by the boosting algorithm, and weight the contribution of each respective $G_m(x)$. Their effect is to give higher influence to the more accurate classifiers in the sequence. Figure 10.1 shows a schematic of the AdaBoost procedure.

The data modifications at each boosting step consist of applying weights $w_1, w_2, \ldots, w_N$ to each of the training observations $(x_i, y_i)$, $i = 1, 2, \ldots, N$. Initially all of the weights are set to $w_i = 1/N$, so that the first step simply trains the classifier on the data in the usual manner. For each successive iteration $m = 2, 3, \ldots, M$ the observation weights are individually modified and the classification algorithm is reapplied to the weighted observations. At step $m$, those observations that were misclassified by the classifier $G_{m-1}(x)$ induced at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly. Thus as iterations proceed, observations that are difficult to classify correctly receive ever-increasing influence. Each successive classifier is thereby forced to concentrate on those training observations that are missed by previous ones in the sequence.
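The scheme just described can be sketched in code. This passage gives the reweighting idea only qualitatively, so the specific formulas below for the classifier weights $\alpha_m$ and the observation-weight updates are the standard AdaBoost.M1 choices, and the decision-stump weak learner and all function names are illustrative assumptions, not the book's algorithm statement:

```python
# Sketch of AdaBoost.M1 with 1-level decision stumps as the weak learner.
# alpha_m = log((1 - err_m)/err_m) and the exponential weight update are
# the standard AdaBoost.M1 formulas; names here are illustrative.
import numpy as np

def fit_stump(X, y, w):
    """Best weighted decision stump over all features and thresholds."""
    best = (np.inf, 0, 0.0, 1)          # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(s * (X[:, j] - t) > 0, 1, -1)
                e = w[pred != y].sum()
                if e < best[0]:
                    best = (e, j, t, s)
    return best

def adaboost(X, y, M=20):
    N = len(y)
    w = np.full(N, 1.0 / N)             # initially uniform weights w_i = 1/N
    stumps, alphas = [], []
    for _ in range(M):
        err, j, t, s = fit_stump(X, y, w)
        err = max(err, 1e-12)           # guard against a perfect weak learner
        alpha = np.log((1 - err) / err)
        pred = np.where(s * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(alpha * (pred != y))  # increase weights of misclassified points
        w /= w.sum()                      # renormalize
        stumps.append((j, t, s))
        alphas.append(alpha)

    def G(Xnew):
        # weighted majority vote, Eq. (10.1)
        votes = sum(a * np.where(s * (Xnew[:, j] - t) > 0, 1, -1)
                    for a, (j, t, s) in zip(alphas, stumps))
        return np.sign(votes)
    return G
```

Because misclassified points gain weight multiplicatively, each new stump is pulled toward exactly the observations its predecessors got wrong, which is the mechanism the paragraph above describes.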
