10 Boosting and Additive Trees

10.1 Boosting Methods

Boosting is one of the most powerful learning ideas introduced in the last twenty years. It was originally designed for classification problems, but as will be seen in this chapter, it can profitably be extended to regression as well. The motivation for boosting was a procedure that combines the outputs of many "weak" classifiers to produce a powerful "committee." From this perspective boosting bears a resemblance to bagging and other committee-based approaches (Section 8.8). However, we shall see that the connection is at best superficial and that boosting is fundamentally different.

We begin by describing the most popular boosting algorithm due to Freund and Schapire (1997) called "AdaBoost.M1." Consider a two-class problem, with the output variable coded as $Y \in \{-1, 1\}$. Given a vector of predictor variables $X$, a classifier $G(X)$ produces a prediction taking one of the two values $\{-1, 1\}$. The error rate on the training sample is
$$
\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq G(x_i)),
$$
and the expected error rate on future predictions is $E_{XY}\, I(Y \neq G(X))$.

A weak classifier is one whose error rate is only slightly better than random guessing. The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers $G_m(x)$, $m = 1, 2, \ldots, M$.
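As a concrete illustration of the training error rate defined above, the following Python sketch computes $\overline{\mathrm{err}}$ from a vector of labels and a vector of predictions. The array names `y` and `y_hat` are hypothetical, and labels are assumed to be coded in $\{-1, 1\}$ as in the text.

```python
import numpy as np

def training_error_rate(y, y_hat):
    """err = (1/N) * sum_i I(y_i != G(x_i)): the fraction of
    misclassified training observations."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean(y != y_hat)

# A weak classifier is one whose error rate is only slightly
# better than random guessing, i.e. only slightly below 0.5.
y     = np.array([ 1, -1,  1,  1, -1, -1,  1, -1])
y_hat = np.array([ 1, -1, -1,  1,  1, -1, -1, -1])
print(training_error_rate(y, y_hat))   # 0.375
```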
[FIGURE 10.1. Schematic of AdaBoost. Classifiers $G_1(x), G_2(x), \ldots, G_M(x)$ are trained on weighted versions of the dataset, and then combined via $G(x) = \mathrm{sign}\bigl(\sum_{m=1}^{M} \alpha_m G_m(x)\bigr)$ to produce a final prediction.]

The predictions from all of them are then combined through a weighted majority vote to produce the final prediction:
$$
G(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right). \tag{10.1}
$$
Here $\alpha_1, \alpha_2, \ldots, \alpha_M$ are computed by the boosting algorithm, and weight the contribution of each respective $G_m(x)$. Their effect is to give higher influence to the more accurate classifiers in the sequence. Figure 10.1 shows a schematic of the AdaBoost procedure.

The data modifications at each boosting step consist of applying weights $w_1, w_2, \ldots, w_N$ to each of the training observations $(x_i, y_i)$, $i = 1, 2, \ldots, N$. Initially all of the weights are set to $w_i = 1/N$, so that the first step simply trains the classifier on the data in the usual manner. For each successive iteration $m = 2, 3, \ldots, M$ the observation weights are individually modified and the classification algorithm is reapplied to the weighted observations. At step $m$, those observations that were misclassified by the classifier $G_{m-1}(x)$ induced at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly. Thus as iterations proceed, observations that are difficult to classify correctly receive ever-increasing influence. Each successive classifier is thereby forced to concentrate on those training observations that are missed by previous ones in the sequence.
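To make the procedure concrete, here is a minimal Python sketch of the AdaBoost.M1 loop described above: weights start at $1/N$, each weak classifier is fit to the weighted data, its weighted error determines its vote $\alpha_m$, misclassified observations have their weights increased, and the final prediction is the weighted majority vote of equation (10.1). The use of scikit-learn decision stumps as the weak learner, the function names, the stopping rule, and the renormalization of the weights are illustrative assumptions, not part of the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    """Fit M weak classifiers to reweighted versions of (X, y), y in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)              # initial weights w_i = 1/N
    classifiers, alphas = [], []
    for m in range(M):
        # Fit a weak classifier (a decision stump here) to the weighted data.
        G_m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (G_m.predict(X) != y)
        # Weighted training error of G_m.
        err_m = np.sum(w[miss]) / np.sum(w)
        if err_m >= 0.5 or err_m <= 0:   # stop if no better than chance, or perfect
            break
        # alpha_m = log((1 - err_m) / err_m): more accurate classifiers get larger votes.
        alpha_m = np.log((1.0 - err_m) / err_m)
        classifiers.append(G_m)
        alphas.append(alpha_m)
        # Increase the weights of misclassified observations; renormalize for stability.
        w *= np.exp(alpha_m * miss)
        w /= w.sum()
    return classifiers, np.array(alphas)

def adaboost_predict(classifiers, alphas, X):
    """G(x) = sign( sum_m alpha_m G_m(x) ), the weighted majority vote of (10.1)."""
    votes = sum(a * g.predict(X) for a, g in zip(alphas, classifiers))
    return np.sign(votes)
```

A typical usage would be `classifiers, alphas = adaboost_m1(X_train, y_train, M=100)` followed by `adaboost_predict(classifiers, alphas, X_test)`, with the labels coded in $\{-1, +1\}$.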