16 Ensemble Learning

16.1 Introduction

The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models. We have already seen a number of examples that fall into this category. Bagging in Section 8.7 and random forests in Chapter 15 are ensemble methods for classification, where a committee of trees each cast a vote for the predicted class. Boosting in Chapter 10 was initially proposed as a committee method as well, although unlike random forests, the committee of weak learners evolves over time, and the members cast a weighted vote. Stacking (Section 8.8) is a novel approach to combining the strengths of a number of fitted models. In fact one could characterize any dictionary method, such as regression splines, as an ensemble method, with the basis functions serving the role of weak learners. Bayesian methods for nonparametric regression can also be viewed as ensemble methods: a large number of candidate models are averaged with respect to the posterior distribution of their parameter settings (e.g. Neal and Zhang, 2006).

Ensemble learning can be broken down into two tasks: developing a population of base learners from the training data, and then combining them to form the composite predictor. In this chapter we discuss boosting technology that goes a step further; it builds an ensemble model by conducting a regularized and supervised search in a high-dimensional space of weak learners.
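To make the two-stage recipe concrete, the following is a minimal sketch (not from the text) of the simplest such ensemble, the bagging procedure of Section 8.7: a population of trees is grown on bootstrap samples of the training data and then combined by majority vote. It assumes NumPy, scikit-learn's DecisionTreeClassifier as the base learner, and class labels coded as nonnegative integers.

```python
# Minimal illustrative sketch of the two-stage ensemble recipe (bagging):
# (1) grow a population of base learners on bootstrap samples,
# (2) combine them into a composite predictor by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed base learner

def fit_bagged_ensemble(X, y, n_learners=50, seed=None):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ensemble = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)             # bootstrap sample of the rows
        tree = DecisionTreeClassifier(max_depth=3)   # a deliberately simple base model
        tree.fit(X[idx], y[idx])
        ensemble.append(tree)
    return ensemble

def predict_ensemble(ensemble, X):
    # Each tree casts a vote; the majority class wins.
    # Assumes labels are nonnegative integers (0, 1, 2, ...).
    votes = np.stack([tree.predict(X) for tree in ensemble])   # shape (n_learners, n_test)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```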
An early example of a learning ensemble is a method designed for multiclass classification using error-correcting output codes (Dietterich and Bakiri, 1995, ECOC). Consider the 10-class digit classification problem, and the coding matrix C given in Table 16.1.

TABLE 16.1. Part of a 15-bit error-correcting coding matrix C for the 10-class
digit classification problem. Each column defines a two-class classification
problem.

    Digit   C1  C2  C3  C4  C5  C6  ...  C15
      0      1   1   0   0   0   0  ...   1
      1      0   0   1   1   1   1  ...   0
      2      1   0   0   1   0   0  ...   1
      .      .   .   .   .   .   .  ...   .
      8      1   1   0   1   0   1  ...   1
      9      0   1   1   1   0   0  ...   0

Note that the $\ell$th column of the coding matrix, $C_\ell$, defines a two-class variable that merges all the original classes into two groups. The method works as follows:

1. Learn a separate classifier for each of the L = 15 two-class problems defined by the columns of the coding matrix.

2. At a test point x, let $\hat{p}_\ell(x)$ be the predicted probability of a one for the $\ell$th response.

3. Define $\delta_k(x) = \sum_{\ell=1}^{L} |C_{k\ell} - \hat{p}_\ell(x)|$, the discriminant function for the kth class, where $C_{k\ell}$ is the entry for row k and column $\ell$ in Table 16.1, and classify x to the class with the smallest $\delta_k(x)$.

Each row of C is a binary code for representing that class. The rows have more bits than are necessary, and the idea is that the redundant "error-correcting" bits allow for some inaccuracies, and can improve performance.
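As a concrete illustration (a minimal sketch, not the book's code), the snippet below implements these three steps. A hypothetical random coding matrix stands in for Table 16.1, and scikit-learn's LogisticRegression stands in for the base two-class classifier.

```python
# Illustrative ECOC sketch under stated assumptions: random K x L coding
# matrix (not the actual Table 16.1), logistic regression as base classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_ecoc(X, y, n_classes=10, n_bits=15, seed=0):
    rng = np.random.default_rng(seed)
    C = rng.integers(0, 2, size=(n_classes, n_bits))     # K x L coding matrix
    classifiers = []
    for l in range(n_bits):
        z = C[y, l]                                       # column l merges classes into 0/1
        clf = LogisticRegression(max_iter=1000).fit(X, z) # one two-class problem per column
        classifiers.append(clf)
    return C, classifiers

def predict_ecoc(C, classifiers, X):
    # p_hat[i, l] = predicted probability of a "one" for the l-th response at x_i
    p_hat = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
    # delta[i, k] = sum_l |C[k, l] - p_hat[i, l]|, the discriminant for class k
    delta = np.abs(C[None, :, :] - p_hat[:, None, :]).sum(axis=2)
    return delta.argmin(axis=1)                           # smallest discriminant wins
```

Classifying to the smallest $\delta_k(x)$ amounts to decoding the vector of predicted probabilities to the nearest code word (row of C) in an L1 sense, which is where the error-correcting interpretation comes from.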