
# 16 Ensemble Learning


## 16.1 Introduction

The idea of ensemble learning is to build a prediction model by combining the strengths of a collection of simpler base models. We have already seen a number of examples that fall into this category. Bagging (Section 8.7) and random forests (Chapter 15) are ensemble methods for classification, where a committee of trees each cast a vote for the predicted class. Boosting (Chapter 10) was initially proposed as a committee method as well, although unlike random forests the committee of weak learners evolves over time, and the members cast a weighted vote. Stacking (Section 8.8) is a novel approach to combining the strengths of a number of fitted models. In fact one could characterize any dictionary method, such as regression splines, as an ensemble method, with the basis functions serving the role of weak learners. Bayesian methods for nonparametric regression can also be viewed as ensemble methods: a large number of candidate models are averaged with respect to the posterior distribution of their parameter settings (e.g. Neal and Zhang, 2006).

Ensemble learning can be broken down into two tasks: developing a population of base learners from the training data, and then combining them to form the composite predictor. In this chapter we discuss boosting technology that goes a step further; it builds an ensemble model by conducting a regularized and supervised search in a high-dimensional space of weak learners.

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition. DOI: 10.1007/b94608_16
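The two voting schemes mentioned above, an unweighted committee (bagging, random forests) and a weighted committee (boosting), can be sketched in a few lines. The toy base learners below are hypothetical stand-ins for fitted trees or weak learners; only the voting logic follows the text.

```python
from collections import Counter

def committee_vote(base_learners, x):
    """Unweighted majority vote, as in bagging / random forests:
    each committee member casts one vote for its predicted class."""
    votes = [clf(x) for clf in base_learners]
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(base_learners, weights, x):
    """Weighted vote, as in boosting: member m's vote counts weights[m]."""
    tally = Counter()
    for clf, w in zip(base_learners, weights):
        tally[clf(x)] += w
    return max(tally, key=tally.get)

# Hypothetical stand-ins for fitted base classifiers.
learners = [lambda x: 0, lambda x: 1, lambda x: 1]
print(committee_vote(learners, None))                    # two of three vote 1
print(weighted_vote(learners, [5.0, 1.0, 1.0], None))    # heavy first vote wins
```

With equal weights the two schemes coincide; boosting's advantage comes from how the weights and members are chosen, not from the vote itself.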

An early example of a learning ensemble is a method designed for multi-class classification using error-correcting output codes (Dietterich and Bakiri, 1995, ECOC). Consider the 10-class digit classification problem, and the coding matrix C given in Table 16.1.

TABLE 16.1. *Part of a 15-bit error-correcting coding matrix C for the 10-class digit classification problem. Each column defines a two-class classification problem.*

| Digit | C1 | C2 | C3 | C4 | C5 | C6 | ··· | C15 |
|-------|----|----|----|----|----|----|-----|-----|
| 0     | 1  | 1  | 0  | 0  | 0  | 0  | ··· | 1   |
| 1     | 0  | 0  | 1  | 1  | 1  | 1  | ··· | 0   |
| 2     | 1  | 0  | 0  | 1  | 0  | 0  | ··· | 1   |
| ⋮     | ⋮  | ⋮  | ⋮  | ⋮  | ⋮  | ⋮  |     | ⋮   |
| 8     | 1  | 1  | 0  | 1  | 0  | 1  | ··· | 1   |
| 9     | 0  | 1  | 1  | 1  | 0  | 0  | ··· | 0   |

Note that the $\ell$th column of the coding matrix, $C_\ell$, defines a two-class variable that merges all the original classes into two groups. The method works as follows:

1. Learn a separate classifier for each of the $L = 15$ two-class problems defined by the columns of the coding matrix.
2. At a test point $x$, let $\hat p_\ell(x)$ be the predicted probability of a one for the $\ell$th response.
3. Define $\delta_k(x) = \sum_{\ell=1}^{L} |C_{k\ell} - \hat p_\ell(x)|$, the discriminant function for the $k$th class, where $C_{k\ell}$ is the entry for row $k$ and column $\ell$ in Table 16.1.

Each row of C is a binary code for representing that class. The rows have more bits than is necessary, and the idea is that the redundant "error-correcting" bits allow for some inaccuracies, and can improve performance.
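The decoding step (step 3) can be sketched directly from the discriminant formula: predict the class whose code word is closest, in L1 distance, to the vector of predicted probabilities. The tiny 3-class, 5-bit coding matrix and probability vector below are hypothetical illustrations, not the book's Table 16.1.

```python
import numpy as np

def ecoc_decode(C, p_hat):
    """ECOC decoding: delta_k = sum_l |C[k, l] - p_hat[l]|.

    C     : (K, L) binary coding matrix, one code word (row) per class.
    p_hat : (L,) predicted probabilities of a "one" from the L
            two-class classifiers (step 2).
    Returns (k, deltas): the class index minimizing delta_k, and all deltas.
    """
    deltas = np.abs(C - p_hat).sum(axis=1)  # L1 distance to each code word
    return int(np.argmin(deltas)), deltas

# Hypothetical 3-class, 5-bit coding matrix (not Table 16.1).
C = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1],
              [1, 0, 0, 1, 0]])

# Noisy per-column probabilities: no bit is predicted perfectly, yet the
# redundant code still decodes to the nearest code word.
p_hat = np.array([0.2, 0.4, 0.9, 0.6, 0.8])
k, deltas = ecoc_decode(C, p_hat)
print(k)  # class whose code word is nearest in L1 distance
```

Here the redundancy is doing the error correction: even though several individual two-class predictions are wrong or uncertain, the aggregate distance still identifies the right row of C.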