
# 15 Random Forests


*© Springer Science+Business Media, LLC 2009. T. Hastie et al., The Elements of Statistical Learning, Second Edition, DOI: 10.1007/b94608_15.*

## 15.1 Introduction

Bagging or bootstrap aggregation (Section 8.7) is a technique for reducing the variance of an estimated prediction function. Bagging seems to work especially well for high-variance, low-bias procedures, such as trees. For regression, we simply fit the same regression tree many times to bootstrap-sampled versions of the training data, and average the result. For classification, a committee of trees each cast a vote for the predicted class.

Boosting in Chapter 10 was initially proposed as a committee method as well, although unlike bagging, the committee of weak learners evolves over time, and the members cast a weighted vote. Boosting appears to dominate bagging on most problems, and became the preferred choice. Random forests (Breiman, 2001) is a substantial modification of bagging that builds a large collection of de-correlated trees, and then averages them. On many problems the performance of random forests is very similar to boosting, and they are simpler to train and tune. As a consequence, random forests are popular, and are implemented in a variety of packages.

## 15.2 Definition of Random Forests

The essential idea in bagging (Section 8.7) is to average many noisy but approximately unbiased models, and hence reduce the variance. Trees are ideal candidates for bagging, since they can capture complex interaction structures in the data, and if grown sufficiently deep, have relatively low bias. Since trees are notoriously noisy, they benefit greatly from the averaging. Moreover, since each tree generated in bagging is identically distributed (i.d.), the expectation of an average of $B$ such trees is the same as the expectation of any one of them. This means the bias of bagged trees is the same as that of the individual trees, and the only hope of improvement is through variance reduction. This is in contrast to boosting, where the trees are grown in an adaptive way to remove bias, and hence are not i.d.

**Algorithm 15.1** *Random Forest for Regression or Classification.*

1. For $b = 1$ to $B$:
   (a) Draw a bootstrap sample $Z^*$ of size $N$ from the training data.
   (b) Grow a random-forest tree $T_b$ to the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size $n_{\min}$ is reached:
      i. Select $m$ variables at random from the $p$ variables.
      ii. Pick the best variable/split-point among the $m$.
      iii. Split the node into two daughter nodes.
2. Output the ensemble of trees $\{T_b\}_1^B$.

To make a prediction at a new point $x$:

Regression: $\hat{f}^B_{\mathrm{rf}}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$.

Classification: Let $\hat{C}_b(x)$ be the class prediction of the $b$th random-forest tree. Then $\hat{C}^B_{\mathrm{rf}}(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_1^B$.
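As a concrete illustration, here is a minimal from-scratch sketch of Algorithm 15.1. The function names, the tuple-based tree representation, and the choice of sum-of-squared-errors as the split criterion for regression are my own illustrative assumptions, not the book's code; production implementations (e.g. the R `randomForest` package or scikit-learn) are far more efficient and feature-complete. It assumes $m \le p$ and numeric features.

```python
import random

def grow_tree(X, y, m, min_node=5, rng=random):
    """Grow one random-forest tree (Algorithm 15.1, step 1(b)): at each node,
    search for the best split among m randomly chosen features."""
    n, p = len(X), len(X[0])
    if n <= min_node or len(set(y)) == 1:
        return ("leaf", sum(y) / n)                      # terminal node: mean response
    best = None
    for j in rng.sample(range(p), m):                    # step i: m of the p variables
        for t in sorted(set(row[j] for row in X))[:-1]:  # candidate split points
            left = [i for i in range(n) if X[i][j] <= t]
            right = [i for i in range(n) if X[i][j] > t]
            sse = 0.0                                    # step ii: score by squared error
            for side in (left, right):
                mu = sum(y[i] for i in side) / len(side)
                sse += sum((y[i] - mu) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:                                     # no valid split: make a leaf
        return ("leaf", sum(y) / n)
    _, j, t, left, right = best                          # step iii: two daughter nodes
    return ("node", j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, min_node, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, min_node, rng))

def predict_tree(tree, x):
    # Walk from the root down to a leaf.
    while tree[0] == "node":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]

def random_forest(X, y, B=25, m=1, min_node=5, seed=0):
    # Step 1(a): draw B bootstrap samples of size N; grow one tree per sample.
    rng = random.Random(seed)
    N = len(X)
    forest = []
    for _ in range(B):
        idx = [rng.randrange(N) for _ in range(N)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                m, min_node, rng))
    return forest

def predict_forest(forest, x):
    # Regression rule: average the B tree predictions.
    return sum(predict_tree(t, x) for t in forest) / len(forest)

def majority_vote(forest, x):
    # Classification rule: each tree casts one vote; return the winning class.
    votes = {}
    for t in forest:
        c = predict_tree(t, x)
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)

# Toy usage: a one-dimensional step function.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
forest = random_forest(X, y, B=25, m=1, min_node=2)
predict_forest(forest, [8.0])   # close to 1.0
```

Because the trees are grown on independent bootstrap samples and only $m$ of the $p$ features are considered at each split, the trees are de-correlated, which is exactly where a random forest improves on plain bagging (where $m = p$).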

