15 Random Forests

15.1 Introduction

Bagging or bootstrap aggregation (Section 8.7) is a technique for reducing the variance of an estimated prediction function. Bagging seems to work especially well for high-variance, low-bias procedures, such as trees. For regression, we simply fit the same regression tree many times to bootstrap-sampled versions of the training data, and average the result. For classification, a committee of trees each cast a vote for the predicted class.

Boosting (Chapter 10) was initially proposed as a committee method as well, although unlike bagging, the committee of weak learners evolves over time, and the members cast a weighted vote. Boosting appears to dominate bagging on most problems, and became the preferred choice. Random forests (Breiman, 2001) is a substantial modification of bagging that builds a large collection of de-correlated trees, and then averages them. On many problems the performance of random forests is very similar to boosting, and they are simpler to train and tune. As a consequence, random forests are popular, and are implemented in a variety of packages.
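To make the bagging recipe concrete, here is a minimal sketch for regression, assuming scikit-learn is available and that X and y are NumPy arrays; the helper names bagged_trees and bagged_predict are illustrative choices, not anything defined in the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, B=100, seed=0):
    """Fit B regression trees, each to a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)   # draw N rows with replacement
        tree = DecisionTreeRegressor()     # unpruned by default: low bias, high variance
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagged_predict(trees, X_new):
    """Average the committee's predictions; a classification committee
    would instead take a majority vote over the trees' predicted classes."""
    return np.mean([t.predict(X_new) for t in trees], axis=0)
```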
15.2 Definition of Random Forests

The essential idea in bagging (Section 8.7) is to average many noisy but approximately unbiased models, and hence reduce the variance. Trees are ideal candidates for bagging, since they can capture complex interaction structures in the data, and if grown sufficiently deep, have relatively low bias. Since trees are notoriously noisy, they benefit greatly from the averaging. Moreover, since each tree generated in bagging is identically distributed (i.d.), the expectation of an average of $B$ such trees is the same as the expectation of any one of them. This means the bias of bagged trees is the same as that of the individual trees, and the only hope of improvement is through variance reduction. This is in contrast to boosting, where the trees are grown in an adaptive way to remove bias, and hence are not i.d.

Algorithm 15.1 Random Forest for Regression or Classification.

1. For $b = 1$ to $B$:
   (a) Draw a bootstrap sample $Z^*$ of size $N$ from the training data.
   (b) Grow a random-forest tree $T_b$ to the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size $n_{\min}$ is reached:
      i. Select $m$ variables at random from the $p$ variables.
      ii. Pick the best variable/split-point among the $m$.
      iii. Split the node into two daughter nodes.
2. Output the ensemble of trees $\{T_b\}_1^B$.

To make a prediction at a new point $x$:

Regression: $\hat{f}_{\mathrm{rf}}^B(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$.

Classification: Let $\hat{C}_b(x)$ be the class prediction of the $b$th random-forest tree. Then $\hat{C}_{\mathrm{rf}}^B(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_1^B$.
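To see what the de-correlation in step i buys, recall a standard identity (a version of this argument appears later in the chapter): an average of $B$ i.i.d. random variables, each with variance $\sigma^2$, has variance $\sigma^2/B$, but if the variables are only identically distributed, with positive pairwise correlation $\rho$, the variance of the average is

$$\rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2.$$

As $B$ grows the second term vanishes, so the pairwise correlation between bagged trees limits the benefit of averaging; restricting each split to a random subset of $m < p$ variables is what drives $\rho$ down.

As a runnable illustration of Algorithm 15.1 for regression, the sketch below leans on scikit-learn; the names random_forest and predict_rf are ours, and using max_features to realize the per-split random variable selection is an implementation choice (scikit-learn's tree grower draws a fresh random subset of max_features candidate variables at each split).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_forest(X, y, B=500, m=None, n_min=5, seed=0):
    """Sketch of Algorithm 15.1 for regression."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    if m is None:
        m = max(1, p // 3)                  # a common default for regression
    trees = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)    # step (a): bootstrap sample Z* of size N
        tree = DecisionTreeRegressor(
            max_features=m,                 # steps i-ii: best split among m of the p variables
            min_samples_leaf=n_min,         # stop at the minimum node size n_min
            random_state=int(rng.integers(2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])            # step (b): grow tree T_b on the bootstrap sample
        trees.append(tree)
    return trees

def predict_rf(trees, X_new):
    """Regression prediction: the average of the B tree predictions."""
    return np.mean([t.predict(X_new) for t in trees], axis=0)
```

For classification one would grow DecisionTreeClassifier trees and return the majority vote of the $\hat{C}_b(x)$; the packaged sklearn.ensemble.RandomForestRegressor and RandomForestClassifier implement this same recipe.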