Classification and Regression Trees

36-350, Data Mining
6 November 2009

Contents

1 Prediction Trees
2 Regression Trees
  2.1 Example: California Real Estate Again
  2.2 Regression Tree Fitting
    2.2.1 Cross-Validation and Pruning in R
  2.3 Uncertainty in Regression Trees
3 Classification Trees
  3.1 Measuring Information
  3.2 Making Predictions
  3.3 Measuring Error
    3.3.1 Misclassification Rate
    3.3.2 Average Loss
    3.3.3 Likelihood and Cross-Entropy
    3.3.4 Neyman-Pearson Approach
4 Further Reading
5 Exercises

Reading: Principles of Data Mining, sections 10.5 and 5.2 (in that order); Berk, chapter 3.

Having built up increasingly complicated models for regression, I'll now switch gears and introduce a class of nonlinear predictive models which at first seems too simple to possibly work, namely prediction trees. These come in two varieties: regression trees and classification trees.

1 Prediction Trees

The basic idea is very simple. We want to predict a response or class $Y$ from inputs $X_1, X_2, \ldots, X_p$. We do this by growing a binary tree. At each internal node in the tree, we apply a test to one of the inputs, say $X_i$. Depending on the outcome of the test, we go to either the left or the right sub-branch of the tree. Eventually we come to a leaf node, where we make a prediction. This prediction aggregates or averages all the training data points which reach that leaf. Figure 1 should help clarify this.

Why do this? Predictors like linear or polynomial regression are global models, where a single predictive formula is supposed to hold over the entire data space. When the data have lots of features which interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and hopelessly confusing when you do succeed. Some of the non-parametric smoothers try to fit models locally and then paste them together, but again they can be hard to interpret. (Additive models are at least pretty easy to grasp.)

An alternative approach to nonlinear regression is to sub-divide, or partition, the space into smaller regions, where the interactions are more manageable. We then partition the sub-divisions again (this is recursive partitioning, as in hierarchical clustering) until finally we get to chunks of the space which are so tame that we can fit simple models to them. The global model thus has two parts: one is just the recursive partition, the other is a simple model for each cell of the partition.
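To make the tree-walking idea concrete, here is a minimal sketch in R of a prediction tree as a data structure. Everything in it (the node and leaf constructors, the toy tree, the variable names) is invented for illustration and is not taken from these notes.

    ## A prediction tree as a data structure (all names invented).
    ## Each internal node stores: which input to test, a threshold,
    ## and left/right children; each leaf stores a prediction (e.g.
    ## the mean of the training responses that reached it).
    leaf <- function(prediction) list(type = "leaf", prediction = prediction)
    node <- function(var, threshold, left, right)
      list(type = "node", var = var, threshold = threshold,
           left = left, right = right)

    ## Predict for one input vector x by walking from the root to a leaf.
    tree_predict <- function(tree, x) {
      while (tree$type == "node") {
        tree <- if (x[[tree$var]] <= tree$threshold) tree$left else tree$right
      }
      tree$prediction
    }

    ## A hand-built toy tree: test x1 first, then x2 on the left branch.
    toy <- node("x1", 0,
                node("x2", 1, leaf(-3), leaf(5)),
                leaf(10))
    tree_predict(toy, c(x1 = -1, x2 = 2))  # left at root, right at x2: 5
    tree_predict(toy, c(x1 = 2,  x2 = 0))  # right at root: 10

Note that prediction costs at most one comparison per level of the tree, regardless of how many training points the tree was grown from.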
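And here is what growing such a tree from data can look like, as a hedged sketch using the rpart package (one standard R implementation of recursive partitioning; the fitting examples later in these notes may use a different package). The simulated data and variable names are invented for illustration.

    ## A sketch of recursive partitioning with rpart (assumed setup:
    ## simulated data, invented variable names).
    library(rpart)

    set.seed(42)
    n <- 500
    x1 <- runif(n, -2, 2)
    x2 <- runif(n, -2, 2)
    ## A nonlinear response with an interaction: awkward for a single
    ## global linear model, easy for a partition into rectangles.
    y <- ifelse(x1 > 0, 3 + 2 * x2, -3 - 2 * x2) + rnorm(n, sd = 0.5)
    d <- data.frame(x1 = x1, x2 = x2, y = y)

    ## Grow a regression tree: each internal node tests one input
    ## (e.g. x1 > 0); each leaf predicts the mean of the training
    ## responses that reach it.
    fit <- rpart(y ~ x1 + x2, data = d, method = "anova")
    print(fit)  # shows the splits and the per-leaf mean predictions

    ## Predict by dropping new points down the tree.
    newpoints <- data.frame(x1 = c(-1, 1), x2 = c(0.5, 0.5))
    predict(fit, newdata = newpoints)

The fitted object is exactly the two-part global model described above: the printed splits are the recursive partition, and the per-leaf means are the simple model fit within each cell.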