Classification and Regression Trees
36-350, Data Mining
6 November 2009

Contents

1 Prediction Trees
2 Regression Trees
  2.1 Example: California Real Estate Again
  2.2 Regression Tree Fitting
    2.2.1 Cross-Validation and Pruning in R
  2.3 Uncertainty in Regression Trees
3 Classification Trees
  3.1 Measuring Information
  3.2 Making Predictions
  3.3 Measuring Error
    3.3.1 Misclassification Rate
    3.3.2 Average Loss
    3.3.3 Likelihood and Cross-Entropy
    3.3.4 Neyman-Pearson Approach
4 Further Reading
5 Exercises

Reading: Principles of Data Mining, sections 10.5 and 5.2 (in that order); Berk, chapter 3.

Having built up increasingly complicated models for regression, I'll now switch gears and introduce a class of nonlinear predictive models which at first seem too simple to possibly work, namely prediction trees. These come in two varieties: regression trees and classification trees.

1 Prediction Trees

The basic idea is very simple. We want to predict a response or class Y from inputs X_1, X_2, ..., X_p. We do this by growing a binary tree. At each internal node in the tree, we apply a test to one of the inputs, say X_i. Depending on the outcome of the test, we go to either the left or the right sub-branch of the tree. Eventually we come to a leaf node, where we make a prediction. This prediction aggregates or averages all the training data points which reach that leaf. Figure 1 should help clarify this. Why do this?
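In code, the traversal just described is only a few lines. The following is a minimal Python sketch, not the R code used later in these notes; the `Node` class and its field names are hypothetical, introduced purely for illustration:

```python
# Minimal sketch of prediction-tree traversal (hypothetical structure,
# not the tree-fitting code used later in these notes).

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # index of the input X_i tested here
        self.threshold = threshold    # go left if x[feature] <= threshold
        self.left = left
        self.right = right
        self.prediction = prediction  # set only at a leaf

def predict(node, x):
    """Walk from the root to a leaf, applying one test per internal node."""
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

# Toy tree: one split on x[0] at 2.0; each leaf predicts the average of
# the training responses that reached it.
tree = Node(feature=0, threshold=2.0,
            left=Node(prediction=10.0),
            right=Node(prediction=30.0))

print(predict(tree, [1.5]))  # -> 10.0
print(predict(tree, [3.5]))  # -> 30.0
```

Note that each data point follows exactly one root-to-leaf path, so a prediction costs only as many comparisons as the tree is deep.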
Predictors like linear or polynomial regression are global models, where a single predictive formula is supposed to hold over the entire data space. When the data has lots of features which interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and hopelessly confusing when you do succeed. Some of the nonparametric smoothers try to fit models locally and then paste them together, but again they can be hard to interpret. (Additive models are at least pretty easy to grasp.)

An alternative approach to nonlinear regression is to subdivide, or partition, the space into smaller regions, where the interactions are more manageable. We then partition the subdivisions again (this is recursive partitioning, as in hierarchical clustering) until finally we get to chunks of the space which are so tame that we can fit simple models to them. The global model thus has two parts: one is just the recursive partition, the other is a simple model for each cell of the partition.
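The partition-plus-simple-model idea can be sketched concretely. The toy Python code below recursively splits one-dimensional data and fits the simplest possible model, the mean of the responses, in each cell; the midpoint split rule and the function names are illustrative assumptions, not the optimized splits CART actually searches for:

```python
# Toy recursive partitioning in one dimension (illustrative only: real
# CART chooses splits to minimize error, not the midpoint used here).

def partition(xs, ys, depth=0, max_depth=2, min_size=2):
    """Recursively split the data; fit the simplest model (the mean) per cell."""
    if depth == max_depth or len(ys) <= min_size:
        return ("leaf", sum(ys) / len(ys))
    cut = (min(xs) + max(xs)) / 2.0        # naive midpoint split
    left = [(x, y) for x, y in zip(xs, ys) if x <= cut]
    right = [(x, y) for x, y in zip(xs, ys) if x > cut]
    if not left or not right:              # degenerate split: stop here
        return ("leaf", sum(ys) / len(ys))
    return ("split", cut,
            partition([x for x, _ in left], [y for _, y in left],
                      depth + 1, max_depth, min_size),
            partition([x for x, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_size))

def predict_cell(tree, x):
    """Descend the recursive partition and return that cell's fitted mean."""
    if tree[0] == "leaf":
        return tree[1]
    _, cut, left, right = tree
    return predict_cell(left, x) if x <= cut else predict_cell(right, x)

# Two well-separated clumps: the first split isolates them, later splits
# refine each clump into smaller cells.
xs = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
tree = partition(xs, ys)
```

The fitted object really does have the two parts described above: the nested `("split", cut, ...)` tuples are the recursive partition, and each `("leaf", mean)` is the simple model for one cell.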
This note was uploaded on 06/10/2011 for the course STATS 315B taught by Professor Friedman during the Spring '08 term at Stanford.