
# Classification and Regression Trees


36-350, Data Mining, 6 November 2009

Contents

- 1 Prediction Trees
- 2 Regression Trees
  - 2.1 Example: California Real Estate Again
  - 2.2 Regression Tree Fitting
    - 2.2.1 Cross-Validation and Pruning in R
  - 2.3 Uncertainty in Regression Trees
- 3 Classification Trees
  - 3.1 Measuring Information
  - 3.2 Making Predictions
  - 3.3 Measuring Error
    - 3.3.1 Misclassification Rate
    - 3.3.2 Average Loss
    - 3.3.3 Likelihood and Cross-Entropy
    - 3.3.4 Neyman-Pearson Approach
- 4 Further Reading
- 5 Exercises

Reading: *Principles of Data Mining*, sections 10.5 and 5.2 (in that order); Berk, chapter 3

Having built up increasingly complicated models for regression, I'll now switch gears and introduce a class of nonlinear predictive model which at first seems too simple to possibly work, namely *prediction trees*. These come in two varieties, *regression trees* and *classification trees*.

## 1 Prediction Trees

The basic idea is very simple. We want to predict a response or class $Y$ from inputs $X_1, X_2, \ldots, X_p$. We do this by growing a binary tree. At each internal node in the tree, we apply a test to one of the inputs, say $X_i$. Depending on the outcome of the test, we go to either the left or the right sub-branch of the tree. Eventually we come to a leaf node, where we make a prediction. This prediction aggregates or averages all the training data points which reach that leaf. Figure 1 should help clarify this.

Why do this? Predictors like linear or polynomial regression are *global models*, where a single predictive formula is supposed to hold over the entire data space. When the data has lots of features which interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and hopelessly confusing when you do succeed. Some of the non-parametric smoothers try to fit models locally and then paste them together, but again they can be hard to interpret. (Additive models are at least pretty easy to grasp.)

An alternative approach to nonlinear regression is to sub-divide, or *partition*, the space into smaller regions, where the interactions are more manageable. We then partition the sub-divisions again (this is *recursive partitioning*, as in hierarchical clustering) until finally we get to chunks of the space which are so tame that we can fit simple models to them. The global model thus has two parts: one is just the recursive partition, the other is a simple model for each cell of the partition.

Now look back at Figure 1 and the description which came before it. Prediction trees use the tree to represent the recursive partition. Each of the *terminal nodes*, or *leaves*, of the tree represents a cell of the partition, and has attached to it a simple model which applies in that cell only. A point $x$ belongs to a leaf if $x$ falls in the corresponding cell of the partition. To figure out which cell we are in, we start at the root node of the tree, and ask a sequence of questions about the features.
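The descend-then-average procedure described above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not code from these notes, and it rests on several assumptions: every question has the form "is $x_i \le t$?" on a numeric input (inputs are indexed from 0 in the code); the tree's splits are taken as given (hand-built here, since how splits are chosen is a separate matter); and each leaf's simple model is the mean of the training responses that reach it, i.e. the regression case. The names `Leaf`, `Node`, `find_leaf`, `iter_leaves`, `fit_leaves`, and `predict`, and the toy data, are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Iterator, List, Sequence, Union


@dataclass
class Leaf:
    # One cell of the partition. Its "simple model" is a constant: the average
    # response of the training points that reach this leaf (regression case).
    value: float = float("nan")
    responses: List[float] = field(default_factory=list)


@dataclass
class Node:
    # An interior node asks one question about one input: "is x[feature] <= threshold?"
    feature: int
    threshold: float
    left: "Tree"   # branch followed when the answer is yes
    right: "Tree"  # branch followed when the answer is no


Tree = Union[Leaf, Node]


def find_leaf(tree: Tree, x: Sequence[float]) -> Leaf:
    """Start at the root and answer one question per level until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree


def iter_leaves(tree: Tree) -> Iterator[Leaf]:
    """Yield every leaf (i.e. every cell of the partition), left to right."""
    if isinstance(tree, Leaf):
        yield tree
    else:
        yield from iter_leaves(tree.left)
        yield from iter_leaves(tree.right)


def fit_leaves(tree: Tree, X: Sequence[Sequence[float]], y: Sequence[float]) -> None:
    """Keep the partition fixed; set each leaf's value to the mean of the y's landing in it."""
    for leaf in iter_leaves(tree):
        leaf.responses.clear()
        leaf.value = float("nan")
    for x, target in zip(X, y):
        find_leaf(tree, x).responses.append(target)
    for leaf in iter_leaves(tree):
        if leaf.responses:  # a cell with no training points keeps value NaN
            leaf.value = mean(leaf.responses)


def predict(tree: Tree, x: Sequence[float]) -> float:
    """Prediction for x is the simple model attached to the leaf x falls into."""
    return find_leaf(tree, x).value


# Hypothetical hand-built tree over two inputs (indices 0 and 1), with toy data.
tree = Node(feature=0, threshold=5.0,
            left=Leaf(),
            right=Node(feature=1, threshold=2.0, left=Leaf(), right=Leaf()))
X = [[1.0, 0.0], [3.0, 9.0], [7.0, 1.0], [8.0, 2.0], [6.0, 5.0], [9.0, 8.0]]
y = [10.0, 12.0, 20.0, 22.0, 30.0, 34.0]
fit_leaves(tree, X, y)
print(predict(tree, [4.0, 100.0]))  # 11.0 (mean of 10 and 12)
print(predict(tree, [6.0, 0.0]))    # 21.0 (mean of 20 and 22)
print(predict(tree, [10.0, 3.0]))   # 32.0 (mean of 30 and 34)
```

In this toy tree the three leaves are the three cells $\{x_0 \le 5\}$, $\{x_0 > 5,\ x_1 \le 2\}$, and $\{x_0 > 5,\ x_1 > 2\}$; the tree is just a compact way of storing that recursive partition. For a classification tree, the leaf's model would instead be something like the most common class among the training points that reach it.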
The interior nodes are labeled with questions, and the edges or branches between them labeled by the answers. Which question we ask