Classification and Regression Trees
36-350, Data Mining
6 November 2009
Contents

1 Prediction Trees
2 Regression Trees
  2.1 Example: California Real Estate Again
  2.2 Regression Tree Fitting
      2.2.1 Cross-Validation and Pruning in R
  2.3 Uncertainty in Regression Trees
3 Classification Trees
  3.1 Measuring Information
  3.2 Making Predictions
  3.3 Measuring Error
      3.3.1 Misclassification Rate
      3.3.2 Average Loss
      3.3.3 Likelihood and Cross-Entropy
      3.3.4 Neyman-Pearson Approach
4 Further Reading
5 Exercises
Reading: Principles of Data Mining, sections 10.5 and 5.2 (in that order); Berk, chapter 3.
Having built up increasingly complicated models for regression, I'll now switch gears and introduce a class of nonlinear predictive models which at first seems too simple to possibly work, namely prediction trees. These have two varieties, regression trees and classification trees.
1 Prediction Trees
The basic idea is very simple. We want to predict a response or class Y from inputs X_1, X_2, ..., X_p. We do this by growing a binary tree. At each internal
node in the tree, we apply a test to one of the inputs, say X_i. Depending on the outcome of the test, we go to either the left or the right sub-branch of the tree. Eventually we come to a leaf node, where we make a prediction. This prediction aggregates or averages all the training data points which reach that leaf. Figure 1 should help clarify this.
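To make the grow-and-predict workflow concrete, here is a minimal sketch in R using the rpart package (one standard implementation of this idea; the course's own examples may use a different package). The data frame df and its variables x1, x2, and y are simulated purely for illustration.

library(rpart)

## Simulated data, just for illustration: the response jumps at x1 = 0.5
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- ifelse(df$x1 < 0.5, 10, 20) + rnorm(200)

## Grow a binary regression tree: each internal node tests one input
fit <- rpart(y ~ x1 + x2, data = df)

## A new point is dropped down the tree; the prediction at its leaf is
## the average of the training responses that landed in that leaf
predict(fit, newdata = data.frame(x1 = 0.3, x2 = 0.7))

## Draw the tree: interior nodes show the tests, leaves the predictions
plot(fit); text(fit)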
Why do this? Predictors like linear or polynomial regression are global models, where a single predictive formula is supposed to hold over the entire data space. When the data has lots of features which interact in complicated, nonlinear ways, assembling a single global model can be very difficult, and hopelessly confusing when you do succeed. Some of the nonparametric smoothers try to fit models locally and then paste them together, but again they can be hard to interpret. (Additive models are at least pretty easy to grasp.)
An alternative approach to nonlinear regression is to subdivide, or partition, the space into smaller regions, where the interactions are more manageable. We then partition the subdivisions again (this is recursive partitioning, as in hierarchical clustering) until finally we get to chunks of the space which are so tame that we can fit simple models to them. The global model thus has two parts: one is just the recursive partition, the other is a simple model for each cell of the partition.
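As a sketch of this two-part structure, consider a hand-rolled partition of the unit square into three cells, with a constant fitted within each cell. The cells and the numbers here are hypothetical, standing in for whatever the recursive partitioning and the within-cell fits would actually produce.

## Part 1: the partition -- locate the cell a point falls in
cell.of <- function(x1, x2) {
  if (x1 < 0.5) "left"
  else if (x2 < 0.5) "lower.right"
  else "upper.right"
}

## Part 2: a simple model per cell -- here just a constant
## (illustrative values, as if they were within-cell means)
cell.means <- c(left = 12, lower.right = 18, upper.right = 25)

## The global model: find the cell, then apply that cell's model
partition.predict <- function(x1, x2) {
  unname(cell.means[cell.of(x1, x2)])
}

partition.predict(0.7, 0.2)  # lands in "lower.right", so predicts 18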
Now look back at Figure 1 and the description which came before it. Prediction trees use the tree to represent the recursive partition. Each of the terminal nodes, or leaves, of the tree represents a cell of the partition, and has attached to it a simple model which applies in that cell only. A point x belongs to a leaf if x falls in the corresponding cell of the partition. To figure out which cell we are in, we start at the root node of the tree, and ask a sequence of questions about the features. The interior nodes are labeled with questions, and the edges or branches between them labeled by the answers. Which question we ask next depends on the answers to the previous questions.
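The following toy implementation shows this root-to-leaf walk. The tree here is a hypothetical hand-built one, stored as a nested list: interior nodes carry a question ("is this variable below this cut-point?"), leaves carry a prediction, and the branch we follow, and hence the next question asked, depends on the answer.

## A hypothetical tree: first split on x1; only if x1 >= 0.5 do we
## go on to ask about x2
toy.tree <- list(var = "x1", cut = 0.5,
                 left  = list(prediction = 12),
                 right = list(var = "x2", cut = 0.5,
                              left  = list(prediction = 18),
                              right = list(prediction = 25)))

## Start at the root; at each interior node ask "is x[var] < cut?"
## and follow the matching branch; stop on reaching a leaf
follow.tree <- function(node, x) {
  if (!is.null(node$prediction)) {
    return(node$prediction)      # leaf: make the prediction
  }
  if (x[[node$var]] < node$cut) {
    follow.tree(node$left, x)    # answer "yes": take the left branch
  } else {
    follow.tree(node$right, x)   # answer "no": take the right branch
  }
}

follow.tree(toy.tree, list(x1 = 0.7, x2 = 0.9))  # root -> right -> right: 25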