es have the same predicted value
No and the split will be discarded, since the error (number of misclassifications) with
and without the split is identical. In the regression context the two predicted values
of .05 and .33 are different: the split has identified a nearly pure subgroup of
significant size.
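The contrast between the two criteria can be made concrete with a small numeric sketch. The node sizes and event counts below are invented for illustration; they are chosen so the two children have event proportions of roughly .05 and .33, as in the text. Both children would still be labeled "No", so the misclassification count is unchanged by the split, while the regression (sum-of-squares) criterion still rewards it.

```python
def misclassification_error(y):
    """Errors when a node predicts the majority class of the 0/1 labels y."""
    ones = sum(y)
    return min(ones, len(y) - ones)

def sse(y):
    """Within-node sum of squares about the node mean (the predicted value)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

# Hypothetical parent of 100 observations: left child has 60 obs with
# 3 events (proportion .05), right child has 40 obs with 13 events
# (proportion ~.33).  Both proportions are below 1/2, so both children
# predict "No" (coded 0).
left = [1] * 3 + [0] * 57
right = [1] * 13 + [0] * 27
parent = left + right

# Classification view: the split looks worthless -- same error either way.
print(misclassification_error(parent))                                  # 16
print(misclassification_error(left) + misclassification_error(right))   # 16

# Regression view: the split reduces the within-node sum of squares.
print(sse(parent), sse(left) + sse(right))
```

The first two printed values are equal, while the combined children's sum of squares is strictly smaller than the parent's, which is exactly why the regression criterion keeps a split that the misclassification criterion would discard.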
This setup is known as odds regression, and may be a more sensible way to
evaluate a split when the emphasis of the model is on understanding/explanation
rather than on prediction error per se. Extension of this rule to the multiple class
problem is appealing, but has not yet been implemented in rpart.

8 Poisson regression
8.1 Definition

The Poisson splitting method attempts to extend rpart models to event rate data.
The model in this case is

    λ = f(x)

where λ is an event rate and x is some set of predictors. As an example consider
hip fracture rates. For each county in the United States we can obtain

- number of fractures in patients age 65 or greater (from Medicare files)
- population of the county (US census data)
- potential predictors such as
  - socio-economic indicators
  - number of days below freezing
  - ethnic mix
  - physicians/1000 population
  - etc.
Such data would usually be approached by using Poisson regression; can we find
a tree based analogue? In adding criteria for rates regression to this ensemble, the
guiding principle was the following: the between-groups sum-of-squares is not a very
robust measure, yet tree based regression works very well. So do the simplest thing
possible.
Let ci be the observed event count for observation i, ti be the observation time,
and xij, j = 1, ..., p be the predictors. The y variable for the program will be a
2 column matrix.
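As a minimal sketch of this data layout (the numbers below are invented, not from the hip fracture data), each observation carries an observation time ti and an event count ci, and together they form the two-column response. The observed event rate of a group is then total events divided by total time:

```python
# Two-column response: (observation time t_i, event count c_i) per observation,
# e.g. person-years of follow-up and number of fractures (values invented).
y = [(1200.0, 3), (800.0, 1), (2500.0, 9), (400.0, 0)]

times = [t for t, c in y]
counts = [c for t, c in y]

# Observed event rate for the group: (total events) / (total observation time).
lambda_hat = sum(counts) / sum(times)
print(lambda_hat)  # 13 events over 4900 time units
```

This per-node rate is the quantity the tree reports as its summary statistic and uses as the predicted rate for new observations falling into the node.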
Splitting criterion: The likelihood ratio test for two Poisson groups,

    D_parent - (D_left son + D_right son)

Summary statistics: The observed event rate and the number of events,

    λ̂ = (# events) / (total time) = Σ ci / Σ ti
Error of a node: The within node deviance,

    D = Σ [ ci log( ci / (λ̂ ti) ) - (ci - λ̂ ti) ]
Prediction error: The deviance contribution for a new observation, using the λ̂
of the node as the predicted rate.

8.2 Improving the method

There is a problem with the criterion just proposed, however: cross-validation of a
model often produces an infinite value for the deviance. The simplest case where
this occurs is easy to understand. Assume that some terminal node of the tree has
20 subjects, but only 1 of the 20 has experienced any events. The cross-validated
error (deviance) estimate for that node will have one subset, the one where the
subject with an event is left out, which has λ̂ = 0. When we use the prediction
for the 10% of subjects who were set aside, the deviance contribution of the subject
with an event is
    ... + ci log(ci / 0) + ...

which is infinite since ci > 0. The problem is that when λ̂ = 0 the occurrence of an
event is infinitely improbable, and, using the deviance measure, the corresponding
model is then infinitely bad.
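The failure mode above can be reproduced directly from the deviance formula. The function below is a sketch of the within-node deviance D = Σ [ ci log(ci/(λ̂ ti)) - (ci - λ̂ ti) ] (not rpart's actual C code), with an explicit guard for the λ̂ = 0 case:

```python
import math

def poisson_deviance(counts, times, rate):
    """Within-node Poisson deviance at a given predicted event rate."""
    d = 0.0
    for c, t in zip(counts, times):
        expected = rate * t
        if c > 0:
            if expected == 0.0:
                # log(c/0): an observed event is infinitely improbable
                # under a rate-0 prediction, so the deviance is infinite.
                return math.inf
            d += c * math.log(c / expected)
        d -= (c - expected)
    return d

# The text's example: 20 subjects with one unit of time each, and exactly
# one subject with an event.
counts = [1] + [0] * 19
times = [1.0] * 20

# Cross-validation fold that leaves out the one subject with an event:
# the training data contain no events, so the rate estimate is 0.
train_counts, train_times = counts[1:], times[1:]
rate = sum(train_counts) / sum(train_times)   # 0.0

# Deviance contribution of the held-out subject under that prediction:
print(poisson_deviance(counts[:1], times[:1], rate))  # inf
```

Note that when the predicted rate exactly matches a node's observed rate the deviance is zero (the saturated case), so a single zero-rate fold is enough to make the whole cross-validated deviance infinite.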
One might expect this phenomenon to be fairly rare, but unfortu...