Rpart_TechReport61

# Case the Gini splitting rule reduces to 2p(1 − p) which is

...es have the same predicted value (No), and the split will be discarded since the error (number of misclassifications) with and without the split is identical. In the regression context the two predicted values of .05 and .33 are different; the split has identified a nearly pure subgroup of significant size.

This setup is known as odds regression, and may be a more sensible way to evaluate a split when the emphasis of the model is on understanding and explanation rather than on prediction error per se. Extension of this rule to the multiple class problem is appealing, but has not yet been implemented in rpart.

## 8 Poisson regression

### 8.1 Definition

The Poisson splitting method attempts to extend rpart models to event rate data. The model in this case is

$$\lambda = f(x)$$

where $\lambda$ is an event rate and $x$ is some set of predictors. As an example consider hip fracture rates. For each county in the United States we can obtain

• number of fractures in patients age 65 or greater (from Medicare files)
• population of the county (US census data)
• potential predictors such as
  • socio-economic indicators
  • number of days below freezing
  • ethnic mix
  • physicians/1000 population
  • etc.

Such data would usually be approached by using Poisson regression; can we find a tree based analogue? In adding criteria for rates regression to this ensemble, the guiding principle was the following: the between groups sum-of-squares is not a very robust measure, yet tree based regression works very well. So do the simplest thing possible.

Let $c_i$ be the observed event count for observation $i$, $t_i$ be the observation time, and $x_{ij},\ j = 1, \ldots, p$ be the predictors. The y variable for the program will be a 2 column matrix.

• Splitting criterion: The likelihood ratio test for two Poisson groups,

$$D_{\text{parent}} - \left( D_{\text{left son}} + D_{\text{right son}} \right)$$

• Summary statistics: The observed event rate and the number of events.

$$\hat\lambda = \frac{\#\,\text{events}}{\text{total time}} = \frac{\sum c_i}{\sum t_i}$$

• Error of a node: The within node deviance.

$$D = \sum \left[ c_i \log\left( \frac{c_i}{\hat\lambda t_i} \right) - \left( c_i - \hat\lambda t_i \right) \right]$$

• Prediction error: The deviance contribution for a new observation, using $\hat\lambda$ of the node as the predicted rate.

### 8.2 Improving the method

There is a problem with the criterion just proposed, however: cross-validation of a model often produces an infinite value for the deviance. The simplest case where this occurs is easy to understand. Assume that some terminal node of the tree has 20 subjects, but only 1 of the 20 has experienced any events. The cross-validated error (deviance) estimate for that node will have one subset, the one where the subject with an event is left out, which has $\hat\lambda = 0$. When we use the prediction for the 10% of subjects who were set aside, the deviance contribution of the subject with an event is

$$\ldots + c_i \log(c_i / 0) + \ldots$$

which is infinite since $c_i > 0$. The problem is that when $\hat\lambda = 0$ the occurrence of an event is infinitely improbable, and, using the deviance measure, the corresponding model is then infinitely bad.

One might expect this phenomenon to be fairly rare, but unfortu...
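The node statistics of Section 8.1 and the infinite deviance of Section 8.2 can be sketched in a few lines of Python. This is an illustration only, not rpart's implementation: the function names `event_rate`, `deviance`, and `split_improvement` are hypothetical, observation times are assumed to be positive, and the term $c_i \log(c_i / \hat\lambda t_i)$ is taken to be 0 when $c_i = 0$ (the usual convention, assumed here).

```python
import math

def event_rate(counts, times):
    """Observed event rate: total events divided by total observation time."""
    return sum(counts) / sum(times)

def deviance(counts, times, rate):
    """Within-node deviance, sum of c*log(c/(rate*t)) - (c - rate*t).

    Assumption: the log term is taken as 0 when c == 0.
    """
    total = 0.0
    for c, t in zip(counts, times):
        mu = rate * t  # expected count under the predicted rate
        if c > 0:
            if mu == 0:
                # An event occurred although the predicted rate is zero.
                return math.inf
            total += c * math.log(c / mu)
        total -= c - mu
    return total

def split_improvement(counts, times, go_left):
    """Likelihood ratio criterion: D_parent - (D_left + D_right)."""
    left = [(c, t) for c, t, g in zip(counts, times, go_left) if g]
    right = [(c, t) for c, t, g in zip(counts, times, go_left) if not g]

    def node_deviance(pairs):
        cs = [c for c, _ in pairs]
        ts = [t for _, t in pairs]
        return deviance(cs, ts, event_rate(cs, ts))

    parent = node_deviance(list(zip(counts, times)))
    return parent - (node_deviance(left) + node_deviance(right))

# The 20-subject node from Section 8.2, assuming one time unit per subject
# (the time unit is an assumption made for this example).
counts = [1] + [0] * 19
times = [1.0] * 20

# Rate estimated with the event subject left out, then used to predict it.
held_out_rate = event_rate(counts[1:], times[1:])      # 0.0
print(deviance(counts[:1], times[:1], held_out_rate))  # inf

# Rate estimated from all 20 subjects gives a finite contribution.
full_rate = event_rate(counts, times)                  # 0.05
print(deviance(counts[:1], times[:1], full_rate))
```

With the rate estimated from the 19 event-free subjects, the single held-out subject with an event contributes an infinite deviance, which is exactly the failure the text describes. Using the rate from all 20 subjects, the same subject's contribution is finite.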