The two splitting criteria arrive at exactly the same …

… for a categorical predictor with $m$ levels, all $2^{m-1}-1$ different possible splits are tested. When there is a large number of categories for the predictor, the computational burden of evaluating all of these subsets can become large. For instance, the call rpart(Reliability ~ ., data=car.all) does not return for a long, long time: one of the predictors in that data set is a factor with 79 levels! Luckily, for any ordered outcome there is a computational shortcut that allows the program to find the best split using only $m-1$ comparisons. This includes the classification method when there are only two categories, along with the anova and Poisson methods to be introduced later.

6.3 Example: Kyphosis data

A third example using the class method explores the parameters prior and loss. The dataset kyphosis has 81 rows representing data on 81 children who have had corrective spinal surgery.

[Figure 6 panels: three trees, each with root node absent (64/17); the left and right trees first split on Start at 8.5, the middle tree on Start at 12.5.]
Figure 6: Displays the rpart-based models for the presence/absence of kyphosis. The figure on the left uses the default prior (0.79, 0.21) and loss; the middle figure uses the user-defined prior (0.65, 0.35) and default loss; and the third figure uses the default prior and the user-defined loss $L(1,2) = 3$, $L(2,1) = 4$.
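The default prior quoted in the Figure 6 caption is simply the observed class proportions at the root node: 64 absent and 17 present out of 81 children. A minimal check, assuming the rpart package (a recommended package included with most R installations, which also supplies the kyphosis data frame) is available:

```r
# Assumes the rpart package is installed; it provides the kyphosis data frame.
library(rpart)

# Class counts at the root node of every tree in Figure 6.
counts <- table(kyphosis$Kyphosis)
print(counts)

# With no prior supplied, rpart uses priors proportional to the observed
# class counts; rounded, these are the (0.79, 0.21) in the caption.
default_prior <- counts / sum(counts)
print(round(default_prior, 2))
```

Supplying parms = list(prior = c(.65, .35)) replaces these proportions, which is the only difference between the left and middle trees.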
The kyphosis data frame contains four variables:

    Kyphosis   factor: whether a postoperative deformity is present or absent
    Age        numeric: age of the child in months
    Number     numeric: number of vertebrae involved in the operation
    Start      numeric: beginning of the range of vertebrae involved

The three fits use, respectively, the default prior and loss, a user-defined prior, and a user-defined loss matrix:

    library(rpart)   # provides rpart() and the kyphosis data

    # Loss matrix filled by column: L[1,2] = 3, L[2,1] = 4, zeros on the diagonal
    lmat <- matrix(c(0, 4, 3, 0), nrow = 2, ncol = 2, byrow = FALSE)

    # Default prior (observed class proportions) and default loss
    fit1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
    # User-defined prior: 0.65 absent, 0.35 present
    fit2 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  parms = list(prior = c(.65, .35)))
    # Default prior with the user-defined loss matrix
    fit3 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  parms = list(loss = lmat))

    # Draw the trees side by side, labelling every node with its class counts
    par(mfrow = c(1, 3))
    plot(fit1); text(fit1, use.n = TRUE, all = TRUE)
    plot(fit2); text(fit2, use.n = TRUE, all = TRUE)
    plot(fit3); text(fit3, use.n = TRUE, all = TRUE)

This example shows how even the initial split can change depending on the prior and loss that are specified. The first and third fits have the same initial split, on Start at 8.5, but the improvement differs. The second fit splits Start at 12.5, which moves 46 people to the left instead of 62.

Looking at the leftmost tree, we see that the sequence of splits on the left-hand branch yields only a single node classified as present. For any loss greater than 4 to 3, the routine will instead classify this node as absent, and the entire left side of the tree collapses, as seen in the right-hand figure. This is not unusual: the most common effect of alternate loss matrices is to change the amount of pruning in the tree, more in some branches and less in others, rather than to change the choice of splits. The first node from th…
This document was uploaded on 09/26/2013.
