The first split (Disp. < 156) partitions the 105 observations into groups of 70 and 35 (left and right, respectively). More detailed summarization of the splits is again obtained by using the function summary.rpart.

    > summary(fit3, cp=.10)
    Node number 1: 105 observations,    complexity param=0.4601
      mean=15810 , SS/n=67790000
      left son=2 (70 obs) right son=3 (35 obs)
      Primary splits:
          Disp.      < 156   to the left,  improve=0.4601, (0 missing)
          HP         < 154   to the left,  improve=0.4549, (0 missing)
          Tank       < 17.8  to the left,  improve=0.4431, (0 missing)
          Weight     < 2890  to the left,  improve=0.3912, (0 missing)
          Wheel.base < 104.5 to the left,  improve=0.3067, (0 missing)
      Surrogate splits:
          Weight     < 3095  to the left,  agree=0.9143, (0 split)
          HP         < 139   to the left,  agree=0.8952, (0 split)
          Tank       < 17.95 to the left,  agree=0.8952, (0 split)
          Wheel.base < 105.5 to the left,  agree=0.8571, (0 split)
          Length     < 185.5 to the left,  agree=0.8381, (0 split)

    Node number 2: 70 observations,    complexity param=0.1123
      mean=11860 , SS/n=21310000
      left son=4 (58 obs) right son=5 (12 obs)
      Primary splits:
          Country splits as  L-RRLLLLRL,   improve=0.5361, (0 missing)
          Tank    < 15.65 to the left,     improve=0.3805, (0 missing)
          Weight  < 2568  to the left,     improve=0.3691, (0 missing)
          Type    splits as  R-RLRR,       improve=0.3650, (0 missing)
          HP      < 105.5 to the left,     improve=0.3578, (0 missing)
      Surrogate splits:
          Tank         < 17.8  to the left,  agree=0.8571, (0 split)
          Rear.Seating < 28.75 to the left,  agree=0.8429, (0 split)
      . . .

The improvement listed is the percent change in the sum of squares (SS) for this split, i.e., $1 - (SS_{right} + SS_{left}) / SS_{parent}$. The weight and displacement are very closely related, as shown by the surrogate split agreement of 91%. Not all categories are represented in node 2; for example, there are no cars from England (the second level of Country). An empty category is indicated by a - in the list of split directions.

[Figure 7: An anova tree for the car.test.frame dataset. The label of each node indicates the mean Price for the cars in that node.]
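The numbers in the summary output can be checked by hand. The base-R sketch below (a) computes the anova improvement $1 - (SS_{right} + SS_{left}) / SS_{parent}$ for a split of a small toy response, (b) converts node 2's improve=0.5361 into its complexity param by rescaling to the root SS, and (c) turns the node 1 surrogate agree values into counts out of 105 cars. It assumes that SS/n is the within-node mean squared deviation, so a node's SS is n times SS/n. The helper names ss and split_improve and the toy vector are hypothetical illustrations, not part of rpart.

```r
# Sum of squared deviations about the mean: the "SS" of a node.
ss <- function(y) sum((y - mean(y))^2)

# Anova improvement of a split: 1 - (SS_left + SS_right) / SS_parent.
# goes_left is a logical vector marking observations sent to the left son.
split_improve <- function(y, goes_left) {
  1 - (ss(y[goes_left]) + ss(y[!goes_left])) / ss(y)
}

# (a) Hypothetical toy responses, split at y < 5.
y <- c(1, 2, 3, 10, 11, 12)
imp_toy <- split_improve(y, y < 5)   # 1 - (2 + 2) / 125.5, about 0.968
print(imp_toy)

# (b) Node SS = n * (SS/n), using the values printed by summary(fit3).
ss_root  <- 105 * 67790000   # node 1 (the root)
ss_node2 <- 70 * 21310000    # node 2

# Rescale node 2's improvement to the root SS.
# For the root itself the factor is 1, so its cp equals its improve (0.4601).
cp2 <- 0.5361 * ss_node2 / ss_root
print(round(cp2, 4))

# (c) Surrogate agree values for node 1, converted to counts out of 105 cars
# (agree = fraction sent in the same direction as the primary split).
agree <- c(Weight = 0.9143, HP = 0.8952, Tank = 0.8952,
           Wheel.base = 0.8571, Length = 0.8381)
agree_counts <- round(agree * 105)
print(agree_counts)   # Weight 96, HP 94, Tank 94, Wheel.base 90, Length 88
```

For nodes 1 and 2 the rescaled improvement reproduces the printed complexity param (0.4601 and about 0.1123, allowing for rounding in the printout). In general a node's cp comes from weakest-link pruning of its whole subtree, not only its own split, so treat this as a check for this example rather than a general identity.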
    > plot(fit3)
    > text(fit3, use.n=T)

As always, a plot of the fit is useful for understanding the rpart object. In this plot, we use the option use.n=T to add the number of cars in each node. (The default is for only the mean of the response variable to appear.) Each individual split is ordered to send the less expensive cars to the left.

Other plots can be used to help determine the best cp value for this model. The function rsq.rpart plots the jackknifed error versus the number of splits. Of interest is the smallest error, but any number of splits within the "error bars" (the 1-SE rule) is considered reasonable; in this case, 1 or 3 splits seem to be sufficient. As is often true with modelling, simpler is usually better. Another useful plot is the $R^2$ versus the number of splits. The (1 - apparent error) and (1 - relative error) curves show how much is gained with additional splits. This plot highlights the differences between the $R^2$ values (Figure 8). Finally, it is possible to look at the residuals fro...
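The 1-SE rule and the two $R^2$ curves described above can be computed directly from a fitted tree's cptable. The base-R sketch below assumes an rpart-style cptable matrix with the columns "nsplit", "rel error", "xerror" and "xstd". The helper names one_se_nsplit and rsq_values are hypothetical, and the table values are made up for illustration; they are not the output for fit3.

```r
# 1-SE rule: find the minimum cross-validated error, then choose the
# smallest tree whose xerror is within one standard error (xstd) of it.
one_se_nsplit <- function(cptab) {
  best  <- which.min(cptab[, "xerror"])
  limit <- cptab[best, "xerror"] + cptab[best, "xstd"]
  min(cptab[cptab[, "xerror"] <= limit, "nsplit"])
}

# R-squared curves by number of splits:
# 1 - apparent (resubstitution) relative error and 1 - cross-validated error.
rsq_values <- function(cptab) {
  cbind(nsplit    = cptab[, "nsplit"],
        apparent  = 1 - cptab[, "rel error"],
        xrelative = 1 - cptab[, "xerror"])
}

# Hypothetical cptable values for illustration only (not fit3's output).
cptab <- cbind(CP          = c(0.46, 0.11, 0.03, 0.02, 0.01),
               nsplit      = c(0, 1, 2, 3, 4),
               "rel error" = c(1.00, 0.54, 0.43, 0.40, 0.38),
               xerror      = c(1.02, 0.62, 0.55, 0.50, 0.52),
               xstd        = c(0.14, 0.10, 0.09, 0.08, 0.08))

# Minimum xerror is 0.50 (3 splits); limit 0.58 admits the 2-split tree.
print(one_se_nsplit(cptab))
print(rsq_values(cptab))
```

With a real fit the same helpers would be applied to fit3$cptable; the xerror and xstd columns are present only when cross-validation was run (rpart's default xval=10). The chosen size can then be turned into a smaller tree with prune(fit3, cp = ...), using the CP value from the selected row.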