pectively.
More detailed summarization of the splits is again obtained by using the function
summary.rpart.
> summary(fit3, cp=.10)

Node number 1: 105 observations,    complexity param=0.4601
  mean=15810 , SS/n=67790000
  left son=2 (70 obs) right son=3 (35 obs)
  Primary splits:
      Disp.        < 156   to the left,  improve=0.4601, (0 missing)
      HP           < 154   to the left,  improve=0.4549, (0 missing)
      Tank         < 17.8  to the left,  improve=0.4431, (0 missing)
      Weight       < 2890  to the left,  improve=0.3912, (0 missing)
      Wheel.base   < 104.5 to the left,  improve=0.3067, (0 missing)
  Surrogate splits:
      Weight       < 3095  to the left,  agree=0.9143, (0 split)
      HP           < 139   to the left,  agree=0.8952, (0 split)
      Tank         < 17.95 to the left,  agree=0.8952, (0 split)
      Wheel.base   < 105.5 to the left,  agree=0.8571, (0 split)
      Length       < 185.5 to the left,  agree=0.8381, (0 split)

Node number 2: 70 observations,    complexity param=0.1123
  mean=11860 , SS/n=21310000
  left son=4 (58 obs) right son=5 (12 obs)
  Primary splits:
      Country      splits as L-RRLLLLRL, improve=0.5361, (0 missing)
      Tank         < 15.65 to the left,  improve=0.3805, (0 missing)
      Weight       < 2568  to the left,  improve=0.3691, (0 missing)
      Type         splits as RRLRR,      improve=0.3650, (0 missing)
      HP           < 105.5 to the left,  improve=0.3578, (0 missing)
  Surrogate splits:
      Tank         < 17.8  to the left,  agree=0.8571, (0 split)
      Rear.Seating < 28.75 to the left,  agree=0.8429, (0 split)
      .
      .
The improvement listed is the percent change in sums of squares (SS) for this
split, i.e., $1 - (SS_{right} + SS_{left}) / SS_{parent}$.
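The improvement formula can be checked by hand on a small vector. The sketch below is only an illustration: the helper names `node_ss` and `split_improvement` and the six data values are hypothetical and do not come from car.test.frame or from rpart itself.

```r
# Sum of squared deviations from the mean (the SS of a node).
node_ss <- function(y) sum((y - mean(y))^2)

# Improvement of a split: 1 - (SS_right + SS_left) / SS_parent.
# `go_left` is a logical vector marking the observations sent to the left son.
split_improvement <- function(y, go_left) {
  1 - (node_ss(y[go_left]) + node_ss(y[!go_left])) / node_ss(y)
}

# Hypothetical response and predictor values, chosen only for illustration.
price <- c(1, 2, 3, 10, 11, 12)
disp  <- c(90, 100, 120, 200, 250, 300)

improve <- split_improvement(price, disp < 156)
print(round(improve, 4))
```

A value near 1 means the two sons are far more homogeneous than the parent; a value near 0 means the split explains almost none of the parent's variation.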
The weight and displacement are very closely related, as shown by the surrogate split agreement of 91%.
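As a rough illustration of what the agreement figure measures, the sketch below computes the fraction of observations that a primary split and a surrogate split send in the same direction. It is a simplification that ignores missing values and case weights, and the helper name `split_agreement` and the six data values are hypothetical. The printed value of 0.9143 is consistent with 96 of the 105 cars being sent the same way by both splits.

```r
# Agreement of a surrogate split with the primary split: the fraction of
# observations that both splits send in the same direction.
split_agreement <- function(primary_left, surrogate_left) {
  mean(primary_left == surrogate_left)
}

# Hypothetical values for six cars (not taken from car.test.frame).
disp   <- c(97, 120, 151, 180, 231, 305)
weight <- c(2300, 2700, 3150, 2900, 3400, 3900)

primary_left   <- disp < 156      # cut point of the primary split at node 1
surrogate_left <- weight < 3095   # cut point of the first surrogate at node 1

agreement <- split_agreement(primary_left, surrogate_left)
print(agreement)
```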
[Figure 7: tree plot. Split labels: Disp.<156, Country:aefghj, HP.revs<5550, Type:d, Disp.<267.5. Leaf labels (mean Price, n): 7629 (n=21), 11840 (n=37), 19290 (n=12), 17820 (n=16), 25530 (n=8), 30940 (n=11).]
Figure 7: An anova tree for the car.test.frame dataset. The label of each node indicates
the mean Price for the cars in that node.
Not all categories are represented in node 2, e.g., there are no representatives from
England (the second category of Country). This is indicated by a - in the list of split
directions.
> plot(fit3)
> text(fit3, use.n=T)

As always, a plot of the fit is useful for understanding the rpart object. In this
plot, we use the option use.n=T to add the number of cars in each node. (The default
is for only the mean of the response variable to appear.) Each individual split is
ordered to send the less expensive cars to the left.
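The calls used in this section can be tried on the car.test.frame that ships with current versions of rpart. This is a sketch under stated assumptions: the bundled data set is smaller than the 105-car version used here and has different columns, and the model formula below is a stand-in because the definition of fit3 does not appear in this section. The output will therefore not match the listing above.

```r
library(rpart)

# Assumption: the car.test.frame bundled with rpart stands in for the
# 105-car data used in the text, so the numbers will differ.
data(car.test.frame)

# Hypothetical stand-in for fit3: an anova tree for Price on all other variables.
fit <- rpart(Price ~ ., data = car.test.frame, method = "anova")

summary(fit, cp = 0.10)    # detailed listing, omitting nodes with complexity below 0.10
plot(fit)                  # draw the tree skeleton
text(fit, use.n = TRUE)    # label nodes with the mean Price and the node counts
```

Writing use.n = TRUE instead of use.n=T avoids surprises if a variable named T has been defined in the session.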
Other plots can be used to help determine the best cp value for this model. The
function rsq.rpart plots the jackknifed error versus the number of splits. Of interest
is the smallest error, but any number of splits within the "error bars" (1-SE rule)
is considered a reasonable number of splits; in this case, 1 or 3 splits seem to
be sufficient. As is often true with modelling, simpler is usually better. Another
useful plot is the $R^2$ versus number of splits. The (1 - apparent error) and
(1 - relative error) show how much is gained with additional splits. This plot highlights
the differences between the $R^2$ values (figure 8).
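The 1-SE rule can be sketched in a few lines. The helper name `one_se_choice` and the numbers in `cp_tab` are hypothetical and serve only as an illustration. The table merely mimics the column layout of the cptable component of an rpart fit, and it is assumed that rows are ordered by increasing number of splits, as they are in that component.

```r
# Pick the smallest tree whose cross-validated error is within one standard
# error of the minimum (the 1-SE rule). `cptable` must have the columns
# "CP", "nsplit", "xerror" and "xstd", ordered by increasing nsplit.
one_se_choice <- function(cptable) {
  best      <- which.min(cptable[, "xerror"])
  threshold <- cptable[best, "xerror"] + cptable[best, "xstd"]
  chosen    <- which(cptable[, "xerror"] <= threshold)[1]
  cptable[chosen, c("CP", "nsplit")]
}

# Hypothetical table (illustrative numbers, not output from the text's fit).
cp_tab <- cbind(
  CP          = c(0.46, 0.11, 0.05, 0.02, 0.01),
  nsplit      = c(0, 1, 2, 3, 4),
  "rel error" = c(1.00, 0.54, 0.43, 0.38, 0.36),
  xerror      = c(1.02, 0.59, 0.61, 0.52, 0.54),
  xstd        = c(0.16, 0.10, 0.09, 0.08, 0.09)
)

choice <- one_se_choice(cp_tab)
print(choice)

# The two R-squared curves: 1 - apparent error and 1 - cross-validated error.
r_squared_apparent <- 1 - cp_tab[, "rel error"]
r_squared_xval     <- 1 - cp_tab[, "xerror"]
```

With a real fit, the same helper could be applied to fit$cptable and the chosen CP passed to prune(fit, cp = ...). In this made-up table the minimum error occurs at 3 splits, while the rule selects the 1-split tree.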
Finally, it is possible to look at the residuals fro...
This document was uploaded on 09/26/2013.