More detailed summarization of the splits is again obtained by using the function summary.rpart:
    Node number 1: 105 observations
      mean=15810, SS/n=67790000
      left son=2 (70 obs) right son=3 (35 obs)
      Primary splits:
        ...            to the left,  improve=0.4601, (0 missing)
        ...            to the left,  improve=0.4549, (0 missing)
        ... < 17.8     to the left,  improve=0.4431, (0 missing)
        ... < 2890     to the left,  improve=0.3912, (0 missing)
        ... < 104.5    to the left,  improve=0.3067, (0 missing)
      Surrogate splits:
        ... < 3095     to the left,  agree=0.9143, (0 split)
        ...            to the left,  agree=0.8952, (0 split)
        ... < 17.95    to the left,  agree=0.8952, (0 split)
        ... < 105.5    to the left,  agree=0.8571, (0 split)
        ... < 185.5    to the left,  agree=0.8381, (0 split)

    Node number 2: 70 observations
      mean=11860, SS/n=21310000
      left son=4 (58 obs) right son=5 (12 obs)
      Primary splits:
        Country splits as L-RRLLLLRL, improve=0.5361, (0 missing)
        ... < 15.65    to the left,  improve=0.3805, (0 missing)
        ... < 2568     to the left,  improve=0.3691, (0 missing)
        ... splits as R-RLRR,        improve=0.3650, (0 missing)
        ... < 105.5    to the left,  improve=0.3578, (0 missing)
      Surrogate splits:
        ... < 17.8     to the left,  agree=0.8571, (0 split)
        ... < 28.75    to the left,  agree=0.8429, (0 split)
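The improvement listed is the fractional change in the sum of squares (SS) for this split, i.e., $1 - (SS_{right} + SS_{left})/SS_{parent}$. For example, improve=0.4601 at node 1 means the best split removes about 46% of the parent node's SS.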
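To make the improvement formula concrete, here is a minimal base-R sketch that computes 1 - (SS_left + SS_right)/SS_parent for a single binary split. The helpers `ss()` and `split_improve()` and the toy data are hypothetical illustrations, not part of rpart; the sketch ignores case weights and missing values, which rpart also takes into account.

```r
# Sum of squared deviations from the node mean (the node's SS)
ss <- function(y) sum((y - mean(y))^2)

# Proportional reduction in SS achieved by splitting y into two children:
# 1 - (SS_left + SS_right) / SS_parent
split_improve <- function(y, goes_left) {
  1 - (ss(y[goes_left]) + ss(y[!goes_left])) / ss(y)
}

# Hypothetical toy data (not from car.test.frame)
price <- c(8, 9, 10, 20, 22, 24)
x     <- c(1, 2, 3, 4, 5, 6)

# Parent SS = 263.5; children SS = 2 and 8, so improve = 1 - 10/263.5
split_improve(price, x < 3.5)  # about 0.962
```

A split that separates the cheap and expensive groups this cleanly leaves little variation inside each child, so the improvement is close to 1.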
Weight and displacement are very closely related, as shown by the surrogate split agreement of 91% (agree=0.9143).
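The agreement of a surrogate split is, roughly, the proportion of observations that the surrogate sends in the same direction as the primary split. The sketch below computes that proportion in base R; `split_agreement()` and the toy `disp`/`weight` values are hypothetical, and the sketch deliberately ignores missing values and observation weights, which rpart handles when it reports `agree=`.

```r
# Proportion of observations that two splits send in the same direction.
# Both arguments are logical vectors: TRUE means "goes to the left son".
split_agreement <- function(primary_left, surrogate_left) {
  mean(primary_left == surrogate_left)
}

# Hypothetical toy data (not the car data)
disp   <- c(100, 120, 140, 160, 200, 250)
weight <- c(2200, 2500, 3200, 2900, 3300, 3600)

# Primary split disp < 150; candidate surrogate weight < 3000.
# They agree on 4 of the 6 cars.
split_agreement(disp < 150, weight < 3000)  # 4/6, about 0.667
```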
Figure 7: An anova tree for the car.test.frame dataset. The label of each node indicates the mean Price for the cars in that node.
Not all countries are represented in node 2; e.g., there are no representatives from England (the second category). This is indicated by a - in the list of split directions.
    > plot(fit3)
    > text(fit3, use.n=T)

As always, a plot of the fit is useful for understanding the rpart object. In this plot, we use the option use.n=T to add the number of cars in each node; by default, only the mean of the response variable appears. Each individual split is ordered to send the less expensive cars to the left.
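When `text()` labels a categorical split, the text.rpart help page describes the default (`pretty = NULL`) as representing factor levels by letters: a for the first level, b for the second, and so on. The hypothetical helper `decode_split()` below, which is not an rpart function, turns a split-direction string from the summary, such as the Country split L-RRLLLLRL at node 2, into those letter codes, so you can see which levels go left and which level (marked `-`) has no cars in the node.

```r
# Hypothetical helper: split an rpart direction string (one character per
# factor level) into a named vector keyed by letter codes a, b, c, ...
# Assumes at most 26 levels, since it uses the built-in `letters`.
decode_split <- function(code) {
  dirs <- strsplit(code, "")[[1]]
  names(dirs) <- letters[seq_along(dirs)]
  dirs
}

d <- decode_split("L-RRLLLLRL")   # Country split at node 2

names(d)[d == "L"]                        # "a" "e" "f" "g" "h" "j"
names(d)[d == "-"]                        # "b" (the second category)
paste(names(d)[d == "L"], collapse = "")  # "aefghj"
```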
Other plots can be used to help determine the best cp value for this model. The function rsq.rpart plots the jackknifed error versus the number of splits. Of interest is the smallest error, but any number of splits within the error bars (the "1-SE rule") is considered reasonable; in this case, 1 or 3 splits seem to be sufficient. As is often true with modelling, simpler is usually better. Another useful plot is R² versus the number of splits. The (1 - apparent error) and (1 - relative error) curves show how much is gained with additional splits. This plot highlights the differences between the R² values (Figure 8).
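The steps above can be sketched in R as follows, under stated assumptions: the model formula `Price ~ .` is assumed, and rpart's bundled copy of `car.test.frame` may not be the same data as the 105-car output shown earlier, so the numbers will differ. The sketch reads the fit's `cptable`, computes apparent and cross-validated R² as 1 - `rel error` and 1 - `xerror`, and applies the 1-SE rule; the names `r2`, `one_se`, `cp_1se` and `fit3_1se` are illustrative only.

```r
library(rpart)

# Assumed model; cross-validation folds are random, so fix the seed
set.seed(1)
fit3 <- rpart(Price ~ ., data = car.test.frame)

# Prints the cp table and plots R-square and X relative error vs. splits
rsq.rpart(fit3)

cpt <- fit3$cptable

# Apparent and cross-validated R-square for each tree size
r2 <- cbind(nsplit   = cpt[, "nsplit"],
            apparent = 1 - cpt[, "rel error"],
            xval     = 1 - cpt[, "xerror"])
print(r2)

# 1-SE rule: the smallest tree whose cross-validated error lies within
# one standard error of the minimum cross-validated error
best      <- which.min(cpt[, "xerror"])
threshold <- cpt[best, "xerror"] + cpt[best, "xstd"]
one_se    <- min(which(cpt[, "xerror"] <= threshold))  # rows ordered by nsplit
cp_1se    <- cpt[one_se, "CP"]

# Prune back to that tree size
fit3_1se <- prune(fit3, cp = cp_1se)
print(fit3_1se)
```

Choosing the smallest tree within one standard error, rather than the tree with the very lowest cross-validated error, reflects the preference for simpler models stated above.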
Finally, it is possible to look at the residuals fro...