…error and 1 − relative error show how much is gained from this model, just as with a regular linear regression fit, as shown in the following figure.

[Figure 8: two panels plotting the Apparent and X Relative R-square (left) and the X Relative Error (right) against the Number of Splits.]

Figure 8: Both plots were obtained using the function rsq.rpart(fit3). The figure on the left shows that the first split offers the most information. The figure on the right suggests that the tree should be pruned to include only 1 or 2 splits.

    plot(predict(fit3), resid(fit3))
    axis(3, at = fit3$frame$yval[fit3$frame$var == '<leaf>'],
         labels = row.names(fit3$frame)[fit3$frame$var == '<leaf>'])
    mtext('leaf number', side = 3, line = 3)
    abline(h = 0)

[Figure 9: resid(fit3) plotted against predict(fit3), with leaf numbers 5, 7, 8, 9, 12, and 13 marked along the top axis.]

Figure 9: This plot shows the observed − expected cost of cars versus the predicted cost of cars based on the nodes (leaves) in which the cars landed. There appears to be more variability in node 7 than in some of the other leaves.

7.3 Example: Stage C prostate cancer (anova method)

The stage C prostate cancer data of the earlier section can also be fit using the anova method, by treating the status variable as though it were continuous.

    cfit2 <- rpart(pgstat ~ age + eet + g2 + grade + gleason + ploidy,
                   data = stagec)
    printcp(cfit2)

    Regression tree:
    rpart(formula = pgstat ~ age + eet + g2 + grade + gleason + ploidy,
        data = stagec)

    Variables actually used in tree construction:
    [1] age    g2     grade  ploidy

    Root node error: 34.027/146 = 0.23306

              CP nsplit rel error  xerror     xstd
    1 0.152195      0   1.00000 1.01527 0.045470
    2 0.054395      1   0.84781 0.86670 0.063447
    3 0.032487      3   0.73901 0.86524 0.075460
    4 0.019932      4   0.70653 0.95702 0.085390
    5 0.013027      8   0.63144 1.05606 0.092566
    6 0.010000      9   0.61841 1.07727 0.094466

    print(cfit2, cp = .03)

    node), split, n, deviance, yval
          * denotes terminal node

     1) root 146 34.030 0.3699
       2) grade< 2.5 61 7.672 0.1475
         4) g2< 13.19 40 1.900 0.0500 *
         5) g2>=13.19 21 4.667 0.3333 *
       3) grade>=2.5 85 21.180 0.5294
         6) g2< 13.2 40 9.775 0.4250 *
         7) g2>=13.2 45 10.580 0.6222
          14) g2>=17.91 22 5.091 0.3636 *
          15) g2< 17.91 23 2.609 0.8696 *

If this tree is compared to the earlier results, we see that it has chosen exactly the same variables and split points as before. The only addition is the further splitting of node 2, the upper left "No" of figure 3. This is no accident, for in the two-class case the Gini splitting rule reduces to 2p(1 − p), which is the variance of a node. The two methods differ in their evaluation and pruning, however. Note that nodes 4 and 5, the two children of node 2, contain 2/40 and 7/21 progressions, respectively. For classification purposes both nod…
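The claimed equivalence between the Gini rule and the node variance can be verified numerically. The following R snippet is an illustrative sketch (not part of the original document); the example counts are taken from node 5 above, 7 progressions out of 21:

```r
## Illustrative check: for a 0/1 outcome y with mean p, the node
## "variance" used by the anova method, mean((y - p)^2), equals
## p * (1 - p), while the two-class Gini index is 2 * p * (1 - p).
## The two differ only by a factor of 2, so both rules rank
## candidate splits identically.
y <- c(rep(1, 7), rep(0, 14))    # e.g. 7 progressions out of 21
p <- mean(y)                     # p = 1/3
gini     <- 2 * p * (1 - p)      # 4/9
node.var <- mean((y - p)^2)      # 2/9
all.equal(gini, 2 * node.var)    # TRUE
```

This is why the anova method reproduces the same split variables and points as the Gini-based classification tree; only the cross-validation and pruning criteria differ.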