Rpart_TechReport61




      . . . (0 missing)
  Surrogate splits:
      gleason < 5.5   to the left,  agree=0.8630, (0 split)
      ploidy  splits as LRR,        agree=0.6438, (0 split)
      g2      < 9.945 to the left,  agree=0.6301, (0 split)
      age     < 66.5  to the right, agree=0.5890, (0 split)

Node number 2: 61 observations
  predicted class= No    expected loss= 0.1475
    class counts:    52     9
   probabilities: 0.8525 0.1475

Node number 3: 85 observations,    complexity param=0.1049
  predicted class= Prog  expected loss= 0.4706
    class counts:    40    45
   probabilities: 0.4706 0.5294
  left son=6 (40 obs) right son=7 (45 obs)
  Primary splits:
      g2      < 13.2  to the left,  improve=2.1780, (6 missing)
      ploidy  splits as LRR,        improve=1.9830, (0 missing)
      age     < 56.5  to the right, improve=1.6600, (0 missing)
      gleason < 8.5   to the left,  improve=1.6390, (0 missing)
      eet     < 1.5   to the right, improve=0.1086, (1 missing)
  Surrogate splits:
      ploidy  splits as LRL,        agree=0.9620, (6 split)
      age     < 68.5  to the right, agree=0.6076, (0 split)
      gleason < 6.5   to the left,  agree=0.5823, (0 split)
  . . .

There are 54 progressions (class 1) and 92 non-progressions, so the first node has an expected loss of 54/146 ≈ 0.37. The computation is this simple only for the default priors and losses.

At the first node, grades 1 and 2 go to the left and grades 3 and 4 to the right. The tree is arranged so that the "more severe" nodes go to the right.

The improvement is n times the change in the impurity index, where n is the number of observations in the node being split. In this instance, the largest improvement is for the variable grade, with an improvement of 10.36. The next best choice is Gleason score, with an improvement of 8.4. The actual values of the improvement are not so important, but their relative sizes indicate the comparative utility of the variables.

Ploidy is a categorical variable, with values of diploid, tetraploid, and aneuploid, in that order. (To check the order, type table(stagec$ploidy).) All three possible splits were attempted: aneuploid+diploid vs. tetraploid, aneuploid+tetraploid vs. diploid, and aneuploid vs. diploid+tetraploid.
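The node-level numbers in the printout can be checked by hand. The Python sketch below is only an illustration, not rpart code; the function names are hypothetical. It assumes the default priors and 0/1 losses, under which the expected loss is just the fraction of a node outside its majority class, and it assumes the Gini impurity index, which reproduces the printed improvement of 10.36 for the grade split of the first node (146 observations) into nodes 2 and 3. It also enumerates the candidate ploidy splits; fixing the first level to L so that mirror-image splits are not counted twice is our own convention and need not match how rpart orients its printed codes.

```python
from itertools import product


def expected_loss(counts):
    """Expected loss under default priors and 0/1 losses:
    the fraction of observations not in the majority class."""
    n = sum(counts)
    return 1 - max(counts) / n


def gini_times_n(counts):
    """n times the Gini impurity 1 - sum(p_k^2) of a node."""
    n = sum(counts)
    return n * (1 - sum((c / n) ** 2 for c in counts))


def improvement(parent, left, right):
    """n times the decrease in Gini impurity produced by a split."""
    return gini_times_n(parent) - gini_times_n(left) - gini_times_n(right)


def categorical_splits(levels):
    """Distinct binary splits of an unordered factor, coded as L/R strings
    in level order. The first level is fixed to L (hypothetical convention)
    so that mirror images such as LRR and RLL are counted once."""
    codes = []
    for rest in product("LR", repeat=len(levels) - 1):
        code = "L" + "".join(rest)
        if "R" in code:  # skip the non-split that sends every level left
            codes.append(code)
    return codes


# Class counts (No, Prog) for nodes 1, 2 and 3, from the printout.
root, node2, node3 = [92, 54], [52, 9], [40, 45]

print(round(expected_loss(root), 4))   # 0.3699
print(round(expected_loss(node2), 4))  # 0.1475
print(round(expected_loss(node3), 4))  # 0.4706
print(round(improvement(root, node2, node3), 2))  # 10.36
print(categorical_splits(["diploid", "tetraploid", "aneuploid"]))
# ['LLR', 'LRL', 'LRR']
```

With non-default priors or losses the class counts would have to be reweighted before either quantity is computed, which is why the simple 54/146 calculation holds only for the defaults.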
With the level order above, the code LRR means the best split sends diploid to the left and tetraploid and aneuploid to the right (see figure 3).

For node 3, the primary split variable, g2, is missing for 6 subjects. All 6 are split based on the first surrogate, ploidy: diploid and aneuploid tumors are sent to the left, tetraploid to the right.

                        g2 < 13.2   g2 >= 13.2   NA
   Diploid/aneuploid        33            2       5
   Tetraploid                1           43       1

6 Further options

6.1 Program options

The central fitting function is rpart, whose main arguments are:

formula: the model formula, as in lm and other S model-fitting functions. The right-hand side may contain both continuous and categorical (factor) terms. If the outcome y has more than two levels, then categorical predictors must be fit by exhaustive enumeration, which can take a very long time.

data, weights, subset: as for other S models. Weights are not yet supported, and will be ignored if present.

method: the type of splitting rule to use. Options at this point are classification, anova, Poisson, and exponential.

parms: a list of method-specific optional parameters. For classification, the list...
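Two quantities discussed in this section can be checked with simple counts. The agreement of 0.9620 printed for the ploidy surrogate at node 3 follows from the cross-tabulation: 33 + 43 = 76 of the 79 subjects with a known g2 value are sent the same way by both splits. The Python sketch below (illustration only; the helper names are hypothetical) computes that proportion, shows where the 6 subjects with missing g2 end up, and tabulates the number of candidate splits for an unordered factor with k levels, 2^(k-1) - 1, which is what makes exhaustive enumeration slow. For a two-class outcome, ordering the levels by their proportion of class 1 reduces the search to k - 1 splits, which is why the cost arises only when the outcome has more than two levels.

```python
def surrogate_agreement(table):
    """Proportion of observations, among those with a known primary
    variable, that the surrogate sends in the same direction as the primary.

    table maps each surrogate direction ("L" or "R") to a tuple of counts:
    (primary goes left, primary goes right, primary missing)."""
    agree = table["L"][0] + table["R"][1]
    usable = sum(table[d][0] + table[d][1] for d in ("L", "R"))
    return agree / usable


def missing_sent(table):
    """How many primary-missing observations the surrogate sends each way."""
    return {d: table[d][2] for d in ("L", "R")}


def n_exhaustive_splits(k):
    """Number of distinct binary splits of an unordered factor with k levels."""
    return 2 ** (k - 1) - 1


# Node 3: ploidy coded LRL (diploid, aneuploid -> L; tetraploid -> R)
# against the primary split g2 < 13.2, counts from the table in the text.
node3 = {"L": (33, 2, 5), "R": (1, 43, 1)}

print(round(surrogate_agreement(node3), 4))  # 0.962
print(missing_sent(node3))                   # {'L': 5, 'R': 1}
print([n_exhaustive_splits(k) for k in (3, 5, 10, 20)])
# [3, 15, 511, 524287]
```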

This document was uploaded on 09/26/2013.
