Rpart_TechReport61

# 10 The first figure shows the solder data fit with the


... $\text{Deviance}_{\text{parent}} - (\text{Deviance}_{\text{left}} + \text{Deviance}_{\text{right}})$, which is the likelihood ratio test for comparing two Poisson samples.

The cross-validated error has been found to be overly pessimistic when describing how much the error is improved by each split. This is likely an effect of the boundary effect mentioned earlier, but more research is needed. The variation xstd is not as useful, given the bias of xerror.

    plot(fit)
    text(fit, use.n=T)
    fit.prune <- prune(fit, cp=.15)   # trim the tree back to cp = 0.15
    plot(fit.prune)
    text(fit.prune, use.n=T)

The use.n=T option specifies that (number of events / total N) should be listed along with the predicted rate (number of events / person-years). The function prune trims the tree fit to the cp value 0.15. The same tree could have been created by specifying cp = .15 in the original call to rpart.

8.4 Example: Stage C prostate cancer (survival method)

One special case of the Poisson model is of particular interest for medical consulting (such as the authors do). Assume that we have survival data, i.e., each subject has either 0 or 1 event. Further, assume that the time values have been pre-scaled so as to fit an exponential model. That is, stretch the time axis so that a Kaplan-Meier plot of the data will be a straight line when plotted on the logarithmic scale. An approximate way to do this is

    temp <- coxph(Surv(time, status) ~ 1)
    newtime <- predict(temp, type='expected')

and then do the analysis using the newtime variable. This replaces each time value by $\Lambda(t)$, where $\Lambda$ is the cumulative hazard function.

A slightly more sophisticated version of this, which we will call exponential scaling, gives a straight line curve for log(survival) under a parametric exponential model. The only difference from the approximate scaling above is that a subject who is censored between observed death times will receive "credit" for the intervening interval, i.e., we assume the baseline hazard to be linear between observed deaths.
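The approximate scaling can be illustrated by hand. The following is a minimal sketch in base R, assuming the Nelson-Aalen estimate of the cumulative hazard (an assumption made here for illustration; tie handling inside coxph may differ). The data values and the names `event_times`, `increments` and `newtime_by_hand` are hypothetical.

```r
# Hypothetical toy data: follow-up times and event indicators
# (1 = event, 0 = censored). These values are made up for illustration.
time   <- c(2, 3, 3, 5, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 0)

# Nelson-Aalen increments: events / number at risk at each distinct event time.
event_times <- sort(unique(time[status == 1]))
n_events    <- sapply(event_times, function(s) sum(time == s & status == 1))
n_at_risk   <- sapply(event_times, function(s) sum(time >= s))
increments  <- n_events / n_at_risk

# Rescaled time: the estimated cumulative hazard at each subject's own time.
newtime_by_hand <- sapply(time, function(s) sum(increments[event_times <= s]))

print(cbind(time, status, newtime = round(newtime_by_hand, 3)))

# Overall event rate on the new time scale (events / total rescaled time).
overall_rate <- sum(status) / sum(newtime_by_hand)
print(overall_rate)
```

With this estimate the rescaled times sum to the total number of events, so the overall event rate on the new scale is 1.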
If the data is pre-scaled in this way, then the Poisson model above is equivalent to the local full likelihood tree model of LeBlanc and Crowley [3]. They show that this model is more efficient than the earlier suggestion of Therneau et al. [6] to use the martingale residuals from a Cox model as input to a regression tree (anova method). Exponential scaling, or method='exp', is the default if y is a Surv object.

Let us again return to the stage C cancer example. Besides the variables explained previously we will use pgtime, which is time to tumor progression.

    fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade +
                 gleason + ploidy, data=stagec)
    print(fit)

    node), split, n, deviance, yval
          * denotes terminal node

     1) root 146 195.30 1.0000
       2) grade 2.5 61 44.98 0.3617
         4) g2 11.36 33 9.13 0.1220 *
         5) g2 11.36 28 27.70 0.7341 *
       3) grade 2.5 85 125.10 1.6230
         6) age 56.5 75 104.00 1.4320
          12) gleason 7.5 50 66.49 1.1490
            24) g2 13.475 25 29.10 0.8817 *
            25) g2 13.475 25 36.05 1.4080
              50) g2 17.915 14 18.72 0.8795 *
              51) g2 17.915 11 13.70 2.1830 *
          13) gleason 7.5 25 34.13 2....
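The deviance column of the printed tree shows how much each split gained. The sketch below is an illustration only. It recomputes the improvement, taken as the parent deviance minus the sum of the two child deviances, for the first two splits using the printed values. The helper name `split_gain` is hypothetical and is not part of rpart.

```r
# Improvement of a split: parent deviance minus the summed child deviances.
# split_gain is a hypothetical helper for illustration, not an rpart function.
split_gain <- function(parent, left, right) parent - (left + right)

# Deviances copied from the printed tree.
gain_root  <- split_gain(195.30, 44.98, 125.10)  # node 1, split on grade
gain_node2 <- split_gain(44.98, 9.13, 27.70)     # node 2, split on g2

print(round(c(root = gain_root, node2 = gain_node2), 2))
```

The split on grade at the root reduces the deviance by 25.22, and the split on g2 under node 2 reduces it by a further 8.15.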

## This document was uploaded on 09/26/2013.
