Lecture 23

1. Survival time
2. Censored observations
3. Proc Lifetest: Kaplan-Meier estimate of the survival distribution
4. Comparing survival distributions
References:
Collett (2003) Modelling Survival Data in Medical Research, 2nd ed.
Allison (1995) Survival Analysis Using the SAS System.
Cantor (2003) SAS Survival Analysis Techniques for Medical Research
Der and Everitt, Chapter 12

Time-to-event or survival data

In many situations, time until an event occurs is important:
• New treatment for brain cancer: do patients survive longer than after standard
treatment?
• In the AHC, are men awarded tenure earlier and more often than women?
• Time to graduation in MPH programs, compared between SPH divisions.
Each individual has their own time T_i to the event.
Unlike earlier analyses, the aim is not a point estimate (mean, slope, odds ratio)
but the whole distribution of these times {T_i}.
This is much more to ask for, and harder to compare.

Outline: two main analyses for survival data

1. Estimate survivor function, compare survivor functions between groups.
Proc Lifetest gives nonparametric product-limit (Kaplan-Meier) or life-table estimate, draws graphs, tests for differences.
Nice pictures, but no adjustments; only strata.
Proc LifeReg gives regression adjustment but must specify parametric formula for survivor function; rarely used in health sciences.
2. Estimate ratio of hazard functions between groups, compare ratio to 1.
Proc PHreg does proportional hazards regression to estimate ratio. No pictures (almost) but regression adjustment for fixed and time-varying predictors.

What are survivor function and hazard?

Probability theory defines a distribution by:
• histogram of lifetimes, called density f(t)
• cumulative distribution function = cumulative area under histogram, starting from left:

$$F(t) = \int_{-\infty}^{t} f(u)\,du$$

Survivor function S(t) = 1 − F(t). Percent without the event (still alive) at time t.

Hazard function:

$$h(t) = \frac{f(t)}{S(t)} = \frac{\text{chance of event at time } t}{\text{percent at risk at time } t}$$

Hazard h(t) gives the chance of event during a short interval after time t,
for those who are at risk (alive) at time t.
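
As a worked illustration of these definitions, consider a lifetime distribution with a constant event rate on t ≥ 0. The exponential model and the rate λ are assumptions chosen only because the algebra is short; they are not part of the lecture's data.

```latex
\begin{align*}
f(t) &= \lambda e^{-\lambda t}, \\
F(t) &= \int_{0}^{t} \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda t}, \\
S(t) &= 1 - F(t) = e^{-\lambda t}, \\
h(t) &= \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .
\end{align*}
```

Under this assumption the hazard is flat: among those still at risk, the chance of the event in the next short interval is the same at every t. Most real hazards, human mortality included, vary with t.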

Example: US Census Bureau synthetic cohort for 2002

Histogram (density) of times to death for 2002 US population, truncated at 101.

[Figure: Percent of Deaths by age, 2002. x-axis: US Population, Age in Years in 2002 (0 to 100); y-axis: percent of deaths (0.0 to 3.5).]

E. Arias (2004) United States Life Tables, 2002 (National vital statistics reports; vol 53 no 6. Hyattsville, Maryland: National Center for Health Statistics.)

Hazard function h(t) = age-specific death rate

[Figure: Age-Specific Death Rate per 100,000 (y-axis, 0.00 to 0.20) against US Population, Age in Years in 2002 (x-axis, 0 to 100).]

Survivor function S(t) = chance of surviving to age t

[Figure: Percent Surviving (y-axis, 0 to 100) against US Population, Age in Years in 2002 (x-axis, 0 to 100).]

Another survivor function example

[Figure: page 5 of National Vital Statistics Reports, Vol. 53, No. 6, November 10, 2004, showing "Figure 2. Percent surviving by age, race, and sex: United States, 2002" and "Figure 3. Percent surviving by age: Death-registration States, 1900–1902, and United States, 1949–51 and 2002". The report text on that page follows.]

[...] females, the pattern of survival by age is similar. These groups have approximately the same median age at death of about 79 years. However, white males have slightly higher survival rates than black females at the younger ages with 98.6 percent surviving to age 20 and 79.9 percent surviving to age 65 compared with 98.1 percent and 78.5 percent, respectively, for black females. At the older ages, in contrast, black female survival surpasses white male survival. At age 85, white male survival is 29.2 percent compared with 33.6 percent for black females. This crossover, which occurs at about age 72, is clearly shown in figure 2. The median age at death for black males is 72 years, 11 years less than that for white females. 97.4 percent of black males survive to age 20, 65.7 percent to age 65, and 18.2 percent to age 85. By age 100, there is very little difference between the white and black populations in terms of survival. Somewhat less than 1 percent of white and black males and about 2.6 percent of white and black females survive to age 100.

Plotting the percent surviving by age for the periods 1900–1902, 1949–51, and 2002 shows an increasingly rectangular survival curve (figure 3). That is, the survival curve has become increasingly flat in response to progressively lower mortality, particularly at the younger ages, and increasingly vertical at the older ages. The survival curve for 1900–1902 shows a rapid decline in survival in the first few years of life and a relatively steady decline thereafter. In contrast, the survival curve for 2002 is nearly flat until about age 50 after which the decline in survival becomes more rapid. Improvements in survival between 1900–1902 and 1949–51 occurred at all ages, although the largest improvements were among the younger population. Between 1949–51 and 2002, improvements occurred primarily for the older population.

Censored observation times

Common problem in survival data is that we don’t observe all event times:
• we stop the study and analyze the data before everyone has had the event
• a person leaves the study and we cannot find out whether they had the event
In these cases, all we have is the final time t_0 at which the subject was known to be alive;
we know only that T > t_0.
The final time t_0 is called a censored observation, and it’s a lower bound for the
unknown event time T.

Clinical study example: eligible participants were enrolled as soon as they
volunteered, and recruitment lasted 2.5 years. The study ended on 1/1/2008.
Subjects died (open circle), dropped out (triangle), or were still alive at study end
(gray dot).

[Figure: one horizontal line per subject against Calendar Time, from study start (1/1/2005) through 1/1/2006 and 1/1/2007 to study end (1/1/2008).]

Analysis of clinical study example:
each subject’s time is aligned to start at “study time” = 0.

[Figure: the same subjects against Time from Enrollment (years), 0.0 to 3.0.]

* marks study enrollment, horizontal line indicates time participant was alive,
deaths are indicated by an open circle, censoring by a gray dot.

No histogram of survival times with censored data

We can draw a histogram of all the times t_i.
For a censored observation, we know only that t_i < actual survival time.
No correct place in histogram for censored observations, because they are lower
bounds, not observed times.
However, excluding them gives a biased histogram.

Kaplan and Meier (1958) proposed a breakthrough method to estimate the
survivor function S(t) from partially censored data.

Stomach cancer example

Survival times after treatments A or B for 89 patients with stomach cancer
(source: Chapter 12, Der and Everitt).
• 45 received treatment A: 38 died, 7 were censored
• 44 received treatment B: 41 died, 3 were censored

Obs    days   trt    years     censor
  1      17    A    0.04654      0
  2     185    A    0.50650      0
  3     542    A    1.48392      0
...
 87    1736    A    4.75291      1
 88     380    B    1.04038      0
 89     748    B    2.04791      0

days, years give times t patients were last known alive.
censor = 0 if an event happened at t. censor = 1 if censored (no event yet).
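
The coding of these variables can be sketched in a few lines of Python. This is only an illustration: the function name `to_analysis_row` is hypothetical, and the conversion factor of 365.25 days per year is an assumption inferred from the listed values (the slides do not state it). The `event` flag is simply the complement of `censor`.

```python
# Rows (obs, days, trt, censor) copied from the listing above.
records = [
    (1, 17, "A", 0),
    (2, 185, "A", 0),
    (3, 542, "A", 0),
    (87, 1736, "A", 1),
    (88, 380, "B", 0),
    (89, 748, "B", 0),
]

# Assumption: inferred from the listed days/years pairs, not stated in the slides.
DAYS_PER_YEAR = 365.25


def to_analysis_row(obs, days, trt, censor):
    """Return (obs, trt, years, event) for one subject."""
    years = round(days / DAYS_PER_YEAR, 5)
    event = 1 - censor  # 1 = death observed at that time, 0 = censored
    return obs, trt, years, event


analysis_rows = [to_analysis_row(*r) for r in records]
for row in analysis_rows:
    print(row)
```

With this factor the computed years agree with the `years` column to five decimals, which is why 365.25 was assumed.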

Proc Lifetest: Kaplan-Meier (Product-Limit) estimate of survivor function

ODS graphics on;
Proc Lifetest data = stomach_cancer plots=(survival(atrisk=0 to 4 by 1)) censoredsymbol="" ;
   TIME years * censor(1);   /* censor = 1 marks a censored time */
   STRATA trt ;              /* groups to be compared */
run;
ODS graphics off;

TIME statement is like model statement, specifies response:
   TIME length-of-time * event-status ( censored-value ) ;
STRATA: variable identifying treatment groups to be compared by test.

plots=(survival(atrisk=0 to 4 by 1)) censoredsymbol="" ;

[Figure: survival plot produced by this option.]

Sample sizes given at bottom. Need at least 10–15 in each group.

plots=( CL survival(atrisk=0 to 4 by 1)) censoredsymbol="" ;

[Figure: survival plot produced by this option.]

The LIFETEST Procedure
Stratum 1: trt = A

                                   Survival
                                   Standard    Number    Number
   years     Survival   Failure    Error       Failed    Left
  0.00000     1.0000     0          0             0        45
  0.04654     0.9778     0.0222     0.0220        1        44
  0.11499     0.9556     0.0444     0.0307        2        43
  0.12047     0.9333     0.0667     0.0372        3        42
  0.13142     0.9111     0.0889     0.0424        4        41
  0.16427     0.8889     0.1111     0.0468        5        40
  ....
  3.32375     0.1750     0.8250     0.0572       37         7
  3.37303*     .          .          .           37         6
  3.73990     0.1458     0.8542     0.0546       38         5
  3.98357*     .          .          .           38         4
  4.33949*     .          .          .           38         3
  4.44079*     .          .          .           38         2
  4.45175*     .          .          .           38         1
  4.75291*     .          .          .           38         0

NOTE: The marked survival times are censored observations.

years: time t when survivor function starts a new value
Survival: Kaplan-Meier (product-limit) estimate of the survivor function S(t)
for times to the right of time t
Failure: Kaplan-Meier estimate of cumulative mortality, [1 − S(t)] = F(t)
Survival Standard Error: the pointwise standard error of the estimate $\hat{S}(t)$
Number Failed: the total number of events up to and including time t
Number Left: the number still under observation and at risk for the event

95% confidence interval for the estimated survivor function from the usual
formula with a standard error (from output):

$$\hat{S}(t) \pm 1.96 \times \mathrm{SE}\{\hat{S}(t)\}$$
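
The Survival and Survival Standard Error columns can be sketched in Python as follows. This is a minimal illustration, not the code Proc Lifetest runs: the function name `kaplan_meier` is hypothetical, and the standard error uses Greenwood's formula, which is assumed here because the slides do not name the formula. The example takes the first five event times listed for treatment A; the other 40 subjects are stand-ins censored at a placeholder time of 5.0 years, which is not real data. Only their presence in the risk set matters for these rows.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate with Greenwood standard errors.

    times  : time each subject was last known alive
    events : 1 if the event occurred at that time, 0 if censored
    Returns (time, survival, std_error, n_at_risk) per distinct event time,
    where n_at_risk is counted just before that time.
    """
    rows = []
    surv = 1.0
    greenwood = 0.0  # running sum of d / (n * (n - d))
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)  # at risk just before t
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= (n - d) / n
        if n > d:
            greenwood += d / (n * (n - d))
        rows.append((t, surv, surv * math.sqrt(greenwood), n))
    return rows


first_five = [0.04654, 0.11499, 0.12047, 0.13142, 0.16427]
times = first_five + [5.0] * 40    # 5.0 is a placeholder, not an observed time
events = [1] * 5 + [0] * 40
km_rows = kaplan_meier(times, events)

for t, s, se, n in km_rows:
    lower, upper = s - 1.96 * se, s + 1.96 * se  # pointwise 95% interval
    print(f"{t:.5f}  S={s:.4f}  SE={se:.4f}  CI=({lower:.4f}, {upper:.4f})")
```

Rounded to four decimals, these rows should agree with the first rows of the Stratum 1 listing. Note that the simple interval above can fall outside [0, 1] when the estimate is near 0 or 1.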

Stratum 1: trt = A

Quartile Estimates
             Point        95% Confidence Interval
Percent      Estimate     [Lower       Upper)
  75         1.58795      1.27036       .
  50         0.69541      0.52841      1.32512
  25         0.39425      0.20260      0.53388

Mean         Standard Error
1.34660      0.19441

Median survival time is the time t when $\hat{S}(t)$ = 0.5, where the estimated survivor function equals 50%.
If $\hat{S}(t)$ = 0.5 over an interval, the median is the midpoint of the interval.
Mean survival time is area under the Kaplan-Meier survival curve.
If the largest observed time in the data is censored, then this area is unspecified.
Don’t report mean survival time if there is any censoring.

Summary of censoring in each group.

Summary of the Number of Censored and Uncensored Values

                                               Percent
Stratum   group   Total   Failed   Censored   Censored
   1        A       45      38        7        15.56
   2        B       44      41        3         6.82
 Total              89      79       10        11.24

Precision of estimates depends on the number of events (“Failed”) not the number
of observations.

Tests to compare population survivor functions

Lifetest compares population survivor functions S(t) between groups listed in
the STRATA statement. Null hypothesis: all groups have the same population
survivor function; here, S_A(t) = S_B(t).
• Log rank
• Wilcoxon
• Likelihood ratio test
Ignore likelihood ratio test; it depends on a strong assumption (exponential
density) that is usually wrong.

Rank Statistics
trt     Log-Rank     Wilcoxon
A        3.3043       502.00
B       -3.3043      -502.00

Test of Equality over Strata
                                    Pr >
Test          Chi-Square    DF    Chi-Square
Log-Rank        0.5654       1      0.4521
Wilcoxon        4.3162       1      0.0378
-2Log(LR)       0.3574       1      0.5500

Two usable tests disagree here.

All three tests are based on H0: S_0(t) = S_1(t):
• combine all groups to get a common event rate on each time interval
• for each group in each interval, multiply event rate by sample size to get
expected numbers of events
e_jk = expected number of events in group j during time period k
d_jk = observed number of events in group j at time k.

Logrank test. Test statistic is cumulative difference between observed and
expected:

$$d_L = \sum_k (d_{1k} - e_{1k})$$

Rank Statistics
trt     Log-Rank     Wilcoxon
A        3.3043       502.00
B       -3.3043      -502.00

Test statistic for A was +3.3043, indicating more deaths than expected. Test statistic for B was −3.3043, indicating fewer deaths than expected.
Usually the more sensitive test. Best test when the estimated survivor functions do not
cross each other. Often the basis for sample size calculations.

Wilcoxon test. Sample-size weighted sum of differences between observed and
expected events:

$$d_W = \sum_k n_k (d_{1k} - e_{1k})$$

where n_k is the total number at risk at time k.

Rank Statistics
trt     Log-Rank     Wilcoxon
A        3.3043       502.00
B       -3.3043      -502.00

Wilcoxon test gives more weight to the early part of the estimated survivor
functions, where there is more information.
Wilcoxon is less sensitive to late differences in survivor functions.
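
The two rank statistics d_L and d_W can be sketched in Python as below. This is an illustration under stated assumptions, not Proc Lifetest's implementation: the function names are hypothetical, the toy data are invented for the example, and the variance is the standard hypergeometric variance for a two-group comparison, which the slides do not show. The helper `chisq1_pvalue` converts a chi-square statistic with 1 DF to a p-value.

```python
import math


def rank_statistics(times, events, groups, group1):
    """Log-rank and Wilcoxon statistics for group1 versus everyone else.

    Returns (d_L, d_W, chisq_L, chisq_W).
    """
    d_L = d_W = var_L = var_W = 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        # Risk sets just before time t.
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == group1)
        # Events exactly at time t.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == group1)
        e1 = d * n1 / n  # expected events in group1 under H0
        # Hypergeometric variance of d1 (assumed formula; zero if n == 1).
        v = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        d_L += d1 - e1
        d_W += n * (d1 - e1)
        var_L += v
        var_W += n * n * v
    return d_L, d_W, d_L ** 2 / var_L, d_W ** 2 / var_W


def chisq1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 DF."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical toy data: four subjects, all with observed events.
toy_times = [1, 3, 2, 4]
toy_events = [1, 1, 1, 1]
toy_groups = ["A", "A", "B", "B"]
stats_A = rank_statistics(toy_times, toy_events, toy_groups, "A")
stats_B = rank_statistics(toy_times, toy_events, toy_groups, "B")
print(stats_A, stats_B)
print(chisq1_pvalue(0.5654), chisq1_pvalue(4.3162))
```

With two groups the statistics for A and B are equal in size and opposite in sign, as in the Rank Statistics table. Applying `chisq1_pvalue` to the chi-square values 0.5654 and 4.3162 from the output gives p-values close to the reported 0.4521 and 0.0378.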
Use Wilcoxon when estimated survivor functions cross each other.

Rank Statistics
trt     Log-Rank     Wilcoxon
A        3.3043       502.00
B       -3.3043      -502.00

Test of Equality over Strata
                                    Pr >
Test          Chi-Square    DF    Chi-Square
Log-Rank        0.5654       1      0.4521
Wilcoxon        4.3162       1      0.0378
-2Log(LR)       0.3574       1      0.5500

Which test should we report?

Think about sample size as well as whether survivor curves cross.

ODS graphics on;
Proc Lifetest data=two_years maxtime=2.0 plots=survival(atrisk=0 to 2 by .5) censoredsymbol="";
time years * censor(1);
strata trt ;
run;
ODS graphics off;

Proc Lifetest TEST statement

Proc Lifetest also compares groups identified in the TEST statement.
This is intended to test the effect of a continuous explanatory variable.
When used with a categorical variable, such as treatment, results are not the same as
from STRATA.
Use STRATA, not TEST.
...
This note was uploaded on 11/21/2011 for the course PUBH 6470 taught by Professor William Thomas during the Fall '11 term at University of Florida.