Survival Analysis
A Brief Introduction

1. Survival Function, Hazard Function
In many medical studies, the primary endpoint is time until an event occurs (e.g. death, remission).
Data are typically subject to censoring (e.g. when a study ends before the event occurs).

Survival Function
A function describing the proportion of individuals surviving to or beyond a given time.
Notation:
◦ T: survival time of a randomly selected individual
◦ t: a specific point in time
◦ Survival function:
\[
S(t) = P(T \ge t) = \exp\left(-\int_0^t \lambda(u)\,du\right)
\]

Hazard Function/Rate
Hazard function λ(t): instantaneous failure rate at time t given that the subject has survived up to time t. That is,
\[
\lambda(t) = \lim_{\delta \to 0^+} \frac{P(t \le T < t+\delta \mid T \ge t)}{\delta}
= \lim_{\delta \to 0^+} \frac{P(t \le T < t+\delta)}{P(T \ge t) \times \delta}
= \lim_{\delta \to 0^+} \frac{S(t) - S(t+\delta)}{\delta} \times \frac{1}{S(t)}
= \frac{f(t)}{S(t)}
\]
Here f(t) is the probability density function of the survival time T. That is,
\[
f(t) = \frac{d}{dt} F(t)
\]
where F(t) is the cumulative distribution function of T:
\[
F(t) = 1 - S(t) = P(T \le t)
\]
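A short derivation, assuming T is continuous with S(0) = 1 (an assumption the slides leave implicit), shows how the hazard definition leads to the exponential form of the survival function stated under Notation:

```latex
\begin{align*}
f(t) &= \frac{d}{dt}F(t) = -\frac{d}{dt}S(t), \\
\lambda(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \\
\int_0^t \lambda(u)\,du &= -\ln S(t) + \ln S(0) = -\ln S(t), \\
S(t) &= \exp\left(-\int_0^t \lambda(u)\,du\right).
\end{align*}
```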

2. The Key Word is ‘Censoring’
Because of censoring, many common data analysis procedures cannot be adopted directly.
For example, one could use the logistic regression model to model the relationship between survival probability and some relevant covariates.
◦ However, one should use the customized logistic regression procedures designed to account for censoring.

Key Assumption: Independent Censoring
Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t.
This assumption means that the hazard function, λ(t), can be estimated in a fair/unbiased/valid way.

3A. Kaplan–Meier (Product-Limit) Estimator of the Survival Curve
The Kaplan–Meier estimator is the nonparametric maximum likelihood estimate of S(t). It is a product of the form
\[
\hat{S}(t) = \frac{r_1 - d_1}{r_1} \times \frac{r_2 - d_2}{r_2} \times \cdots \times \frac{r_i - d_i}{r_i}
\]
◦ $r_k$ is the number of subjects alive just before time $t_k$
◦ $d_k$ denotes the number who died at time $t_k$

Kaplan–Meier Curve, Example

Time t_i   # at risk   # events   Ŝ(t_i)
0          20          0          1.00
5          20          2          [1−(2/20)]*1.00 = 0.90
6          18          0          [1−(0/18)]*0.90 = 0.90
10         15          1          [1−(1/15)]*0.90 = 0.84
13         14          2          [1−(2/14)]*0.84 = 0.72
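
The product-limit arithmetic in the example table can be reproduced with a few lines of Python. This is a minimal sketch rather than a full estimator: it assumes the at-risk and event counts are already tabulated per time point, and the names `kaplan_meier` and `table` are illustrative, not from the slides.

```python
def kaplan_meier(rows):
    """Return [(time, S_hat)] from rows of (time, n_at_risk, n_events).

    Rows are assumed to be sorted by time, with n_at_risk counted
    just before each time point.
    """
    surv = 1.0
    curve = []
    for time, at_risk, events in rows:
        # Multiply by the conditional probability of surviving this time point.
        surv *= (at_risk - events) / at_risk
        curve.append((time, surv))
    return curve


# Counts copied from the example table above.
table = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1), (13, 14, 2)]
km = kaplan_meier(table)

for time, s in km:
    print(f"t = {time:2d}   S_hat = {s:.2f}")
```

Rounded to two decimals, the estimates match the last column of the table (1.00, 0.90, 0.90, 0.84, 0.72).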

[Figure: Kaplan–Meier curve. Y-axis: proportion surviving (95% confidence); X-axis: survival time.]

Figure 1. Plot of survival distribution functions for the NCI and the SCI groups. The Y-axis is the probability of not declining to GDS 3 or above. The X-axis is the time (in years) to decline. (Barry Reisberg et al., 2010; Alzheimer's & Dementia; in press.)

3B. Comparing Survival Functions

[Figure: survival distribution functions for the High, Medium, and Low groups. Y-axis: survival distribution function; X-axis: time.]

Log-Rank Test
The log-rank test
• tests whether the survival functions are statistically equivalent
• is a large-sample chi-square test that uses the observed and expected cell counts across the event times
• has maximum power when the ratio of hazards is constant over time.

Wilcoxon Test
The Wilcoxon test
• weights the observed number of events minus the expected number of events by the number at risk across the event times
• can be biased if the pattern of censoring is different between the groups.

Log-Rank versus Wilcoxon Test
Log-rank test
• is more sensitive than the Wilcoxon test to differences between groups in later points in time.
Wilcoxon test
• is more sensitive than the log-rank test to differences between groups that occur in early points in time.

4. Two Parametric Distributions
Here we present the two most notable models for the distribution of T.
Exponential distribution: $\lambda(t) = \lambda$
Weibull distribution: $\lambda(t) = \lambda p (\lambda t)^{p-1} = p \lambda^p \times t^{p-1}$
◦ Its survival function:
\[
S(t) = \exp\left(-\int_0^t p \lambda^p u^{p-1}\,du\right) = \exp\left(-(\lambda t)^p\right)
\]
◦ Thus: $\ln(-\ln S(t)) = p\,(\ln t + \ln \lambda)$

[Figure: Weibull Hazard Function, Plot]

5. Regression Models
The Exponential and the Weibull distribution inspired two parametric regression approaches:
1. Parametric proportional hazard model – this model can be generalized to a semiparametric model: the Cox proportional hazard model
2. Accelerated failure time model
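
The Weibull formulas of Section 4 can be checked numerically. The sketch below integrates the Weibull hazard with a simple midpoint rule, compares the result with the closed-form survival function, and confirms that ln(−ln S(t)) is linear in ln t with slope p. The parameter values (`lam = 0.1`, `p = 1.5`) and all function names are illustrative assumptions, not values from the slides.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull hazard: p * lam^p * t^(p-1)."""
    return p * lam**p * t ** (p - 1)


def weibull_survival(t, lam, p):
    """Closed-form Weibull survival function: exp(-(lam*t)^p)."""
    return math.exp(-((lam * t) ** p))


def survival_from_hazard(t, lam, p, steps=20_000):
    """Survival function via midpoint-rule integration of the hazard."""
    h = t / steps
    cumulative = sum(weibull_hazard((k + 0.5) * h, lam, p) for k in range(steps)) * h
    return math.exp(-cumulative)


lam, p = 0.1, 1.5  # illustrative values only
closed = weibull_survival(8.0, lam, p)
numeric = survival_from_hazard(8.0, lam, p)
print(f"closed form: {closed:.6f}   numeric: {numeric:.6f}")


def loglog(t):
    """ln(-ln S(t)), which should equal p * (ln t + ln lam)."""
    return math.log(-math.log(weibull_survival(t, lam, p)))


# Slope of ln(-ln S(t)) against ln t between two time points.
slope = (loglog(8.0) - loglog(2.0)) / (math.log(8.0) - math.log(2.0))
print(f"slope: {slope:.6f}   p: {p}")
```

The linearity check is the basis of the usual graphical diagnostic: plotting ln(−ln Ŝ(t)) against ln t gives a roughly straight line when the Weibull model is adequate.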

Proportional Hazard Model
In a regression model for survival analysis one can try to model the dependence on the explanatory variables by taking the (new) hazard rate to be:
\[
\lambda = \lambda_0 \times c(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})
\]
Hazard rates being positive, it is natural to choose the function c such that c(β, x) is positive irrespective of the values of x.
Thus a good choice is: c(·) = exp(·)
The resulting proportional hazard model is:
\[
\lambda = \lambda_0 \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})
\]
For the Weibull distribution we have:
\[
\lambda = p \lambda_0^p \times t^{p-1} \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})
\]
For the Exponential distribution we have:
\[
\lambda = \lambda_0 \times \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})
\]
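
The "proportional" in the model name can be illustrated numerically: under the Weibull form above, the ratio of hazards for two covariate vectors does not depend on t. The sketch below uses hypothetical coefficients and covariates (a group indicator and an age), chosen only for illustration.

```python
import math


def weibull_ph_hazard(t, x, beta, p, lam0, beta0=0.0):
    """Weibull proportional hazard: p * lam0^p * t^(p-1) * exp(beta0 + beta . x)."""
    linear = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return p * lam0**p * t ** (p - 1) * math.exp(linear)


# Hypothetical values, not taken from the slides.
beta = [0.7, -0.02]      # coefficients for (group, age)
p, lam0 = 1.5, 0.1
x_a = [1, 60]            # subject in group 1, age 60
x_b = [0, 60]            # subject in group 0, age 60

ratios = [
    weibull_ph_hazard(t, x_a, beta, p, lam0) / weibull_ph_hazard(t, x_b, beta, p, lam0)
    for t in (0.5, 2.0, 10.0)
]
print(ratios)            # the same value at every t
print(math.exp(0.7))     # exp(beta_group), the hazard ratio for group
```

The baseline factor cancels in the ratio, so the hazard ratio depends only on the difference in covariates.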

Accelerated Failure Time Model
For the Weibull distribution (including the Exponential distribution), the proportional hazard model is equivalent to a log-linear model in survival time T:
\[
\ln(T) = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \cdots + \alpha_k x_{ik} + \sigma\varepsilon
\]
Here the error term ε can be shown to follow the 2-parameter Extreme Value distribution.

Apply Both Models Simultaneously
If the underlying distribution for T is Weibull or Exponential, one can apply both regression models simultaneously to reflect different aspects of the survival process. That is:
◦ Prediction of degree of decline using the Weibull proportional hazard model
◦ Prediction of time of decline using the accelerated failure time model
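
The equivalence of the two parameterizations can be made concrete. Assuming the Weibull proportional hazard form λ = p λ0^p t^(p−1) exp(β0 + β·x), and assuming ε follows the standard (minimum) extreme value distribution with P(ε ≥ e) = exp(−exp(e)), matching the two survival functions gives σ = 1/p, α0 = −ln λ0 − β0/p and αj = −βj/p. The sketch below checks this numerically; all parameter values and function names are hypothetical.

```python
import math


def ph_survival(t, x, beta, p, lam0, beta0=0.0):
    """Survival under the Weibull proportional hazard model."""
    linear = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return math.exp(-((lam0 * t) ** p) * math.exp(linear))


def ph_to_aft(beta, p, lam0, beta0=0.0):
    """Map Weibull PH parameters to AFT parameters (alpha0, alpha, sigma)."""
    sigma = 1.0 / p
    alpha0 = -math.log(lam0) - beta0 / p
    alpha = [-b / p for b in beta]
    return alpha0, alpha, sigma


def aft_survival(t, x, alpha0, alpha, sigma):
    """Survival under ln T = alpha0 + alpha . x + sigma * eps, eps ~ extreme value."""
    z = (math.log(t) - alpha0 - sum(a * xi for a, xi in zip(alpha, x))) / sigma
    return math.exp(-math.exp(z))


# Hypothetical parameter values, for illustration only.
beta, p, lam0, beta0 = [0.7, -0.02], 1.5, 0.1, 0.3
alpha0, alpha, sigma = ph_to_aft(beta, p, lam0, beta0)

max_diff = max(
    abs(ph_survival(t, x, beta, p, lam0, beta0) - aft_survival(t, x, alpha0, alpha, sigma))
    for t in (0.5, 2.0, 10.0)
    for x in ([0, 55], [1, 70])
)
print(f"largest difference between the two survival functions: {max_diff:.2e}")
```

Note the sign flip: a covariate that raises the hazard (positive β) shortens the survival time (negative α).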
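
An Example
In a recent paper (Reisberg et al., 2010), we applied both regression models to a dementia study conducted at NYU:
\[
\Lambda(T) = \Lambda_0(T) \exp(\alpha_1 \cdot \text{Group} + \alpha_2 \cdot \text{Age} + \alpha_3 \cdot \text{Gender} + \alpha_4 \cdot \text{Education} + \alpha_5 \cdot \text{FollowUp})
\]
\[
\log T = \beta_0 + \beta_1 \cdot \text{Group} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Gender} + \beta_4 \cdot \text{Education} + \beta_5 \cdot \text{FollowUp} + \sigma\varepsilon
\]
The results are shown next.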

6. Cox Proportional Hazards Model

Parametric versus Nonparametric Models
Parametric models require that
• the distribution of survival time is known
• the hazard function is completely specified except for the values of the unknown parameters.
Examples include the Weibull model, the exponential model, and the log-normal model.

Properties of nonparametric models are
• the distribution of survival time is unknown
• the hazard function is unspecified.
An example is the Cox proportional hazards model.

Cox Proportional Hazards Model
\[
h_i(t) = h_0(t)\, e^{\beta_1 X_{i1} + \cdots + \beta_k X_{ik}}
\]
◦ $h_0(t)$: baseline hazard function; involves time but not predictor variables
◦ $\beta_1 X_{i1} + \cdots + \beta_k X_{ik}$: linear function of a set of predictor variables; does not involve time
◦ β = 0 → hazard ratio = 1: two groups have the same survival experience

Popularity of the Cox Model
The Cox proportional hazards model
• provides the primary information desired from a survival analysis, hazard ratios and adjusted survival curves, with a minimum number of assumptions
• is a robust model where the regression coefficients closely approximate the results from the correct parametric model.

Partial Likelihood
Partial likelihood differs from maximum likelihood because
• it does not use the likelihoods for all subjects
• it only considers likelihoods for subjects that experience the event
• it considers subjects as part of the risk set until they are censored.

Subject   Survival Time   Status
C         2.0             1
B         3.0             1
A         4.0             0
D         5.0             1
E         6.0             0

\[
L_c = \frac{h_c(2)}{h_c(2) + h_b(2) + h_a(2) + h_d(2) + h_e(2)}
\]
\[
L_b = \frac{h_b(3)}{h_b(3) + h_a(3) + h_d(3) + h_e(3)}
\]
\[
L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}
\]
Substituting the Cox model into $L_d$, the baseline hazard cancels:
\[
L_d = \frac{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}}}{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}} + h_0(5)\, e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \cdots + \beta_k X_{ek}}}
= \frac{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}}}{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \cdots + \beta_k X_{dk}} + e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \cdots + \beta_k X_{ek}}}
\]
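
The risk-set bookkeeping in this example can be sketched in Python. The subjects, times, and status values come from the table; the single covariate `x` per subject is hypothetical, and Status = 1 is assumed to mean "event observed" (consistent with subjects A and E contributing no term of their own). The sketch also assumes no tied event times.

```python
import math

# (subject, survival time, status, x); the covariate x is a made-up illustration.
data = [
    ("C", 2.0, 1, 1.0),
    ("B", 3.0, 1, 0.0),
    ("A", 4.0, 0, 1.0),
    ("D", 5.0, 1, 0.0),
    ("E", 6.0, 0, 1.0),
]


def partial_likelihood(beta, rows):
    """Return ({subject: contribution}, product of contributions) for one covariate."""
    contributions = {}
    total = 1.0
    for name, time, status, x in rows:
        if status != 1:
            continue  # censored subjects contribute no term of their own
        # Risk set: everyone still under observation at this event time.
        risk = [xj for (_, tj, _, xj) in rows if tj >= time]
        contrib = math.exp(beta * x) / sum(math.exp(beta * xj) for xj in risk)
        contributions[name] = contrib
        total *= contrib
    return contributions, total


contribs, total = partial_likelihood(0.0, data)
print(contribs)  # at beta = 0 each term is 1 / (size of the risk set)
print(total)
```

At β = 0 the three terms reduce to 1/5, 1/4, and 1/2, which mirrors the number of hazards in the denominators of L_c, L_b, and L_d. In practice β is chosen to maximize the product of these terms.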

The overall likelihood is the product of the individual likelihoods. That is:
\[
L = L_c \times L_b \times L_d
\]

7. SAS Programs for Survival Analysis
There are three SAS procedures for analyzing survival data: LIFETEST, PHREG, and LIFEREG.
◦ PROC LIFETEST is a nonparametric procedure for estimating the survivor function, comparing the underlying survival curves of two or more samples, and testing the association of survival time with other variables.
◦ PROC PHREG is a semiparametric procedure that fits the Cox proportional hazards model and its extensions.
◦ PROC LIFEREG is a parametric regression procedure for modeling the distribution of survival time with a set of concomitant variables.

Proc LIFETEST
The Kaplan–Meier (KM) survival curves and related tests (log-rank, Wilcoxon) can be generated using SAS PROC LIFETEST:

    PROC LIFETEST DATA=SAS-data-set <options>;
      TIME variable <*censor(list)>;   /* list = values of censor that mark censored times */
      STRATA variable <(list)> <...variable <(list)>>;
      TEST variables;
    RUN;

Proc PHREG
The Cox (proportional hazards) regression is performed using SAS PROC PHREG:

    proc phreg data=rsmodel.colon;
      /* status values 0, 2 and 4 are treated as censored */
      model surv_mm*status(0,2,4) = sex yydx / risklimits;
    run;

Proc LIFEREG
The accelerated failure time regression is performed using SAS PROC LIFEREG:

    proc lifereg data=subset outest=OUTEST(keep=_scale_);
      /* (lower, hours) gives the lower and upper bounds of the response; d= selects the distribution */
      model (lower, hours) = yrs_ed yrs_exp / d=normal;
      output out=OUT xbeta=Xbeta;
    run;
This note was uploaded on 01/31/2011 for the course AMS 572 taught by Professor Wei Zhu during the Fall '10 term at SUNY Stony Brook.