Biometrika (1993), 80, 3, pp. 557-72
Printed in Great Britain
Checking the Cox model with cumulative sums of martingale-based residuals
BY D. Y. LIN
Department of Biostatistics, SC-32, University of Washington, Seattle, Washington 98195, U.S.A.
L. J. WEI
Department of Biostatistics, Harvard University, 677 Huntington Ave., Boston, Massachusetts 02115, U.S.A.
AND Z. YING
Department of Statistics, 101 Illini Hall, University of Illinois, Champaign, Illinois 61820, U.S.A.
SUMMARY
This paper presents a new class of graphical and numerical methods for checking the adequacy of the Cox regression model. The procedures are derived from cumulative sums of martingale-based residuals over follow-up time and/or covariate values. The distributions of these stochastic processes under the assumed model can be approximated by zero-mean Gaussian processes. Each observed process can then be compared, both visually and analytically, with a number of simulated realizations from the approximate null distribution. These comparisons enable the data analyst to assess objectively how unusual the observed residual patterns are. Special attention is given to checking the functional form of a covariate, the form of the link function, and the validity of the proportional hazards assumption. An omnibus test, consistent against any model misspecification, is also studied. The proposed techniques are illustrated with two real data sets.
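To make the idea of a cumulative sum of martingale-based residuals concrete, the following is a minimal Python sketch, not the authors' code. It assumes a single scalar covariate, takes the regression coefficient $\beta$ as given rather than estimated, and uses the Breslow estimator for the cumulative baseline hazard; the function names and the toy data are hypothetical. It computes residuals $\hat M_i(t) = \Delta_i I(X_i \le t) - \exp(\beta Z_i)\hat\Lambda_0\{\min(t, X_i)\}$ and the illustrative two-parameter process $W(t, z) = \sum_i I(Z_i \le z)\hat M_i(t)$. The Gaussian approximation and simulation step described in the summary are not attempted here.

```python
import math


def breslow_jumps(x, delta, risk):
    """Jumps dLambda_0(s) of the Breslow estimator at each distinct failure time s."""
    jumps = {}
    for s in sorted({xi for xi, di in zip(x, delta) if di == 1}):
        deaths = sum(1 for xi, di in zip(x, delta) if di == 1 and xi == s)
        at_risk = sum(r for xi, r in zip(x, risk) if xi >= s)  # risk set {i: X_i >= s}
        jumps[s] = deaths / at_risk
    return jumps


def residual_at(t, xi, di, ri, jumps):
    """M_i(t) = Delta_i I(X_i <= t) - exp(beta Z_i) * sum of dLambda_0(s) over s <= min(t, X_i)."""
    observed = di if xi <= t else 0
    expected = ri * sum(dl for s, dl in jumps.items() if s <= min(t, xi))
    return observed - expected


def cumulative_residual_process(x, delta, z, beta, t, zcut):
    """W(t, zcut) = sum over i with Z_i <= zcut of M_i(t); scalar covariate, beta taken as given."""
    risk = [math.exp(beta * zi) for zi in z]
    jumps = breslow_jumps(x, delta, risk)
    return sum(residual_at(t, xi, di, ri, jumps)
               for xi, di, zi, ri in zip(x, delta, z, risk) if zi <= zcut)


# Toy data, invented for illustration only.
x = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
delta = [1, 1, 0, 1, 0, 1]
z = [0.5, 1.2, -0.3, 0.8, 0.0, 2.0]
beta = 0.4
for t in (2.5, 5.0, 10.0):
    print(t, cumulative_residual_process(x, delta, z, beta, t, zcut=0.6))
```

Summing over every subject (taking `zcut` at or above the largest $Z_i$) gives zero at every $t$, because with the Breslow estimator the expected counts over each risk set match the observed failures exactly. The informative patterns therefore come from partial sums restricted by covariate value.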
Some key words: Censoring; Goodness of fit; Link function; Omnibus test; Proportional hazards; Regression diagnostic; Residual plot; Survival data.
1. INTRODUCTION
The proportional hazards model (Cox, 1972) with the partial likelihood principle (Cox, 1975) has become exceedingly popular for the analysis of failure time observations. This model specifies that the hazard function for the failure time $T$ associated with a $p \times 1$ vector of covariates $Z$ takes the form of

$$\lambda(t; Z) = \lambda_0(t)\exp(\beta_0' Z), \qquad (1.1)$$

where $\lambda_0(\cdot)$ is an unspecified baseline hazard function, and $\beta_0$ is a $p \times 1$ vector of unknown regression parameters.
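A minimal Python sketch of the hazard in the model above, assuming the covariate vector and coefficients are given as equal-length lists. The baseline $\lambda_0$ is left unspecified in the model; `weibull_baseline` and the values in `beta0`, `z_a` and `z_b` are hypothetical choices made only for illustration. The loop shows the proportional hazards property: for two covariate vectors the hazard ratio is $\exp\{\beta_0'(Z_a - Z_b)\}$, which does not depend on $t$.

```python
import math


def cox_hazard(t, z, beta, baseline):
    """lambda(t; Z) = lambda_0(t) * exp(beta' Z) for equal-length sequences beta and z."""
    linear_predictor = sum(b * zj for b, zj in zip(beta, z))
    return baseline(t) * math.exp(linear_predictor)


def weibull_baseline(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull-shaped baseline hazard lambda_0(t), for illustration only."""
    return (shape / scale) * (t / scale) ** (shape - 1)


beta0 = [0.7, -0.2]
z_a = [1.0, 3.0]
z_b = [0.0, 3.0]
for t in (1.0, 5.0, 20.0):
    ratio = cox_hazard(t, z_a, beta0, weibull_baseline) / cox_hazard(t, z_b, beta0, weibull_baseline)
    print(t, ratio)  # equals exp(0.7 * (1.0 - 0.0)) up to rounding, whatever t is
```

Any other positive baseline gives the same ratio, since $\lambda_0(t)$ cancels.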
Let $C$ denote the censoring time. Assume that $Z$ is bounded and that $T$ and $C$ are independent conditional on $Z$. Suppose that the data consist of $n$ independent replicates of $(X, \Delta, Z)$, where $X = \min(T, C)$, $\Delta = I(T \le C)$, and $I(\cdot)$ is the indicator