Statistica Sinica 12 (2002), 475-490

SELECTING THE WORKING CORRELATION STRUCTURE IN GENERALIZED ESTIMATING EQUATIONS WITH APPLICATION TO THE LUNG HEALTH STUDY

Wei Pan and John E. Connett
University of Minnesota

Abstract: The generalized estimating equation (GEE) approach is becoming more and more popular in handling correlated response data, for example in longitudinal studies. An attractive property of the GEE is that one can use some working correlation structure that may be wrong, but the resulting regression coefficient estimate is still consistent and asymptotically normal. One convenient choice is the independence model: treat the correlated responses as if they were independent. However with time-varying covariates there is a dilemma: using the independence model may be very inefficient (Fitzmaurice (1995)); using a non-diagonal working correlation matrix may violate an important assumption in GEE, producing biased estimates (Pepe and Anderson (1994)). It would be desirable to be able to distinguish these two situations based on the data at hand. More generally, selecting an appropriate working correlation structure, as an aspect of model selection, may improve estimation efficiency. In this paper we propose some resampling-based methods (i.e., the bootstrap and cross-validation) to do this. The methodology is demonstrated by application to the Lung Health Study (LHS) data to investigate the effects of smoking cessation on lung function and on the symptom of chronic cough. In addition, Pepe and Anderson’s result is verified using the LHS data.

Key words and phrases: Bootstrap, cross-validation, GEE, GLM, model selection, PMSE.

1. Introduction

Correlated responses are common in biomedical studies. One typical example is the longitudinal study where each subject is followed over a period of time, and repeated observations of the response variable and relevant covariates are recorded.
Since repeated observations are made on the same subject, observed responses are generally correlated. For continuous responses that can be treated as approximately normal, the linear mixed-effects models can be applied. However for categorical responses, intractability of discrete multivariate distributions hampers, at least partly, the development of corresponding likelihood-based methods. Since the publication of the seminal paper of Liang and Zeger (1986), the generalized estimating equation (GEE) approach has become increasingly important in handling multivariate continuous/discrete responses. There are many attractive points of the GEE. For instance, it is not likelihood-based: only some lower-order moments, such as the mean and variance, of the response need to be specified. Furthermore, one does not even have to model the correlation structure of the response variable correctly; one only needs to use some working correlation structure to obtain consistent and asymptotically normal estimates. One convenient choice is the independence model, i.e., the identity matrix serves as the ...
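
To make the independence working model concrete, the following is a minimal sketch, not the authors' code and not the Lung Health Study analysis. It assumes a linear mean model with identity link, simulated data, and hypothetical function names (`simulate_clusters`, `fit_independence_gee`). Under these assumptions the GEE with an identity working correlation matrix reduces to pooled least squares, and a cluster-level sandwich variance is used in place of the naive variance that treats all observations as independent.

```python
import random


def simulate_clusters(n_clusters=200, cluster_size=4, seed=0):
    """Simulate y = 1 + 2*x + subject effect + noise (illustrative values only)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        subject_effect = rng.gauss(0.0, 1.0)  # shared term induces within-subject correlation
        cluster = []
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # time-varying covariate
            y = 1.0 + 2.0 * x + subject_effect + rng.gauss(0.0, 1.0)
            cluster.append((x, y))
        clusters.append(cluster)
    return clusters


def fit_independence_gee(clusters):
    """Fit intercept and slope under an independence working correlation.

    Returns (beta, naive_se, robust_se), each a list [intercept, slope].
    """
    # With identity link and identity working correlation, the estimating
    # equation sum_i X_i^T (y_i - X_i beta) = 0 is the pooled normal equation.
    n = sx = sxx = sy = sxy = 0.0
    for cluster in clusters:
        for x, y in cluster:
            n += 1.0
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
    det = n * sxx - sx * sx
    # Inverse of the "bread" matrix sum_i X_i^T X_i = [[n, sx], [sx, sxx]].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    beta = [inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy]

    # "Meat" matrix: sum over subjects of u_i u_i^T with u_i = X_i^T r_i,
    # where r_i holds the residuals of subject i.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for cluster in clusters:
        u = [0.0, 0.0]
        for x, y in cluster:
            r = y - beta[0] - beta[1] * x
            rss += r * r
            u[0] += r
            u[1] += x * r
        for a in range(2):
            for b in range(2):
                meat[a][b] += u[a] * u[b]

    # Naive variance treats all observations as independent.
    sigma2 = rss / (n - 2.0)
    naive_se = [(sigma2 * inv[k][k]) ** 0.5 for k in range(2)]
    # Robust (sandwich) variance: inv * meat * inv, diagonal entries only.
    robust_se = []
    for k in range(2):
        v = sum(inv[k][a] * meat[a][b] * inv[b][k]
                for a in range(2) for b in range(2))
        robust_se.append(v ** 0.5)
    return beta, naive_se, robust_se
```

Calling `fit_independence_gee(simulate_clusters())` returns the coefficient estimates together with both sets of standard errors. In this simulated setup the shared subject effect mainly affects the intercept, so the robust standard error of the intercept would be expected to exceed the naive one.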
This note was uploaded on 06/01/2011 for the course ECON 102 taught by Professor Seng during the Spring '11 term at Wayne State University.