tasimpute - Statistical Computing Software Reviews Multiple...

Statistical Computing Software Reviews

Multiple Imputation in Practice: Comparison of Software Packages for Regression Models With Missing Variables

Nicholas J. Horton and Stuart R. Lipsitz

Missing data frequently complicates data analysis for scientific investigations. The development of statistical methods to address missing data has been an active area of research in recent decades. Multiple imputation, originally proposed by Rubin in a public use dataset setting, is a general purpose method for analyzing datasets with missing data that is broadly applicable to a variety of missing data settings. We review multiple imputation as an analytic strategy for missing data. We describe and evaluate a number of software packages that implement this procedure, and contrast the interface, features, and results. We compare the packages, and detail shortcomings and useful features. The comparisons are illustrated using examples from an artificial dataset and a study of child psychopathology. We suggest additional features as well as discuss limitations and cautions to consider when using multiple imputation as an analytic strategy for incomplete data settings.

KEY WORDS: Generalized linear models; Incomplete data; Markov Chain Monte Carlo; Missing outcomes; Missing predictors.

1. INTRODUCTION

Missing data is a commonly occurring complication in many scientific investigations. Determining the appropriate analytic approach in the presence of incomplete observations is a major question for data analysts. The development of statistical methods to address missing data has been an active area of research in recent decades (Rubin 1976; Little and Rubin 1987; Laird 1988; Ibrahim 1990; Little 1992; Robins, Rotnitzky, and Zhao 1994, 1995; Horton and Laird 1999).

There are three types of concerns that typically arise with missing data: (1) loss of efficiency; (2) complication in data handling and analysis; and (3) bias due to differences between the observed and unobserved data (Barnard and Meng 1999). One approach to incomplete data problems that addresses these concerns is multiple imputation, which was proposed 20 years ago by Rubin (1977) and described in detail by Rubin (1987) and Schafer (1997). A concise and readable primer can be found in Schafer (1999), while Rubin (1996) provided an extensive bibliography.

Nicholas J. Horton is Assistant Professor, Department of Epidemiology and Biostatistics, Boston University School of Public Health, and Department of Medicine, Boston University School of Medicine, 715 Albany Street, T3E, Boston, MA 02118 (E-mail: horton@bu.edu). Stuart R. Lipsitz is Professor, Department of Biometry and Epidemiology, Medical University of South Carolina, Charleston, SC 29425. The authors are grateful for the support provided by NIMH grant R01-MH546932. We also thank Gwendolyn Zahner for use of the child psychopathology dataset, which was conducted under contract to the Connecticut Department of Children and Youth Services while Dr. Zahner was on the faculty of the Yale University Child Study Center.

This note was uploaded on 07/14/2011 for the course STA 4702 taught by Professor Staff during the Spring '08 term at University of Florida.
