multipleimputation - P267-25 Multiple Imputation for...

P267-25
Multiple Imputation for Missing Data: Concepts and New Development
Yang C. Yuan, SAS Institute Inc., Rockville, MD

Abstract

Multiple imputation provides a useful strategy for dealing with data sets that have missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. The multiply imputed data sets are then analyzed by using standard procedures for complete data, and the results from these analyses are combined. No matter which complete-data analysis is used, the process of combining results from different imputed data sets is essentially the same. This yields valid statistical inferences that properly reflect the uncertainty due to missing values.

This paper reviews methods for analyzing missing data, including basic concepts and applications of multiple imputation techniques. The paper also presents new SAS® procedures for creating multiple imputations for incomplete multivariate data and for analyzing results from multiply imputed data sets. These procedures are still under development and will be available in experimental form in Release 8.1 of the SAS System.

Introduction

Most SAS statistical procedures exclude observations with any missing variable values from the analysis. These observations are called incomplete cases. Although using only complete cases is simple, you lose the information in the incomplete cases. This approach also ignores possible systematic differences between the complete cases and the incomplete cases, and the resulting inference may not be applicable to the population of all cases, especially when the number of complete cases is small.

Some SAS procedures use all the available cases in an analysis, that is, cases with available information.
For example, PROC CORR estimates a variable mean by using all cases with nonmissing values on that variable, ignoring possible missing values in other variables. PROC CORR also estimates a correlation by using all cases with nonmissing values for that pair of variables. This may make better use of the available data, but the resulting correlation matrix may not be positive definite.

Another strategy is simple imputation, in which you substitute a value for each missing value. Standard statistical procedures for complete-data analysis can then be applied to the filled-in data set. For example, each missing value can be imputed with the mean of the complete cases for that variable, or with the mean conditional on the observed values of other variables. This approach treats the missing values as if they were known in the complete-data analyses.
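The contrast between complete-case and available-case analysis can be made concrete with a small sketch. The data below are hypothetical and deliberately extreme: every row is missing exactly one of three variables, so complete-case (listwise) analysis keeps nothing, while the available-case estimates each draw on a different subset of rows. The helper names are illustrative only and are not SAS procedures; the sketch assumes Pearson correlations computed pairwise, in the spirit of the PROC CORR behavior described above.

```python
import math

NAN = math.nan

# Hypothetical data: each row observes only two of x, y, z, so no row is complete.
rows = [
    (1.0, 1.0, NAN), (2.0, 2.0, NAN), (3.0, 3.0, NAN),     # z missing; y equals x
    (NAN, 1.0, 1.0), (NAN, 2.0, 2.0), (NAN, 3.0, 3.0),     # x missing; z equals y
    (1.0, NAN, -1.0), (2.0, NAN, -2.0), (3.0, NAN, -3.0),  # y missing; z equals -x
]

def complete_cases(data):
    """Complete-case (listwise) analysis: drop every row with a missing value."""
    return [r for r in data if not any(math.isnan(v) for v in r)]

def available_mean(data, j):
    """Available-case mean of column j: use every row where column j is observed."""
    vals = [r[j] for r in data if not math.isnan(r[j])]
    return sum(vals) / len(vals)

def pearson(xs, ys):
    """Ordinary Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(data, k=3):
    """Available-case correlations: each pair uses only rows where both values are observed."""
    r = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            pairs = [(row[i], row[j]) for row in data
                     if not (math.isnan(row[i]) or math.isnan(row[j]))]
            xs, ys = zip(*pairs)
            r[i][j] = r[j][i] = pearson(xs, ys)
    return r

def quad_form(m, v):
    """Compute v' M v; for a genuine correlation matrix this is never negative."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))

R = pairwise_corr(rows)
print(len(complete_cases(rows)))                    # 0: no complete cases remain
print([available_mean(rows, j) for j in range(3)])  # [2.0, 2.0, 0.0]
print(R)                                            # r(x,y) = 1, r(y,z) = 1, r(x,z) = -1
print(quad_form(R, [1.0, -1.0, 1.0]))               # -3.0: R is not positive semidefinite
```

Because the three correlations come from disjoint sets of rows, no single data set could produce all of them together. With unit variances, v' R v for the weights (1, -1, 1) would be the variance of x - y + z, and here it comes out negative, which is exactly the loss of positive definiteness mentioned in the text.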
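The combining step summarized in the abstract can be sketched for a single scalar parameter. This is a minimal illustration that assumes the standard Rubin (1987) rules: the pooled estimate is the average of the m complete-data estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by the factor 1 + 1/m. The function name `pool_rubin` and all numbers in the usage lines are hypothetical; this is not the interface of the SAS procedures the paper introduces.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m complete-data analyses of one scalar parameter (Rubin's rules).

    estimates: point estimate Q_i from each imputed data set
    variances: squared standard error U_i of each Q_i
    Returns (pooled estimate Q_bar, total variance T, between-imputation variance B).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance: T = U_bar + (1 + 1/m) B
    return q_bar, t, b

# Hypothetical results from m = 5 imputed data sets.
q, t, b = pool_rubin([10.2, 9.8, 10.5, 10.1, 9.9], [0.25, 0.24, 0.26, 0.25, 0.25])
print(q, t, b)  # roughly 10.1, 0.34, 0.075

# Treating one filled-in data set as if it were observed data behaves like
# m identical analyses: B is 0, so T reduces to the within-imputation variance.
q1, t1, b1 = pool_rubin([10.1] * 5, [0.25] * 5)
print(q1, t1, b1)
```

The second call is one way to see why simple imputation understates uncertainty: with no spread between imputations, nothing is added to the complete-data variance for not knowing the missing values.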

This note was uploaded on 07/14/2011 for the course STA 4702 taught by Professor Staff during the Spring '08 term at University of Florida.
