l12 - Missing Data This discussion borrows heavily from:...

Missing Data

This discussion borrows heavily from:

Cohen, Jacob, and Patricia Cohen. 1975. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. The 2003 edition of Cohen and Cohen’s book is also used a little.

Allison, Paul. 2002. Missing Data. Sage paper #136.

Newman, Daniel A. 2003. “Longitudinal Modeling with Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood, and Multiple Imputation Techniques.” Organizational Research Methods, Vol. 6 No. 3, July 2003, pp. 328-362.

Royston, Patrick. Series of articles on multiple imputation in volumes 4 and 5 of The Stata Journal. See especially Royston, Patrick. 2005. “Multiple Imputation of Missing Values: Update.” The Stata Journal, Vol. 5 No. 2, pp. 188-201.

Also, Stata 11 has its own built-in commands for multiple imputation. If you have Stata 11, the entire MI manual is available as a PDF file.

Often, part or all of the data are missing for a subject. This handout describes the various types of missing data and common methods for handling them. The readings can help you with the more advanced methods.

I. Types of missing data.

There are several useful distinctions we can make.

Dependent versus independent variables. Most methods deal with missing values on the independent variables (IVs), although in recent years methods for handling missing data on the dependent variable have also been developed.

Random versus selective loss of data. A researcher must ask why the data are missing. In some cases the data are missing completely at random (MCAR), i.e., the absence of values on an IV is unrelated to Y or to the other IVs. Also, as Allison notes (p. 4), “Data on Y are said to be missing at random (MAR) if the probability of missing data on Y is unrelated to the value of Y, after controlling for other variables in the analysis…For example, the MAR assumption would be satisfied if the probability of missing data on income depended on a person’s marital status, but within each marital status category, the probability of missing income was unrelated to income.” Unfortunately, in survey research the loss often is not random. Refusal or inability to respond may be correlated with such things as education, income, interest in the subject, geographic location, etc. Selective loss of data is much more problematic than random loss.

Missing by design; or, not asked or not applicable. These are special cases of random versus selective loss of data. Sometimes data are missing because the researcher deliberately did not ask the question of that particular respondent. For economic reasons, some questions might be asked only of a random subsample of the entire sample. For example, prior to 2010 there was a “short” version of the census (answered by everyone) and a “long” version that was answered by only 20%. This can be treated the same as a random loss of data, keeping in mind that the loss may be very high. Other times,
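The difference between data missing completely at random, Allison’s MAR example, and selective loss can be made concrete with a small simulation. The sketch below is hypothetical: the population, the 50/50 marital split, the income distributions (in $1,000s), and every missingness rate are invented for illustration, and the function names are not from the handout. It blanks out income under three mechanisms (MCAR; MAR, where missingness depends only on marital status; and selective loss, which the literature calls missing not at random, MNAR) and compares the means of the cases that still have income observed against the true means.

```python
import random
import statistics

# Hypothetical illustration: the 50/50 marital split, the income
# distributions, and all missingness rates are assumptions for the demo.


def simulate_population(n, seed=0):
    """Return (married, income) pairs with income in $1,000s.
    Married people earn more on average, so marital status and income
    are related."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        married = rng.random() < 0.5
        income = rng.gauss(60 if married else 40, 10)
        people.append((married, income))
    return people


def impose_missing(people, mechanism, p_mcar=0.3, seed=1):
    """Set income to None under one missing-data mechanism."""
    rng = random.Random(seed)
    out = []
    for married, income in people:
        if mechanism == "MCAR":
            # Same chance for everyone: unrelated to income or marital status.
            p_missing = p_mcar
        elif mechanism == "MAR":
            # Depends only on observed marital status; within each status
            # it is unrelated to income (as in Allison's example).
            p_missing = 0.5 if married else 0.1
        elif mechanism == "MNAR":
            # Selective loss: higher earners are more likely to be missing.
            p_missing = 0.5 if income > 50 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((married, None if rng.random() < p_missing else income))
    return out


def observed_means(data):
    """Mean income among cases with income observed: overall and by status."""
    overall = statistics.fmean(inc for _, inc in data if inc is not None)
    by_status = {
        status: statistics.fmean(
            inc for m, inc in data if m == status and inc is not None
        )
        for status in (True, False)
    }
    return overall, by_status


if __name__ == "__main__":
    population = simulate_population(100_000)
    true_overall, true_by = observed_means(population)
    print(f"true: overall={true_overall:.2f} "
          f"married={true_by[True]:.2f} unmarried={true_by[False]:.2f}")
    for mech in ("MCAR", "MAR", "MNAR"):
        overall, by_status = observed_means(impose_missing(population, mech))
        print(f"{mech:>4}: overall={overall:.2f} "
              f"married={by_status[True]:.2f} unmarried={by_status[False]:.2f}")
```

In runs of this sketch, the MCAR means stay close to the true values. Under MAR the overall observed mean is pulled down, because married (higher-income) respondents are missing more often, yet the within-status means stay close to the truth, which is what “after controlling for other variables” buys. Under selective loss even the within-status means are biased. Calling `impose_missing(population, "MCAR", p_mcar=0.8)` roughly mimics missing by design with a random 20% subsample: the mean is still unbiased, but far fewer cases remain.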

This note was uploaded on 02/29/2012 for the course SOC 63993 taught by Professor Richard Williams during the Spring '11 term at Notre Dame.

