Identification on Regressions with Missing Covariate Data*
Esteban M. Aucejo†
Federico A. Bugni‡
V. Joseph Hotz§
February 8, 2010
Abstract
This paper examines the problem of identification and inference on parametric models when there are missing data, with special focus on the case when covariates, denoted by $X$, are missing. Our econometric model is given by a conditional moment condition implied by the assumption that $X$ is strictly exogenous. At the same time, we assume that the distribution of the missing data is unknown. We confront the missing data problem by adopting a worst-case scenario approach.

We characterize the sharp identified set and argue that this set is usually prohibitively complex to compute or to use for inference. Given this difficulty, we consider the construction of outer identified sets (that is, supersets of the identified set) that are easier to compute and can still provide a characterization of the parameter of interest. Two different outer identification strategies are discussed. Both of these strategies are shown to have nontrivial identifying power and to be relatively easy to compute and to use for inference.
Keywords: Missing Data, Missing Covariate Data, Partial Identification, Outer Identified Sets.
JEL Classification Codes: C01, C10, C20, C25.
* Thanks to Arie Beresteanu for useful comments and discussions. Any and all errors are our own.
† Department of Economics, Duke University. Email: [email protected]
‡ Department of Economics, Duke University. Email: [email protected]
§ Department of Economics, Duke University, NBER and IZA. Email: [email protected]
1 Introduction
Missing data are a ubiquitous problem in empirical social science research. When survey data are used to estimate an econometric model, the researcher often faces a dataset with missing observations on outcome variables, covariates, or both. Furthermore, one typically does not know the distribution of these missing data. This paper examines the problem of identification and inference on parametric models when there are missing outcome or covariate data. We focus on the case in which covariates have missing observations, although we also consider the case in which outcome and covariates are simultaneously missing.¹
Our econometric model is as follows. We are interested in the true parameter value $\theta_0$, which belongs to a parameter space $\Theta \subseteq \mathbb{R}^L$ and satisfies the following conditional moment condition,

$$E(Y - f(X, \theta_0) \mid X = x) = 0, \quad \text{for every } x \in S_X, \tag{1}$$
where $Y : \Omega \to S_Y \subseteq \mathbb{R}$ denotes the outcome, $X : \Omega \to S_X \subseteq \mathbb{R}^K$ denotes the vector of covariates, $f : \mathbb{R}^K \times \mathbb{R}^L \to \mathbb{R}$ denotes a known function, and $S_Y$ and $S_X$ denote the supports of the outcome and the covariates, respectively. This econometric model can be equivalently expressed as follows,

$$Y = f(X, \theta_0) + \varepsilon, \tag{2}$$
where $\varepsilon : \Omega \to \mathbb{R}$ is a mean-independent error term with its mean normalized to zero,

$$E(\varepsilon \mid X = x) = 0, \quad \text{for every } x \in S_X. \tag{3}$$
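
To make the moment condition concrete, the following is a minimal simulation sketch rather than part of the paper's method. It assumes a hypothetical linear specification $f(x, \theta) = \theta_1 + \theta_2 x$, a binary covariate, a standard normal error, and an illustrative missingness rule under which $X$ is unobserved whenever $Y > 2$. All function names and numerical values are assumptions chosen for illustration; the paper itself imposes no restriction on how data go missing.

```python
import random

def f(x, theta):
    # Hypothetical linear specification: f(x, theta) = theta[0] + theta[1] * x.
    return theta[0] + theta[1] * x

def simulate(n, theta0, seed=0):
    # Draw (X, Y) with X binary and Y = f(X, theta0) + eps, where eps is
    # standard normal and independent of X, so E(eps | X = x) = 0.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([0, 1])
        data.append((x, f(x, theta0) + rng.gauss(0.0, 1.0)))
    return data

def conditional_moments(data, theta):
    # Sample analog of E(Y - f(X, theta) | X = x) for each x in the data.
    sums, counts = {}, {}
    for x, y in data:
        sums[x] = sums.get(x, 0.0) + (y - f(x, theta))
        counts[x] = counts.get(x, 0) + 1
    return {x: sums[x] / counts[x] for x in sums}

theta0 = (1.0, 2.0)                      # assumed "true" parameter value
data = simulate(20000, theta0)

# Full data: the moments are near zero at theta0 but not at a wrong value.
at_truth = conditional_moments(data, theta0)
at_wrong = conditional_moments(data, (1.0, 0.0))

# Hypothetical missingness rule (an assumption, not from the paper):
# X is unobserved whenever Y > 2, so only cases with Y <= 2 are complete.
complete_cases = [(x, y) for x, y in data if y <= 2.0]
complete_case_at_truth = conditional_moments(complete_cases, theta0)

for label, moments in [
    ("full data, theta0", at_truth),
    ("full data, wrong theta", at_wrong),
    ("complete cases, theta0", complete_case_at_truth),
]:
    print(label, {x: round(m, 3) for x, m in sorted(moments.items())})
```

In this sketch the sample moments are close to zero at $\theta_0$ when all data are observed, but not when only complete cases are used, because the assumed missingness rule depends on the outcome. This is one way to see why discarding incomplete observations is not innocuous when the distribution of the missing data is unknown.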
Condition (3) implies that the covariates, $X$, are strictly exogenous vis-à-vis $\varepsilon$, which is the standard assumption of the classical regression model. It encompasses a wide range of nonlinear