LASSO METHODS FOR GAUSSIAN INSTRUMENTAL VARIABLES MODELS
A. BELLONI, V. CHERNOZHUKOV, AND C. HANSEN
Abstract. In this note, we propose to use sparse methods (e.g., LASSO, Post-LASSO,
√LASSO, and Post-√LASSO) to form first-stage predictions and estimate optimal
instruments in linear instrumental variables (IV) models with many instruments in the canonical
Gaussian case. The methods apply even when the number of instruments is much larger than
the sample size. We derive asymptotic distributions for the resulting IV estimators and provide
conditions under which these sparsity-based IV estimators are asymptotically oracle-efficient.
In simulation experiments, a sparsity-based IV estimator with a data-driven penalty performs
well compared to recently advocated many-instrument-robust procedures. We illustrate the
procedure in an empirical example using the Angrist and Krueger (1991) schooling data.
1. Introduction
Instrumental variables (IV) methods are widely used in applied statistics, econometrics,
and more generally for estimating treatment effects in situations where the treatment status
is not randomly assigned; see, for example, [1, 4, 5, 7, 16, 21, 26, 27, 29, 30] among many
others. Identification of the causal effects of interest in this setting may be achieved through
the use of observed instrumental variables that are relevant in determining the treatment
status but are otherwise unrelated to the outcome of interest. In some situations, many such
instrumental variables are available, and the researcher is left with the question of which set
of the instruments to use in constructing the IV estimator. We consider one such approach to
answering this question based on sparse-estimation methods in a simple Gaussian setting.
Date: First version: June 2009. This version: December 7, 2010.
arXiv:1012.1297v1 [stat.ME] 6 Dec 2010
Throughout the paper we consider the Gaussian simultaneous equation model:
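\begin{align}
y_{1i} &= y_{2i}\alpha_1 + w_i'\alpha_2 + \epsilon_i, \tag{1.1} \\
y_{2i} &= D(x_i) + v_i, \tag{1.2} \\
\begin{pmatrix} \epsilon_i \\ v_i \end{pmatrix} &\sim N\left( 0, \begin{pmatrix} \sigma_\epsilon^2 & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma_v^2 \end{pmatrix} \right) \tag{1.3}
\end{align}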
where $y_{1i}$ is the response variable, $y_{2i}$ is the endogenous variable, $w_i$ is a
$k_w$-vector of control variables, $x_i = (z_i', w_i')'$ is a vector of instrumental
variables (IV), and $(\epsilon_i, v_i)$ are disturbances that are independent of $x_i$.
The function $D(x_i) = \mathrm{E}[y_{2i} \mid x_i]$ is an unknown, potentially complicated
function of the instruments. Given a sample $(y_{1i}, y_{2i}, x_i)$, $i = 1, \ldots, n$,
from the model above, the problem is to construct an IV estimator for
$\alpha_0 = (\alpha_1, \alpha_2')'$ that enjoys good finite sample properties and is
asymptotically efficient.
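To make the setup concrete, the following is a minimal sketch of the model (1.1)-(1.3) and of a LASSO-based first stage, written with the Python standard library only. It is an illustration under simplifying assumptions that are not taken from the paper: there are no controls $w_i$, the first stage is sparse and linear, $D(x_i) = z_i'\beta$, the errors have unit variances, and the penalty level `lam` is an ad hoc choice rather than the data-driven penalty studied in the paper. The function names (`simulate`, `lasso_cd`, `demo`) are hypothetical.

```python
import math
import random


def soft_threshold(a, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_cd(Z, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - Z b||^2 + lam * ||b||_1."""
    n, p = len(Z), len(Z[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Z b, kept up to date
    col_ms = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            # (1/n) * z_j' (partial residual that excludes coordinate j)
            g = sum(Z[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * b[j]
            new_bj = soft_threshold(g, lam) / col_ms[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Z[i][j]
                b[j] = new_bj
    return b


def simulate(n, p, alpha1, beta, rho, seed=0):
    """Draw (y1, y2, Z) from a no-controls version of (1.1)-(1.3)."""
    rng = random.Random(seed)
    Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y1, y2 = [], []
    for z in Z:
        v = rng.gauss(0.0, 1.0)
        # eps has unit variance and correlation rho with v (endogeneity)
        eps = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        d = sum(bj * zj for bj, zj in zip(beta, z))  # assumed D(x_i) = z_i' beta
        y2.append(d + v)
        y1.append(alpha1 * y2[-1] + eps)
    return y1, y2, Z


def demo(n=200, p=40, alpha1=1.0, rho=0.6, seed=0):
    beta = [1.0, 0.5] + [0.0] * (p - 2)  # sparse first stage (assumption)
    y1, y2, Z = simulate(n, p, alpha1, beta, rho, seed)
    lam = 1.1 * math.sqrt(2.0 * math.log(2 * p) / n)  # illustrative penalty only
    b = lasso_cd(Z, y2, lam)
    # Fitted first stage serves as the estimated instrument.
    d_hat = [sum(bj * zj for bj, zj in zip(b, z)) for z in Z]
    iv = sum(d * a for d, a in zip(d_hat, y1)) / sum(d * c for d, c in zip(d_hat, y2))
    ols = sum(c * a for c, a in zip(y2, y1)) / sum(c * c for c in y2)
    return iv, ols, b


if __name__ == "__main__":
    iv, ols, b = demo()
    print("LASSO-IV:", iv, "OLS:", ols, "selected:", sum(bj != 0.0 for bj in b))
```

Running the script prints the IV estimate that uses the fitted first stage as the instrument, the OLS estimate for comparison, and the number of instruments with nonzero LASSO coefficients. With correlated errors (`rho` nonzero), OLS is expected to be biased while the IV estimate should sit closer to `alpha1`, though any single draw can deviate.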
We consider the case of fixed design, namely we treat the covariate values $x_1, \ldots, x_n$
as fixed. This includes random sampling as a special case; indeed, in this case
$x_1, \ldots, x_n$ represent a realization of this sample on which we condition throughout.
Note that for convenience, the notation has been collected in Appendix A.
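The fixed-design viewpoint can be illustrated with a short simulation sketch: the values $D(x_i)$ are held fixed across replications and only the Gaussian errors are redrawn. The sketch makes assumptions that are not in the text: no controls, unit error variances, and an "oracle" instrument that uses the true $D(x_i)$, which is unknown in practice. The function name `fixed_design_draws` is hypothetical.

```python
import random


def fixed_design_draws(D, alpha1, rho, n_rep, seed=0):
    """Oracle IV estimates over repeated error draws with D(x_i) held fixed."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        num = den = 0.0
        for d in D:
            v = rng.gauss(0.0, 1.0)
            # eps has unit variance and correlation rho with v
            eps = rho * v + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            y2 = d + v
            y1 = alpha1 * y2 + eps
            num += d * y1
            den += d * y2
        estimates.append(num / den)
    return estimates


if __name__ == "__main__":
    design_rng = random.Random(1)
    D = [design_rng.gauss(0.0, 1.0) for _ in range(100)]  # drawn once, then fixed
    est = fixed_design_draws(D, alpha1=1.0, rho=0.6, n_rep=200)
    print("mean oracle IV estimate:", sum(est) / len(est))
```

Here the design is drawn once, which mirrors random sampling followed by conditioning on the realized values; every replication then reuses the same `D`.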