LASSO METHODS FOR GAUSSIAN INSTRUMENTAL VARIABLES MODELS

A. BELLONI, V. CHERNOZHUKOV, AND C. HANSEN

Abstract. In this note, we propose to use sparse methods (e.g. LASSO, Post-LASSO, √LASSO, and Post-√LASSO) to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments in the canonical Gaussian case. The methods apply even when the number of instruments is much larger than the sample size. We derive asymptotic distributions for the resulting IV estimators and provide conditions under which these sparsity-based IV estimators are asymptotically oracle-efficient. In simulation experiments, a sparsity-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. We illustrate the procedure in an empirical example using the Angrist and Krueger (1991) schooling data.

1. Introduction

Instrumental variables (IV) methods are widely used in applied statistics, econometrics, and more generally for estimating treatment effects in situations where the treatment status is not randomly assigned; see, for example, [1, 4, 5, 7, 16, 21, 26, 27, 29, 30] among many others. Identification of the causal effects of interest in this setting may be achieved through the use of observed instrumental variables that are relevant in determining the treatment status but are otherwise unrelated to the outcome of interest. In some situations, many such instrumental variables are available, and the researcher is left with the question of which set of instruments to use in constructing the IV estimator. We consider one approach to answering this question, based on sparse-estimation methods in a simple Gaussian setting.

Date: First version: June 2009. This version: December 7, 2010.
arXiv:1012.1297v1 [stat.ME] 6 Dec 2010

Throughout the paper we consider the Gaussian simultaneous equation model:

\begin{align}
y_{1i} &= y_{2i}\alpha_1 + w_i'\alpha_2 + \zeta_i, \tag{1.1}\\
y_{2i} &= D(x_i) + v_i, \tag{1.2}\\
\begin{pmatrix} \zeta_i \\ v_i \end{pmatrix} &\sim N\left( 0, \begin{pmatrix} \sigma_\zeta^2 & \sigma_{\zeta v} \\ \sigma_{\zeta v} & \sigma_v^2 \end{pmatrix} \right), \tag{1.3}
\end{align}

where $y_{1i}$ is the response variable, $y_{2i}$ is the endogenous variable, $w_i$ is a $k_w$-vector of control variables, $x_i = (z_i', w_i')'$ is a vector of instrumental variables (IV), and $(\zeta_i, v_i)$ are disturbances that are independent of $x_i$. The function $D(x_i) = \mathrm{E}[y_{2i} \mid x_i]$ is an unknown, potentially complicated function of the instruments. Given a sample $(y_{1i}, y_{2i}, x_i)$, $i = 1, \ldots, n$, from the model above, the problem is to construct an IV estimator for $\alpha = (\alpha_1, \alpha_2')'$ that enjoys good finite sample properties and is asymptotically efficient.

We consider the case of fixed design, namely we treat the covariate values $x_1, \ldots, x_n$ as fixed....
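To make the setup concrete, the following is a minimal sketch, not the authors' procedure, of simulating data from a special case of (1.1)-(1.3) and forming a LASSO first-stage prediction that is then used as the instrument. Everything specific here is an assumption made for illustration: there are no controls $w_i$, $D(x_i)$ is taken to be linear and sparse in $x_i$, both disturbances have unit variance, the penalty level `lam` is a generic textbook rate rather than the data-driven penalty mentioned in the abstract, and the coordinate-descent routine `lasso` is a hypothetical helper written for this example.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, lam, sweeps=50):
    """Coordinate descent for (1/(2n)) * sum((y - Xb)^2) + lam * sum(|b_j|)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb, kept up to date as b changes
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_ms[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
    return b


# Hypothetical design: every value below is an illustrative assumption.
rng = random.Random(0)
n, p = 400, 20
alpha_1 = 1.0
beta = [1.0, 0.5] + [0.0] * (p - 2)  # sparse first stage: D(x) = x'beta
sigma_zv = 0.8  # covariance of the two disturbances (unit variances)

X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y1, y2 = [], []
for x in X:
    v = rng.gauss(0.0, 1.0)
    # Structural disturbance, correlated with v as in (1.3).
    zeta = sigma_zv * v + math.sqrt(1.0 - sigma_zv ** 2) * rng.gauss(0.0, 1.0)
    y2_i = sum(bj * xj for bj, xj in zip(beta, x)) + v  # equation (1.2)
    y2.append(y2_i)
    y1.append(alpha_1 * y2_i + zeta)  # equation (1.1) without controls

# First stage: LASSO fit of y2 on x, with an assumed penalty level.
lam = math.sqrt(2.0 * math.log(p) / n)
b_hat = lasso(X, y2, lam)
d_hat = [sum(bj * xj for bj, xj in zip(b_hat, x)) for x in X]

# Second stage: IV with the fitted first stage as instrument, next to OLS.
alpha_ols = sum(a * b for a, b in zip(y2, y1)) / sum(a * a for a in y2)
alpha_iv = (sum(d * a for d, a in zip(d_hat, y1))
            / sum(d * a for d, a in zip(d_hat, y2)))

print("selected instruments:", [j for j, bj in enumerate(b_hat) if bj != 0.0])
print("OLS estimate:", alpha_ols)
print("IV estimate with LASSO first stage:", alpha_iv)
```

Because the structural disturbance is positively correlated with $v_i$ in this design, the OLS estimate should sit noticeably above the true coefficient, while the IV estimate built from the LASSO fit should land close to it. The sketch uses the LASSO fitted values directly; a Post-LASSO variant would refit by least squares on the selected columns before forming the instrument.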