Course: ECON 120C, Spring 2011
School: UCSD


Econ 120C
Instrumental Variables & 2SLS: Multiple Regression

IV Estimation
IV estimation can be extended to the multiple regression case.
Call the model we are interested in estimating the structural model.
Our problem is that one or more of the variables are endogenous.
We need an instrument for each endogenous variable.

Generalizing the TSLS method
1. Adding exogenous regressors to the model.
Suppose now the regression of interest is
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i$,
where W is exogenous.
The two stages of TSLS are:
Stage 1. Predict X using Z and W.
Stage 2. Use the predicted values of X, together with W, as regressors for Y.
Why use W in the first stage? We want the best possible prediction/decomposition of X.
As before, we need to argue why W is not correlated with u.
2. More than one instrument.

Multiple Regression IV/2SLS
3. In general,
$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \beta_{k+1} w_{1i} + \cdots + \beta_{k+r} w_{ri} + u_i$,
where $(x_1, x_2, \ldots, x_k)$ are k endogenous regressors and $(w_1, w_2, \ldots, w_r)$ are r included exogenous regressors.
Let $(z_1, z_2, \ldots, z_m)$ be m instruments, so $\mathrm{cov}(z_j, u) = 0$.
The model is exactly (just) identified when m = k.
It is over-identified when m > k.
It is under-identified when m < k. For example, suppose k = 1 but m = 0 (no instruments)!

[Diagram: instruments Z1, Z2, ..., Zm; included exogenous regressors W1, W2, ..., Wr; endogenous regressors X1, X2, ..., Xk]

1st Stage Regression: regress each of the Xs on ALL exogenous variables (including both Z and W) to get the predicted Xs.
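Written out, the first stage described above is one regression per endogenous regressor. The block below is a sketch in assumed notation: the coefficient names $\pi_{j\cdot}$ and the first-stage errors $v_{ji}$ are labels introduced here for illustration.

```latex
% First stage: one regression for each endogenous regressor X_j, j = 1, ..., k
X_{ji} = \pi_{j0} + \pi_{j1} Z_{1i} + \cdots + \pi_{jm} Z_{mi}
       + \pi_{j,m+1} W_{1i} + \cdots + \pi_{j,m+r} W_{ri} + v_{ji}

% Predicted ("clean") values of X_j, built from the estimated coefficients
\hat{X}_{ji} = \hat{\pi}_{j0} + \hat{\pi}_{j1} Z_{1i} + \cdots + \hat{\pi}_{jm} Z_{mi}
             + \hat{\pi}_{j,m+1} W_{1i} + \cdots + \hat{\pi}_{j,m+r} W_{ri}
```

Every first-stage regression uses the same right-hand side: all m instruments and all r included exogenous regressors.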
2nd Stage Regression: regress Y on the predicted Xs and the Ws.

In STATA:
    ivreg y (x1 x2 = z1 z2 z3) w1 w2 w3 w4, r
or
    ivreg y w1 w2 w3 w4 (x1 x2 = z1 z2 z3), r

[Diagram: The Second Stage Regression. Z1, Z2, ..., Zm; clean components X1c, X2c, ..., Xkc; W1, W2, ..., Wr; u; Y]

Example: Demand for cigarettes
$\ln(Q_i^{cigarettes}) = \beta_0 + \beta_1 \ln(P_i^{cigarettes}) + \beta_2 \ln(Income_i) + u_i$
$Z_{1i}$ = general sales tax$_i$
$Z_{2i}$ = cigarette-specific tax$_i$
Endogenous variable: $\ln(P_i^{cigarettes})$ (one X)
Included exogenous variable: $\ln(Income_i)$ (one W)
Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax (two Zs)
Is the demand elasticity $\beta_1$ overidentified, exactly identified, or underidentified?

Example: Cigarette demand, one instrument
(Y = lpackpc, W = lperinc, X = lravgprs, Z = rtaxso)

    . ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r;

    IV (2SLS) regression with robust standard errors      Number of obs =      48
                                                           F(  2,    45) =    8.19
                                                           Prob > F      =  0.0009
                                                           R-squared     =  0.4189
                                                           Root MSE      =  .18957

    ------------------------------------------------------------------------------
                 |               Robust
         lpackpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        lravgprs |  -1.143375   .3723025    -3.07   0.004    -1.893231   -.3935191
         lperinc |    .214515   .3117467     0.69   0.495     -.413375     .842405
           _cons |   9.430658   1.259392     7.49   0.000     6.894112     11.9672
    ------------------------------------------------------------------------------
    Instrumented:  lravgprs
    Instruments:   lperinc rtaxso
    ------------------------------------------------------------------------------

STATA lists ALL the exogenous regressors as instruments, which is slightly different terminology from what we have been using.
Running IV as a single command yields correct SEs.
Use ", r" for heteroskedasticity-robust SEs.

Example: Cigarette demand, two instruments
(Y = lpackpc, W = lperinc, X = lravgprs, Z1 = rtaxso, Z2 = rtax)

    . ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;

    IV (2SLS) regression with robust standard errors      Number of obs =      48
                                                           F(  2,    45) =   16.17
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.4294
                                                           Root MSE      =  .18786

    ------------------------------------------------------------------------------
                 |               Robust
         lpackpc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        lravgprs |  -1.277424   .2496099    -5.12   0.000    -1.780164   -.7746837
         lperinc |   .2804045   .2538894     1.10   0.275     -.230955    .7917641
           _cons |   9.894955   .9592169    10.32   0.000     7.962993    11.82692
    ------------------------------------------------------------------------------
    Instrumented:  lravgprs
    Instruments:   lperinc rtaxso rtax
    ------------------------------------------------------------------------------

As before, STATA lists ALL the exogenous regressors as instruments.

Example: Cigarette demand, one instrument vs. two instruments
TSLS estimates, Z = sales tax (m = 1):
$\ln(Q_i^{cigarettes}) = 9.43 - 1.14 \ln(P_i^{cigarettes}) + 0.21 \ln(Income_i)$
with standard errors (1.26), (0.37), (0.31) on the constant, price, and income terms.
[Diagram: one instrument. Z1; components Xc and Xd of X; u; Y]
TSLS estimates, Z = sales tax, cig-only tax (m = 2):
$\ln(Q_i^{cigarettes}) = 9.89 - 1.28 \ln(P_i^{cigarettes}) + 0.28 \ln(Income_i)$
with standard errors (0.96), (0.25), (0.25) on the constant, price, and income terms.
With two instruments the clean component Xc of X is larger.
Smaller SEs for m = 2.
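The drop in standard errors when a second relevant instrument is added can be illustrated with a small simulation. This is a sketch on made-up data, not the cigarette data set: the design (true slope 1, X = Z1 + Z2 + v, error u = 0.8v + e), the sample sizes, and the helper names `cov`, `tsls_slope`, and `simulate` are all assumptions chosen for illustration.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def tsls_slope(y, x, instruments):
    """2SLS slope for one endogenous x and one or two instruments.

    An intercept is implied in both stages because everything is
    computed from covariances (deviations from means).
    """
    if len(instruments) == 1:
        (z,) = instruments
        pi1 = cov(z, x) / cov(z, z)                  # first-stage slope
        x_hat = [pi1 * zi for zi in z]
    else:
        z1, z2 = instruments
        # First stage with two instruments: solve the 2x2 normal equations.
        s11, s22, s12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
        s1x, s2x = cov(z1, x), cov(z2, x)
        det = s11 * s22 - s12 ** 2
        pi1 = (s22 * s1x - s12 * s2x) / det
        pi2 = (s11 * s2x - s12 * s1x) / det
        x_hat = [pi1 * a + pi2 * b for a, b in zip(z1, z2)]
    # Second stage: slope from regressing y on the predicted x.
    return cov(x_hat, y) / cov(x_hat, x_hat)


def simulate(n_obs, n_reps, seed):
    """Spread of the 2SLS slope across repeated samples, m = 1 vs m = 2."""
    rng = random.Random(seed)
    one, two = [], []
    for _ in range(n_reps):
        z1 = [rng.gauss(0, 1) for _ in range(n_obs)]
        z2 = [rng.gauss(0, 1) for _ in range(n_obs)]
        v = [rng.gauss(0, 1) for _ in range(n_obs)]
        u = [0.8 * vi + rng.gauss(0, 1) for vi in v]      # u is correlated with x via v
        x = [a + b + c for a, b, c in zip(z1, z2, v)]     # both instruments are relevant
        y = [xi + ui for xi, ui in zip(x, u)]             # assumed true slope: 1
        one.append(tsls_slope(y, x, [z1]))
        two.append(tsls_slope(y, x, [z1, z2]))
    return statistics.stdev(one), statistics.stdev(two)


sd_one, sd_two = simulate(n_obs=200, n_reps=300, seed=0)
print(f"sd of slope estimates, one instrument:  {sd_one:.3f}")
print(f"sd of slope estimates, two instruments: {sd_two:.3f}")
```

In this design the second instrument doubles the variance of the predicted X, so the estimates should be noticeably less dispersed with two instruments than with one.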
Using 2 instruments gives more information: more as-if random variation.
[Diagram: two instruments. Z1, Z2; components Xc and Xd of X; u; Y]

Instrument Relevance

Instrument relevance: general case, multiple Xs
$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i$
If the Xs are replaced by their clean components and there is no perfect multicollinearity in the above regression, then we say the instruments are relevant.

Multicollinearity occurs when variables are so highly correlated with each other that it is difficult to come up with reliable estimates of their individual regression coefficients.
When two variables are highly correlated, they are basically measuring the same phenomenon or construct. In other words, they both convey essentially the same information.

[Diagram: Instrument relevance. Z1, Z2, ..., Zm; clean components X1c, X2c, ..., Xkc; W1, W2, ..., Wr]
[Diagram: Instrument relevance: multicollinearity. X1c, X2c, W; u; Y]
No multicollinearity between X1c, ..., Xkc, W1, ..., Wr.

Instrument exogeneity
$\mathrm{corr}(Z_{1i}, u_i) = 0, \ldots, \mathrm{corr}(Z_{mi}, u_i) = 0$
What happens if one of these requirements isn't satisfied?
How can you check?
What do you do?
If you have multiple instruments, which should you use?

Checking Instrument Relevance
We will focus on a single included endogenous regressor:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$
First stage regression:
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
The instruments are relevant if at least one of $\pi_1, \ldots, \pi_m$ is nonzero.
The instruments are said to be weak if all of $\pi_1, \ldots, \pi_m$ are either zero or nearly zero.
Weak instruments explain very little of the variation in X beyond that explained by the Ws.

[Diagram: Checking instrument relevance. Z1, Z2, ..., Zm; W1, W2, ..., Wr; X]

Checking Instrument Relevance
If instruments are weak, the sampling distribution of TSLS and its t-statistic are not (at all) normal, even with n large.
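A quick way to see this non-normality is to compare IV estimates across repeated samples for a strong and an irrelevant instrument. The following is a minimal sketch under an assumed design: true slope 1, X = pi * Z + v, u = 0.5v + e, n = 100, and pi set to 1 (strong) or 0 (irrelevant). None of these numbers or function names come from the slides.

```python
import random
import statistics


def iv_slope(y, x, z):
    """Simple IV estimator s_YZ / s_XZ (one regressor, one instrument)."""
    mean_y, mean_x, mean_z = statistics.fmean(y), statistics.fmean(x), statistics.fmean(z)
    s_yz = sum((a - mean_y) * (c - mean_z) for a, c in zip(y, z))
    s_xz = sum((b - mean_x) * (c - mean_z) for b, c in zip(x, z))
    return s_yz / s_xz


def iqr(values):
    """Interquartile range: a spread measure that survives extreme outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def experiment(first_stage_slope, n_obs=100, n_reps=1000, seed=1):
    """IV slope estimates from n_reps independent samples of size n_obs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        z = [rng.gauss(0, 1) for _ in range(n_obs)]
        v = [rng.gauss(0, 1) for _ in range(n_obs)]
        x = [first_stage_slope * zi + vi for zi, vi in zip(z, v)]
        u = [0.5 * vi + rng.gauss(0, 1) for vi in v]   # endogeneity: u moves with v
        y = [xi + ui for xi, ui in zip(x, u)]          # assumed true slope: 1
        estimates.append(iv_slope(y, x, z))
    return estimates


strong = experiment(first_stage_slope=1.0)
irrelevant = experiment(first_stage_slope=0.0)

for label, est in (("strong", strong), ("irrelevant", irrelevant)):
    print(f"{label:>10}: median = {statistics.median(est):8.3f}, "
          f"IQR = {iqr(est):8.3f}, min = {min(est):10.1f}, max = {max(est):10.1f}")
```

The median and interquartile range are used instead of the mean and standard deviation because, with an irrelevant instrument, occasional near-zero denominators produce extreme estimates that dominate any moment-based summary.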
Consider the simplest case:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \pi_0 + \pi_1 Z_i + v_i$
The IV estimator is $\hat{\beta}_1^{TSLS} = s_{YZ} / s_{XZ}$.
If cov(X,Z) is zero or small, then $s_{XZ}$ will be small: with weak instruments, the denominator is nearly zero.
If so, the sampling distribution of $\hat{\beta}_1^{TSLS}$ (and its t-statistic) is not well approximated by its large-n normal approximation.
(Caption of the preceding diagram: holding the Ws constant, at least one of the Zs can change X.)

Checking Instrument Relevance
An example: the sampling distribution of the TSLS t-statistic with weak instruments.
[Figure: sampling distribution of the TSLS t-statistic. Dark line = irrelevant instruments; dashed light line = strong instruments]

Model
$Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, 2, \ldots, n$
$\beta_0 = 0$, $\beta_1 = 1$, $n = 100$
$X_i \sim \text{iid } N(0, 1)$
$u_i = 0.1 X_i + e_i$, $e_i \sim \text{iid } N(0, 1)$
$Z_i \sim \text{iid } N(0, 1)$

[Figure: IV estimate in repeated experiments. beta_iv (vertical axis, about -600 to 400) against xx (horizontal axis, 0 to 1000)]
[Figure: OLS estimate in repeated experiments. beta_ols (vertical axis, about 0.8 to 1.4) against xx (horizontal axis, 0 to 1000)]

Checking Instrument Relevance
Why does our trusty normal approximation fail us?
$\hat{\beta}_1^{TSLS} = s_{YZ} / s_{XZ}$
If cov(X,Z) is small, small changes in $s_{XZ}$ (from one sample to the next) can induce big changes in $\hat{\beta}_1^{TSLS}$.
Suppose in one sample you calculate $s_{XZ}$ = .00001...
Thus the large-n normal approximation is a poor approximation to the sampling distribution of $\hat{\beta}_1^{TSLS}$.
If instruments are weak, the usual methods of inference are unreliable, potentially very unreliable.

Checking Instrument Relevance
But we have data to examine this, namely the first stage regression. In general this regression is
$X_i = \pi_0 + \pi_1 Z_{1i} + \cdots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \cdots + \pi_{m+r} W_{ri} + v_i$
We might think to look at the t-stats on the individual Zs. However, all we are concerned about is the overall predictive power of the Zs, so we test this with an F statistic.
We wish to test the hypotheses
$H_0: \pi_1 = \pi_2 = \cdots = \pi_m = 0$ against $H_1$: at least one $\pi_j \neq 0$, $j = 1, \ldots, m$.
Notice that the test is only on the coefficients of the Zs, not the Ws.
How large an F do we need? Rule of thumb: F > 10 means that the instruments are not weak.

Example: Weak Instruments
Angrist and Krueger (education pays, but by how much?)
log(wage) = a + b * (years of schooling) + error
[Diagram: years of schooling, innate ability, wage]
Innate ability is a confounding factor.

Example: Weak Instruments (cont.)
Angrist and Krueger attempted to instrument for years of schooling using quarter of birth.
Their reasoning was that compulsory schooling laws are based on age, yet the age at which students begin school depends on quarter of birth.
For example, in California, a student starts school if he/she turns 6 by Dec 2.

Example: Weak Instruments (cont.)
A student who is born in November will generally start 1st grade nearly a year younger than a student who is born in December.
Students are typically allowed to leave as soon as they reach their 16th birthday.
November-born students tend to be in 10th grade when they reach the end of compulsory schooling and can choose to drop out.
December-born students tend to be in 9th grade when they can drop out.

Example: Weak Instruments (cont.)
Thus, quarter of birth is correlated with years of schooling for students who drop out at the end of the compulsory schooling period.
However, this connection is tenuous.
It turns out that quarter of birth is a weak instrument and the famous study is not reliable.
Reasons: not many people dropped out at the end of the compulsory schooling period, and for those who did, quarter of birth changes years of schooling by at most 1 year.

A Quote
"Give me a LEVER long enough and a place to stand and I will move the world."
ARCHIMEDES, 280 B.C.
"Give me an instrument that is strong enough and does not interfere with the system directly and I can make causal inference using observational data."

Instrument Exogeneity

Checking Instrument Exogeneity
What happens if Z is correlated with u?
Take $Y_i = \beta X_i + u_i$ (variables in deviations from their means). Then
$\hat{\beta}^{IV} = \frac{\sum Z_i Y_i}{\sum Z_i X_i} = \frac{\sum Z_i (\beta X_i + u_i)}{\sum Z_i X_i} = \beta \frac{\sum Z_i X_i}{\sum Z_i X_i} + \frac{\sum Z_i u_i}{\sum Z_i X_i} = \beta + \frac{\frac{1}{n} \sum Z_i u_i}{\frac{1}{n} \sum Z_i X_i} \to \beta + \frac{\mathrm{cov}(Z, u)}{\mathrm{cov}(Z, X)}$

Checking Instrument Exogeneity
What happens if Z is correlated with u?
$\hat{\beta}^{IV} \to \beta + \frac{\mathrm{cov}(Z, u)}{\mathrm{cov}(Z, X)}$
If $\mathrm{cov}(Z, u) \neq 0$, then the IV estimator is inconsistent.

Checking Instrument Exogeneity
Can we test exogeneity?
When we have exact identification (i.e. the number of instruments equals the number of endogenous regressors) the answer is no.
This is related to the reason we cannot test the correlation of X and u.
The sample correlation between Z and the estimated residuals is zero by construction, i.e. $\sum_i (Z_i - \bar{Z}) \hat{u}_i = 0$.

What if the model is over-identified?
[Diagram: two instruments Z1 and Z2; components Xc and Xd of X; u; Y]

Checking Instrument Exogeneity
$y_i = \beta_0 + \beta_1 x_i + u_i$
Suppose we have two instruments: $z_{1i}$ and $z_{2i}$. Then we have two possible first-stage (FS) regressions:
(a) $x_i = \pi_{0a} + \pi_{1a} z_{1i} + \text{error}$
(b) $x_i = \pi_{0b} + \pi_{1b} z_{2i} + \text{error}$
These two FS regressions will lead to two 2SLS estimates.
If both instruments are exogenous, then these two estimates are expected to be close to each other, as both are consistent.
If these two estimates are far apart from each other, then it would be reasonable to believe that one or both IVs are not exogenous.

Testing Over-identifying Restrictions
If we have multiple instruments, it is possible to test the over-identifying restrictions to see if some of the instruments are correlated with the error.
Remember: we can test for exogeneity ONLY IF the model is over-identified. If the model is exactly identified, we must rely on ex-ante expert reasoning.

The J Test
Exogeneity of the instruments means that they are uncorrelated with u.
This suggests that the 2SLS residual $\hat{u}_i^{2SLS}$ should be approximately uncorrelated with the instruments.
In a regression:
$\hat{u}_i^{2SLS} = \delta_0 + \delta_1 z_{1i} + \cdots + \delta_m z_{mi} + \delta_{m+1} w_{1i} + \cdots + \delta_{m+r} w_{ri} + e_i$
compute the F-statistic for $H_0: \delta_1 = \delta_2 = \cdots = \delta_m = 0$.

The J Test
The test statistic is J = mF.
Under $H_0$, in large samples, J follows a chi-squared distribution with df = m - k degrees of freedom.
We reject for large values.
This is a test for OVERIDENTIFICATION: we require that m > k, otherwise J = 0 always.
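The J-test recipe above can be sketched on simulated data with k = 1 endogenous regressor, m = 2 instruments, and no Ws, so df = m - k = 1 and the 5% critical value of the chi-squared distribution is 3.84. Everything in the design is an assumption made for illustration: the true coefficients, the sample size, the `z2_loading_on_u` parameter that makes the second instrument invalid, and the helper names. The F-statistic used is the homoskedasticity-only version.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def fit_two(target, z1, z2):
    """OLS of target on a constant, z1 and z2; returns the fitted values."""
    s11, s22, s12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
    s1t, s2t = cov(z1, target), cov(z2, target)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1t - s12 * s2t) / det
    b2 = (s11 * s2t - s12 * s1t) / det
    b0 = statistics.fmean(target) - b1 * statistics.fmean(z1) - b2 * statistics.fmean(z2)
    return [b0 + b1 * p + b2 * q for p, q in zip(z1, z2)]


def j_statistic(y, x, z1, z2):
    """J = m * F for one endogenous x and two instruments (m = 2, k = 1)."""
    m, n = 2, len(y)
    x_hat = fit_two(x, z1, z2)                           # first stage
    b1 = cov(x_hat, y) / cov(x_hat, x_hat)               # second-stage slope
    b0 = statistics.fmean(y) - b1 * statistics.fmean(x)
    # 2SLS residuals are built from the actual x, not the predicted x.
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
    fitted = fit_two(resid, z1, z2)                      # residuals on the instruments
    mean_r = statistics.fmean(resid)
    r2 = sum((f - mean_r) ** 2 for f in fitted) / sum((r - mean_r) ** 2 for r in resid)
    f_stat = (r2 / m) / ((1 - r2) / (n - m - 1))         # homoskedasticity-only F
    return m * f_stat


def simulate(z2_loading_on_u, n=2000, seed=2):
    """One sample; z2 is exogenous only when z2_loading_on_u is 0."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    u = [0.5 * vi + z2_loading_on_u * b + rng.gauss(0, 1) for vi, b in zip(v, z2)]
    x = [a + b + c for a, b, c in zip(z1, z2, v)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return j_statistic(y, x, z1, z2)


CRITICAL_5PCT_DF1 = 3.84   # chi-squared(1) critical value at the 5% level
j_valid = simulate(z2_loading_on_u=0.0)
j_invalid = simulate(z2_loading_on_u=0.5)
print(f"J with both instruments exogenous: {j_valid:.2f}")
print(f"J with z2 correlated with u:       {j_invalid:.2f}")
print(f"Reject exogeneity in the invalid design: {j_invalid > CRITICAL_5PCT_DF1}")
```

A rejection says that at least one instrument appears to be correlated with the error; it does not say which one.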
