Econ 513, Fall 2005
USC, Department of Economics
Lecture 6
Ordinary Least Squares: Omitted Variable Bias and Proxy Variables

A. Omitted Variable Bias

Often we estimate a linear regression function but we are not completely sure that we have included all the relevant regressors. Here we investigate how omitting a variable affects the coefficients on the regressors we are most interested in. Suppose the true regression function is

$$y_i = \beta_0 + \beta_1 \cdot x_{i1} + \dots + \beta_K \cdot x_{iK} + \beta_z \cdot z_i + \varepsilon_i.$$

We refer to this as the "long regression." Now suppose we estimate the regression function without $z_i$:

$$y_i = \gamma_0 + \gamma_1 \cdot x_{i1} + \dots + \gamma_K \cdot x_{iK} + \eta_i,$$

referred to as the "short regression." This regression is largely definitional: the coefficients are defined to be $\gamma = (E[xx'])^{-1}(E[xy])$. In addition it is useful to consider the "artificial regression" of the omitted $z_i$ on a constant and the $x_{ik}$:

$$z_i = \delta_0 + \delta_1 \cdot x_{i1} + \dots + \delta_K \cdot x_{iK} + \nu_i.$$

Again this is definitional: choose $\delta = (E[xx'])^{-1}(E[xz])$ so that $\nu = z - x'\delta$ is by definition uncorrelated with $x$.

If we estimate the short regression, and focus on the $k$th regressor, we will estimate $\gamma_k$. We are interested in the relation between $\gamma_k$ and $\beta_k$. To see what this will look like, consider the long regression, and substitute in for the omitted $z_i$:

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1 \cdot x_{i1} + \dots + \beta_K \cdot x_{iK} + \beta_z \cdot z_i + \varepsilon_i \\
    &= \beta_0 + \beta_1 \cdot x_{i1} + \dots + \beta_K \cdot x_{iK} + \beta_z \cdot (\delta_0 + \delta_1 \cdot x_{i1} + \dots + \delta_K \cdot x_{iK} + \nu_i) + \varepsilon_i \\
    &= (\beta_0 + \beta_z \cdot \delta_0) + (\beta_1 + \beta_z \cdot \delta_1) \cdot x_{i1} + \dots + (\beta_K + \beta_z \cdot \delta_K) \cdot x_{iK} + (\beta_z \cdot \nu_i + \varepsilon_i).
\end{aligned}
$$

Since the composite error term $\beta_z \cdot \nu_i + \varepsilon_i$ is uncorrelated with the $x$s by definition, the regression coefficients in this representation are what you get from the short regression.
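The substitution above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes NumPy is available, and the data-generating process, sample size, seed, and helper name `ols` are all hypothetical choices made for illustration. It fits the long, short, and artificial regressions by least squares and compares the short-regression coefficients with the long-regression coefficients plus the coefficient on $z$ times the artificial-regression coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
n, K = 500, 2

# Hypothetical data-generating process; every number here is illustrative.
x = rng.normal(size=(n, K))
z = 0.5 + x @ np.array([0.8, -0.3]) + rng.normal(size=n)    # z is correlated with x
y = 1.0 + x @ np.array([2.0, -1.0]) + 1.5 * z + rng.normal(size=n)

def ols(X, target):
    """Least-squares coefficients from regressing target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

const = np.ones((n, 1))
X_short = np.hstack([const, x])               # constant and the x's
X_long = np.hstack([X_short, z[:, None]])     # constant, the x's, and z

beta_hat = ols(X_long, y)     # long regression; the last entry is the coefficient on z
gamma_hat = ols(X_short, y)   # short regression (z omitted)
delta_hat = ols(X_short, z)   # artificial regression of z on a constant and the x's

# Long-regression coefficients plus (coefficient on z) times the delta's.
implied = beta_hat[:-1] + beta_hat[-1] * delta_hat

print("short regression:", gamma_hat)
print("implied by long :", implied)
```

The two printed vectors should agree up to floating-point error, because the least-squares residuals of the long regression are orthogonal to the constant and the $x$s in the sample, just as the composite error is uncorrelated with them in the population.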
So, $\gamma_k = \beta_k + \beta_z \cdot \delta_k$. In words, the omitted variable bias (the difference between the coefficient in the short regression, $\gamma_k$, and the coefficient in the long regression, $\beta_k$) is equal to the product of the coefficient on the omitted variable, $\beta_z$, and the coefficient on the included regressor $x_{ik}$ in a regression of the omitted variable on all included regressors, $\delta_k$. These equalities also hold exactly for the estimated regression coefficients: $\hat{\gamma}_k = \hat{\beta}_k + \hat{\beta}_z \cdot \hat{\delta}_k$.

The practical relevance of this may seem small. In practice we do not observe the omitted variable, and so we cannot estimate these regression coefficients. If we could, we would not have the bias! Nevertheless, this result is extremely useful. Let us see how this is used in practice. Consider a wage regression of log earnings on education of the type we looked at before: $\log(\text{earnings})_i = 5.0455 + 0.0667 \cdot \text{educ}_i$ ....
This note was uploaded on 07/22/2008 for the course ECON 513 taught by Professor Rashidian during the Fall '07 term at USC.
