ECON 418 Introduction to Econometrics
Anna Breman, University of Arizona
Part 3: Regression analysis
Endogeneity
Anna Breman (University of Arizona) ECON 418 Fall 2008

Endogeneity
Endogeneity occurs whenever the error term, $u$, is correlated with an explanatory variable $x_j$.
Why is endogeneity a problem?
- We want to estimate a causal relationship between the independent variables and the dependent variable.
- Endogeneity makes it difficult to infer causality: we may find a relationship between $y$ and $x$ that is really due to other unobserved factors that affect $y$ and also happen to be correlated with $x$.
- It results in biased estimates.
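The bias claim above can be illustrated with a small simulation. This is only a sketch under assumed values: the true slope `beta1 = 1.0`, the correlation parameter `rho = 0.5`, the sample size, and the helper names `ols_slope` and `simulate_slopes` are all hypothetical and do not come from the slides.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def simulate_slopes(n=50_000, beta1=1.0, rho=0.5, seed=0):
    """Return OLS slopes for an exogenous and an endogenous regressor."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                    # error term
    x_exog = rng.normal(size=n)               # independent of u
    x_endog = rng.normal(size=n) + rho * u    # correlated with u by construction
    y_exog = 2.0 + beta1 * x_exog + u
    y_endog = 2.0 + beta1 * x_endog + u
    return ols_slope(x_exog, y_exog), ols_slope(x_endog, y_endog)

slope_exog, slope_endog = simulate_slopes()
print(f"exogenous x:  {slope_exog:.3f}")
print(f"endogenous x: {slope_endog:.3f}")
```

With the exogenous regressor the estimated slope should sit close to the assumed true value of 1.0. With the endogenous regressor it should land near 1.4, because in this setup Cov(x, u) / Var(x) = 0.5 / 1.25 = 0.4 is added to the slope regardless of how large the sample is.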

What can cause endogeneity?
1. Omitted variables (Ch. 3.3, 9)
2. Measurement error (Chapter 9)
3. Simultaneity: the independent and dependent variables are jointly determined (Chapter 16)

Simple Linear Regression. Assumption 4: Zero Conditional Mean
The error $u$ has an expected value of zero given any value of the explanatory variable.
$E(u \mid x) = 0$
Multiple Linear Regression. Assumption 4: Zero Conditional Mean
The error $u$ has an expected value of zero given any values of the explanatory variables.
$E(u \mid x_1, x_2, \ldots, x_k) = 0$
If SLR4 (or MLR4) holds, we say that we have exogenous explanatory variables; if it does not, the variables are endogenous. If any explanatory variable $x_j$ is correlated with $u$ for any reason, then $x_j$ is said to be endogenous.
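A short worked step may help connect the zero conditional mean assumption to the statement about correlation. It relies on the law of iterated expectations, which is not stated on the slide, and is written for a single regressor $x$:

```latex
\begin{align*}
E(u) &= E\big[E(u \mid x)\big] = 0, \\
E(xu) &= E\big[x\,E(u \mid x)\big] = 0, \\
\operatorname{Cov}(x, u) &= E(xu) - E(x)\,E(u) = 0.
\end{align*}
```

So if $\operatorname{Cov}(x_j, u) \neq 0$, the zero conditional mean assumption cannot hold. The converse does not follow: zero correlation alone does not guarantee $E(u \mid x) = 0$.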

Part 1: Omitted Variable Bias
Omitting a variable that actually belongs in the true population model is also known as
- excluding a relevant variable, or
- underspecifying the model.

Deriving the omitted variable bias, $E(\tilde{\beta}_1)$
We start with a true population model with two explanatory variables,
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$,
which we assume satisfies MLR4.
$\beta_1$ is the parameter of primary interest.
You can think of $y$ as hourly wage, $x_1$ as education, and $x_2$ as innate ability:
$\mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{abil} + u$

Due to (for example) lack of data, the equation is estimated without $x_2$:
$\mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + v$, where $v = \beta_2 \mathit{abil} + u$.
We obtain the following estimated regression line:
$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$
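The effect of leaving ability out of the wage equation can be sketched with simulated data. Everything numeric here is an assumption made for illustration: the population values `beta0`, `beta1`, `beta2`, the way `educ` is generated from `abil`, and the helper `fit_ols` are hypothetical and are not taken from the course material.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients of y on the columns of X, with an intercept prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(418)
n = 100_000
beta0, beta1, beta2 = 1.0, 0.5, 2.0   # hypothetical population parameters

abil = rng.normal(size=n)                        # unobserved in practice
educ = 12.0 + 0.8 * abil + rng.normal(size=n)    # educ is correlated with abil
u = rng.normal(size=n)
wage = beta0 + beta1 * educ + beta2 * abil + u

# Regression on both educ and abil (feasible only because the data are simulated)
long_coef = fit_ols(np.column_stack([educ, abil]), wage)
# Regression that omits abil, so abil ends up in the error term v
short_coef = fit_ols(educ, wage)

print(f"slope on educ, abil included: {long_coef[1]:.3f}")
print(f"slope on educ, abil omitted:  {short_coef[1]:.3f}")
```

In this setup the regression that includes `abil` should recover a slope on `educ` close to the assumed 0.5, while the regression that omits it should give a clearly larger slope, because `educ` picks up part of the effect of `abil`.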

Deriving the omitted variable bias, $E(\tilde{\beta}_1)$
If we could regress $y$ on $x_1$ and $x_2$, we would obtain $\hat{\beta}_1$ and $\hat{\beta}_2$. If we could regress
