Measurement Error and IV
Edmund Wright (based on Wooldridge 9.3 & 15.3)
February 2, 2015
An explanatory variable $x$ is endogenous if $\mathrm{Cov}(x, u) \neq 0$. Measurement error
can be a source of endogeneity. We will focus on measurement error today, and
how it can be mitigated using instrumental variables.
1 How does measurement error lead to endogeneity?

Suppose the model
\[ y = \beta_0 + \beta_1 x_1^* + u, \tag{1} \]
satisfies the first four Gauss-Markov assumptions. We know from last term
that if we were to estimate this equation using OLS we would get unbiased and
consistent estimators for $\beta_0$ and $\beta_1$.
But suppose that while we can observe $y$, we cannot actually observe $x_1^*$, and so
we cannot regress $y$ on $x_1^*$. What we can observe is a measure of $x_1^*$, which we
call $x_1$, and which contains random error:
\[ x_1 = x_1^* + e_1, \]
where $e_1$ is the measurement error. An example of this situation might be
where the variable of interest, $x_1^*$, is actual income, but what we can observe,
$x_1$, is reported income, which will not always be accurate.
So we cannot observe $x_1^*$, but we can observe $x_1$. Suppose we replace $x_1^*$ with
$x_1$ in the regression equation, and estimate the parameters by running an OLS
regression of $y$ on $x_1$. That is, we estimate
\[ y = \beta_0 + \beta_1 x_1 + v. \]
Will our results be unbiased and consistent? Substituting $x_1^* = x_1 - e_1$ into (1)
gives $y = \beta_0 + \beta_1 x_1 + (u - \beta_1 e_1)$, so the error term
in the equation we are estimating is
\[ v = u - \beta_1 e_1. \]
Whether our results are unbiased and consistent depends on whether $v$ is
correlated with $x_1$. This, of course, depends on what assumptions we
make. The usual assumptions are these:
• $E[e_1] = 0$.
• $\mathrm{Cov}(x_1, u) = 0$. (Since (1) satisfying the Gauss-Markov assumptions
implies that $x_1^*$ is uncorrelated with $u$, this requires that $e_1$ is uncorrelated
with $u$.)
• $\mathrm{Cov}(x_1^*, e_1) = 0$. The variable of interest $x_1^*$ is uncorrelated with the measurement error. This is known as the classical errors-in-variables (CEV) assumption.

Under the CEV assumption, we have that
\[ \mathrm{Cov}(x_1, e_1) = \mathrm{Cov}(x_1^* + e_1, e_1) = \mathrm{Cov}(e_1, e_1) = \sigma_{e_1}^2, \]
where $\sigma_{e_1}^2$ is the variance of the measurement error. And so the covariance
between $v$ and the observed variable $x_1$ is
\[ \mathrm{Cov}(x_1, v) = \mathrm{Cov}(x_1, u - \beta_1 e_1) = \mathrm{Cov}(x_1, -\beta_1 e_1) = -\beta_1 \mathrm{Cov}(x_1, e_1) = -\beta_1 \sigma_{e_1}^2. \]
So as long as the measurement error has positive variance (and $\beta_1 \neq 0$), we have an
endogeneity problem: our explanatory variable is correlated with the error term
in our estimated equation, which means our OLS estimators will be
biased and inconsistent.
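
The covariance result above can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the true slope `beta1 = 2.0`, the measurement-error standard deviation `sigma_e = 0.5`, the normal distributions and the seed are all assumed values chosen for illustration. It draws data satisfying the CEV assumptions and compares the sample covariance of $x_1$ and $v$ with $-\beta_1 \sigma_{e_1}^2$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(0)
n = 100_000
beta1 = 2.0    # assumed true slope (illustrative)
sigma_e = 0.5  # assumed s.d. of the measurement error (illustrative)

# Draw x*, e1 and u independently, so the CEV assumptions hold by construction.
x_star = [random.gauss(0, 1) for _ in range(n)]
e1 = [random.gauss(0, sigma_e) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]

x1 = [xs + e for xs, e in zip(x_star, e1)]     # observed regressor
v = [ui - beta1 * e for ui, e in zip(u, e1)]   # error term of the estimated equation

cov_x1_v = cov(x1, v)
theory = -beta1 * sigma_e ** 2

print("sample Cov(x1, v):", round(cov_x1_v, 3))
print("-beta1 * var(e1): ", theory)
```

With a sample this large, the two printed numbers should be close, which is the endogeneity problem in numerical form.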
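2 What kind of inconsistency? Attenuation bias.

Recall from last term that the probability limit of the estimator for $\beta_1$ is (plugging in the variables for this case)
\[ \mathrm{plim}(\hat\beta_1) = \beta_1 + \frac{\mathrm{Cov}(x_1, v)}{\mathrm{Var}(x_1)}, \]
which, given that $\mathrm{Cov}(x_1, v) = -\beta_1 \sigma_{e_1}^2$ and $\mathrm{Cov}(x_1^*, e_1) = 0$, is equal to
\begin{align*}
\mathrm{plim}(\hat\beta_1) &= \beta_1 + \frac{-\beta_1 \sigma_{e_1}^2}{\mathrm{Var}(x_1^* + e_1)} \\
&= \beta_1 + \frac{-\beta_1 \sigma_{e_1}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2} \\
&= \beta_1 \left(1 - \frac{\sigma_{e_1}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2}\right) \\
&= \beta_1 \, \frac{\sigma_{x_1^*}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2}.
\end{align*}
The term multiplying $\beta_1$ is less than 1, and so the probability limit of the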
estimator will always be closer to zero than the true parameter. This is called
attenuation bias (to 'attenuate' is to reduce), and in the case of a positive
parameter it means the parameter will be underestimated. Note also that if
the variance of the measurement error is large compared to the variance of the
variable of interest, the bias will be large, and if it is small compared to the
variance of the variable of interest, the bias will be small. For example, if the
two variances are equal, the probability limit of the estimator is $\beta_1 / 2$.
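
The attenuation formula can be illustrated with a short simulation. This is a sketch under assumed values, none of which come from the original notes: `beta0 = 1.0`, `beta1 = 2.0`, $\sigma_{x_1^*}^2 = 1$ and $\sigma_{e_1}^2 = 0.25$, with normally distributed draws. The OLS slope from regressing $y$ on the mismeasured $x_1$ is compared with $\beta_1 \sigma_{x_1^*}^2 / (\sigma_{x_1^*}^2 + \sigma_{e_1}^2)$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(1)
n = 100_000
beta0, beta1 = 1.0, 2.0        # assumed true parameters (illustrative)
var_x_star, var_e = 1.0, 0.25  # assumed variances (illustrative)

x_star = [random.gauss(0, var_x_star ** 0.5) for _ in range(n)]
e1 = [random.gauss(0, var_e ** 0.5) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]

y = [beta0 + beta1 * xs + ui for xs, ui in zip(x_star, u)]  # model (1)
x1 = [xs + e for xs, e in zip(x_star, e1)]                  # mismeasured regressor

# The simple-regression OLS slope is Cov(x1, y) / Var(x1).
slope_ols = cov(x1, y) / cov(x1, x1)
plim_theory = beta1 * var_x_star / (var_x_star + var_e)

print("OLS slope on x1:  ", round(slope_ols, 3))
print("attenuated target:", plim_theory)
print("true beta1:       ", beta1)
```

Raising `var_e` relative to `var_x_star` in this sketch pushes the estimated slope further towards zero, in line with the formula.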
3 Mitigating with IV

Measurement error leads to endogeneity, and IV is a common strategy for dealing
with endogeneity. How does it work in this case?
For an instrumental variable to be useful, it must be correlated with $x_1$, but
uncorrelated with the error term $v$. As in this case $v = u - \beta_1 e_1$, the IV should
as usual be uncorrelated with the error term $u$ from the original model (1), but
also uncorrelated with the measurement error $e_1$.
One possibility for an instrumental variable is a second measurement of $x_1^*$,
say $z_1 = x_1^* + a_1$, where $a_1$ is the measurement error for this measurement. If
the measurement errors $e_1$ and $a_1$ are uncorrelated (that is, both $x_1$ and $z_1$ mismeasure $x_1^*$, but their errors are uncorrelated), then $z_1$ satisfies the requirements
for a good instrument. An example of where this might be possible is where we
have household income reported separately by two people from the household.
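
The second-measurement idea can be sketched in a simulation. The setup is assumed for illustration and does not come from the original notes: `beta1 = 2.0`, both measurement errors have standard deviation 0.5, and all draws are independent normals. With a single instrument, the IV estimator of the slope is $\mathrm{Cov}(z_1, y) / \mathrm{Cov}(z_1, x_1)$, which is compared here with the OLS slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


random.seed(2)
n = 100_000
beta0, beta1 = 1.0, 2.0  # assumed true parameters (illustrative)

x_star = [random.gauss(0, 1) for _ in range(n)]
e1 = [random.gauss(0, 0.5) for _ in range(n)]  # error in the first measurement
a1 = [random.gauss(0, 0.5) for _ in range(n)]  # error in the second, independent of e1
u = [random.gauss(0, 1) for _ in range(n)]

y = [beta0 + beta1 * xs + ui for xs, ui in zip(x_star, u)]
x1 = [xs + e for xs, e in zip(x_star, e1)]  # regressor used in estimation
z1 = [xs + a for xs, a in zip(x_star, a1)]  # second measurement, used as the instrument

beta1_ols = cov(x1, y) / cov(x1, x1)  # attenuated towards zero
beta1_iv = cov(z1, y) / cov(z1, x1)   # simple IV estimator with one instrument

print("OLS estimate:", round(beta1_ols, 3))
print("IV estimate: ", round(beta1_iv, 3))
```

In this sketch the IV estimate should sit near the assumed true slope while the OLS estimate stays attenuated. If `a1` were made correlated with `e1`, the instrument would no longer be valid and the IV estimate would drift as well.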
An alternative is to use some other exogenous variable as an IV.
As long as this variable is not correlated with the measurement error (and, as
usual, is not correlated with $y$ after controlling for $x_1$), it may be used as
an instrument.
4 Example exam question: Summer 2014, A2 b-d

(b) The model is
\[ LS_{ic} = \alpha_c + \beta I_{ic}^P + \varepsilon_{ic}, \]
but the equation that is estimated is
\[ LS_{ic} = \alpha_c + \beta I_{ic} + v_{ic}, \]
where $I_{ic} = I_{ic}^P + I_{ic}^T$, and so $v_{ic} = \varepsilon_{ic} - \beta I_{ic}^T$. We can think of $I_{ic}^T$ as the
measurement error in this question. For the OLS estimator to be consistent
we would need $\mathrm{Cov}(I, v) = 0$, but, if the model satisfies the Gauss-Markov
assumptions, and additionally $\mathrm{Cov}(I, \varepsilon) = 0$ and $\mathrm{Cov}(I^P, I^T) = 0$, in fact in this case
\begin{align*}
\mathrm{Cov}(I, v) &= \mathrm{Cov}(I, \varepsilon - \beta I^T) \\
&= \mathrm{Cov}(I, -\beta I^T) \\
&= \mathrm{Cov}(I^P + I^T, -\beta I^T) \\
&= \mathrm{Cov}(I^T, -\beta I^T) \\
&= -\beta \sigma_{I^T}^2,
\end{align*}
which, if $I^T$ has positive variance, is not equal to zero.
(c) (Not necessary to answer this question, but: the reason this might work is that
education is unlikely to be correlated with the measurement error, in this case
transitory income, but likely to be correlated with the variable of interest, in
this case permanent income, and so with total income.)
Recall that the instrument $educ$ is relevant if $\mathrm{Cov}(educ, I) \neq 0$. Recall that
the two-stage least squares method of instrumental variable regression involves
doing a first-stage regression of $I_{ic}$ on $educ_i$, that is, estimating the equation
\[ I_{ic} = \pi_0 + \pi_1 educ_i + u_i. \]
The coefficient on $educ_i$ in this equation, $\pi_1$, is equal to $\mathrm{Cov}(educ_i, I_{ic})/\mathrm{Var}(educ_i)$,
and so to test whether $\mathrm{Cov}(educ_i, I_{ic}) \neq 0$ we can do a t-test of the null hypothesis $H_0: \pi_1 = 0$ (irrelevant) against the alternative hypothesis $H_1: \pi_1 \neq 0$
(relevant). The rule of thumb (you should try to remember this) is that we have a
sufficiently relevant instrument if $|t\text{-stat}| > \sqrt{10}$.
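
The first-stage relevance check can be sketched as follows. The data here are simulated under a made-up data-generating process (years of education drawn around 12, a first-stage slope of 0.5), which is an assumption for illustration only and says nothing about the exam data. The code computes the first-stage OLS slope, its usual homoskedastic standard error, and the t-statistic, then applies the $\sqrt{10}$ rule of thumb.

```python
import math
import random

random.seed(3)
n = 5_000

# Hypothetical data-generating process, for illustration only.
educ = [random.gauss(12, 2) for _ in range(n)]
income = [1.0 + 0.5 * ed + random.gauss(0, 1) for ed in educ]

mean_educ = sum(educ) / n
mean_income = sum(income) / n

# Sums of squares and cross-products around the means.
sxx = sum((ed - mean_educ) ** 2 for ed in educ)
sxy = sum((ed - mean_educ) * (inc - mean_income) for ed, inc in zip(educ, income))

# First-stage OLS estimates: pi1 = Cov(educ, I) / Var(educ).
pi1_hat = sxy / sxx
pi0_hat = mean_income - pi1_hat * mean_educ

# Homoskedastic standard error of the slope, with n - 2 degrees of freedom.
residuals = [inc - pi0_hat - pi1_hat * ed for ed, inc in zip(educ, income)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)
se_pi1 = math.sqrt(s2 / sxx)

t_stat = pi1_hat / se_pi1
is_relevant = abs(t_stat) > math.sqrt(10)

print("pi1_hat:", round(pi1_hat, 3))
print("t-stat: ", round(t_stat, 1))
print("passes the sqrt(10) rule of thumb:", is_relevant)
```

In practice a regression package would report the same slope and t-statistic; the manual calculation is used here only to keep the sketch self-contained.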
(d) While $educ_i$ is perhaps unlikely to be correlated with the measurement error,
it suffers from the problem of being likely to affect the dependent variable, life
satisfaction, in other ways that are not through income. And so it might well be
correlated with the error term, which would mean it is not a valid instrument.