Let’s look into the nature of the problem that multicollinearity entails. Suppose that $Z = a + bX$ for some known constants $a$ and $b$. Then,
$$Y = \beta_0 + \beta_1 X + \beta_2 Z + u = \beta_0 + \beta_1 X + \beta_2 (a + bX) + u = (\beta_0 + \beta_2 a) + (\beta_1 + b\beta_2) X + u.$$
Note that Condition 1M implies $E[u \mid X] = 0$. Therefore, in this case, we can reduce the multiple regression model to a univariate regression model by redefining the coefficients:
$$Y = \gamma_0 + \gamma_1 X + u,$$
where $\gamma_0 = \beta_0 + \beta_2 a$ and $\gamma_1 = \beta_1 + b\beta_2$.
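The algebra of this reduction can be checked numerically. The sketch below uses made-up values for the coefficients and the constants $a$ and $b$; it simply verifies that substituting $Z = a + bX$ collapses the three-coefficient model into the two-coefficient one.

```python
# Tiny numeric check of the reduction above; all numbers are illustrative.
beta0, beta1, beta2 = 0.5, 1.0, 3.0
a, b = 1.0, 2.0
gamma0 = beta0 + beta2 * a          # gamma_0 = beta_0 + beta_2 * a
gamma1 = beta1 + b * beta2          # gamma_1 = beta_1 + b * beta_2

for X in (-1.0, 0.0, 2.5):
    Z = a + b * X                   # perfect linear dependence
    # beta0 + beta1*X + beta2*Z equals gamma0 + gamma1*X for every X
    assert abs((beta0 + beta1 * X + beta2 * Z) - (gamma0 + gamma1 * X)) < 1e-12
print("reduction holds")
```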
4. MULTIPLE LINEAR REGRESSION MODELS

Therefore, using the previous results, we can identify $\gamma_0$ and $\gamma_1$ as follows:
$$\gamma_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \quad \text{and} \quad \gamma_0 = E[Y] - \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} E[X].$$
Can we recover the values of $\beta_0$, $\beta_1$, and $\beta_2$ from this information? The answer is no, because it amounts to solving for three unknowns from two equations:
$$\beta_1 + b\beta_2 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \quad \text{and} \quad \beta_0 + \beta_2 a = E[Y] - \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} E[X].$$
(Recall that we have already assumed that we know the constants $a$ and $b$.) In fact, these equations admit infinitely many solutions for $\beta_0$, $\beta_1$, and $\beta_2$. Therefore, the coefficients are not identified. We saw that the inclusion of an additional regressor $Z$ enables us to interpret the parameter $\beta_1$ as a quantified effect of a change in $X$ on a typical value of $Y$, conditional on $Z$ being kept constant. When $X$ and $Z$ are perfectly correlated, there is no "independent" variation in $Z$ that can be distinguished from the variation in $X$. Therefore, we have no hope of learning about the effect of $X$ as separated from the effect that runs through $Z$. Multicollinearity thus arises because we try to do something that is mathematically impossible. In other words, multicollinearity names the problem of trying to separate out the effect of one regressor from another when the other regressor is completely redundant, meaning that it does not have a role distinct from $X$ in identifying the causal effect. This results in the failure of identification of the coefficients. Despite all this discussion, it rarely happens in practice that one regressor is precisely equal to a linear combination of the other regressors. However, when multicollinearity holds only approximately, meaning that one regressor is nearly equal to a linear combination of the other regressors, the coefficients nearly fail to be identified.
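The identification failure can be seen directly in a simulation. In the sketch below (all coefficients, constants, and sample sizes are made up for illustration), the regressor matrix $[1, X, Z]$ with $Z = a + bX$ has rank 2 rather than 3, so the data can pin down only the composite coefficients $\gamma_0 = \beta_0 + \beta_2 a$ and $\gamma_1 = \beta_1 + b\beta_2$, never the individual $\beta$'s.

```python
import numpy as np

# Illustrative simulation: Z is an exact linear function of X, so the
# design matrix [1, X, Z] is rank-deficient and (beta0, beta1, beta2)
# cannot be recovered. All numeric values are hypothetical.
rng = np.random.default_rng(0)
n, a, b = 500, 1.0, 2.0
X = rng.normal(size=n)
Z = a + b * X                        # perfect collinearity
u = rng.normal(size=n)
Y = 0.5 + 1.0 * X + 3.0 * Z + u      # true beta0=0.5, beta1=1, beta2=3

M = np.column_stack([np.ones(n), X, Z])
rank = np.linalg.matrix_rank(M)
print(rank)                          # 2, not 3: the betas are not identified

# Only the composite coefficients are estimable; regressing Y on [1, X]
# recovers gamma0 = beta0 + beta2*a and gamma1 = beta1 + b*beta2:
gamma, *_ = np.linalg.lstsq(M[:, :2], Y, rcond=None)
print(gamma)                         # approximately (3.5, 7.0)
```

Note that `lstsq` would still return *some* solution if fed the full rank-deficient matrix, but that solution is only one of infinitely many coefficient vectors that fit the data equally well.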
When the coefficients are close to being unidentified, the asymptotic theory (that is, consistency, asymptotic normality, and standard errors based on asymptotic normality) does not provide a reliable approximation to the finite-sample distribution of the coefficient estimators. Therefore, in the case of near multicollinearity, standard inference based on asymptotic theory becomes dubious. In many situations, near multicollinearity can be detected by inspecting the standard errors: when they are implausibly large, one may first suspect a redundancy among the regressors. In that case, one can simply remove the redundant regressor from the regression, because one does not have to control for a variable that is merely redundant.
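The link between near collinearity and inflated standard errors can be demonstrated in a simulation. In the sketch below (all coefficients and noise scales are illustrative), $Z$ equals a linear function of $X$ plus noise; as that noise shrinks, the conventional standard error of the coefficient on $X$ blows up even though OLS still runs mechanically.

```python
import numpy as np

# Illustrative sketch: Z is nearly (not exactly) linear in X. As the
# independent variation in Z shrinks, the standard error of the X
# coefficient explodes. All numeric values are hypothetical.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=n)

def ols_se_of_x(noise_sd):
    """OLS of Y on [1, X, Z]; returns the conventional SE of the X slope."""
    Z = 1.0 + 2.0 * X + rng.normal(scale=noise_sd, size=n)
    u = rng.normal(size=n)
    Y = 0.5 + 1.0 * X + 3.0 * Z + u
    M = np.column_stack([np.ones(n), X, Z])
    beta, *_ = np.linalg.lstsq(M, Y, rcond=None)
    resid = Y - M @ beta
    s2 = resid @ resid / (n - 3)          # residual variance estimate
    cov = s2 * np.linalg.inv(M.T @ M)     # conventional OLS covariance
    return float(np.sqrt(np.diag(cov))[1])

print(ols_se_of_x(1.0))    # moderate collinearity: modest SE
print(ols_se_of_x(0.01))   # near collinearity: SE many times larger
```

The second standard error dwarfs the first, which is exactly the symptom described above: an implausibly large standard error is a practical signal of a nearly redundant regressor.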