Let's look into the nature of the problem that multicollinearity entails. Suppose that $Z = a + bX$ for some known constants $a$ and $b$. Then,
$$
Y = \beta_0 + \beta_1 X + \beta_2 Z + u = \beta_0 + \beta_1 X + \beta_2 (a + bX) + u = \beta_0 + \beta_2 a + (\beta_1 + b\beta_2) X + u.
$$
Note that Condition 1M implies $E[u \mid X] = 0$. Therefore, in this case, we can reduce the multiple regression model to a univariate regression model by redefining the coefficients:
$$
Y = \gamma_0 + \gamma_1 X + u, \quad \text{where } \gamma_0 = \beta_0 + \beta_2 a \text{ and } \gamma_1 = \beta_1 + b\beta_2.
$$
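To see this observational equivalence concretely, here is a minimal numerical sketch in Python with NumPy (the values of $a$, $b$, and the coefficients are hypothetical choices for illustration, not from the text): two different coefficient vectors $(\beta_0, \beta_1, \beta_2)$ that imply the same $(\gamma_0, \gamma_1)$ produce exactly the same conditional mean of $Y$, so no amount of data can tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constants linking the regressors: Z = a + b*X
a, b = 2.0, 3.0
X = rng.normal(size=5)
Z = a + b * X

def conditional_mean(beta, X, Z):
    """E[Y | X, Z] = beta0 + beta1*X + beta2*Z."""
    beta0, beta1, beta2 = beta
    return beta0 + beta1 * X + beta2 * Z

# Two distinct coefficient vectors with the same implied reduced-form
# coefficients: gamma0 = beta0 + beta2*a and gamma1 = beta1 + b*beta2.
beta_one = (1.0, 4.0, 0.0)
beta_two = (1.0 - 5.0 * a, 4.0 - 5.0 * b, 5.0)

# Identical conditional means: the two betas are observationally equivalent.
print(np.allclose(conditional_mean(beta_one, X, Z),
                  conditional_mean(beta_two, X, Z)))   # True
```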
Therefore, using the previous results, we can identify $\gamma_0$ and $\gamma_1$ as follows:
$$
\gamma_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \quad \text{and} \quad \gamma_0 = E[Y] - \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}\, E[X].
$$
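These moment formulas can be checked by simulation. The sketch below (hypothetical parameter values; large sample so the sample moments are close to the population ones) recovers $\gamma_1 = \beta_1 + b\beta_2$ and $\gamma_0 = \beta_0 + \beta_2 a$ from data alone.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical values for the constants and structural coefficients.
a, b = 2.0, 3.0
beta0, beta1, beta2 = 1.0, 4.0, 0.5

X = rng.normal(size=n)
u = rng.normal(size=n)               # drawn independently of X, so E[u | X] = 0
Z = a + b * X
Y = beta0 + beta1 * X + beta2 * Z + u

# Moment-based identification of the reduced-form coefficients.
gamma1_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
gamma0_hat = Y.mean() - gamma1_hat * X.mean()

print(gamma1_hat)   # close to beta1 + b*beta2 = 5.5
print(gamma0_hat)   # close to beta0 + beta2*a = 2.0
```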
Can we write down the values of $\beta_0$, $\beta_1$, and $\beta_2$ from this information? The answer is no, because it is like trying to solve for three unknowns from two equations:
$$
\frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \beta_1 + b\beta_2 \quad \text{and} \quad E[Y] - \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}\, E[X] = \beta_0 + \beta_2 a.
$$
(Recall that we have already assumed that we know the constants $a$ and $b$.) In fact, these two equations admit infinitely many solutions for $\beta_0$, $\beta_1$, and $\beta_2$. Therefore, the coefficients are not identified.
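The same failure shows up in the sample: when $Z$ is an exact linear function of $X$, the design matrix is rank deficient, so the OLS normal equations have no unique solution. A short sketch (hypothetical $a$ and $b$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Z is an exact linear function of X, as in the text
# (a and b are hypothetical values chosen for illustration).
a, b = 2.0, 3.0
X = rng.normal(size=n)
Z = a + b * X

# Design matrix with an intercept column, X, and Z. Since the Z column
# equals a*(intercept column) + b*(X column), the three columns are
# linearly dependent and X'X is singular.
D = np.column_stack([np.ones(n), X, Z])

print(D.shape[1])                    # 3 columns ...
print(np.linalg.matrix_rank(D))      # ... but rank 2
```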
We saw that the inclusion of an additional regressor $Z$ enables us to interpret the parameter $\beta_1$ as the quantified effect of a change in $X$ on a typical value of $Y$, conditional on $Z$ being kept constant. When $X$ and $Z$ are perfectly correlated, there is no "independent" variation in $Z$ that can be distinguished from the variation in $X$. Therefore, we have no hope of learning about the effect of $X$ separately from the effect that operates through $Z$.
Therefore, multicollinearity arises because we try to do something that is mathematically impossible. In other words, multicollinearity names the problem of trying to separate out the effect of one regressor from another when the other regressor is completely *redundant*, meaning that it does not have a role distinct from $X$ in identifying the causal effect. This results in the failure of identification of the coefficients.
For all this nice discussion, it rarely happens that one regressor is precisely equal to a linear combination of the other regressors. However, when multicollinearity arises only *nearly*, meaning that one regressor is nearly equal to a linear combination of the other regressors, the coefficients *nearly* fail to be identified. When the coefficients are close to being nonidentified, the asymptotic theory (that is, consistency, asymptotic normality, and standard errors based on asymptotic normality) does not provide a reliable approximation of the finite-sample distribution of the coefficient estimators. Therefore, in the case of near multicollinearity, the standard inference based on asymptotic theory becomes dubious. In many situations, near multicollinearity can be detected by looking at the standard errors: when the standard errors are ridiculously huge, one may first suspect a redundancy of regressors. In that case, one can simply remove the redundant regressor from the regression, because one does not have to control for a variable that is merely redundant.
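The blow-up of the standard errors under near multicollinearity can be simulated directly. In the sketch below (all data-generating values are hypothetical), the same outcome is regressed once with a nearly redundant regressor and once with a genuinely distinct one; the classical OLS standard error on $X$ is enormously larger in the near-collinear case.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

def ols_std_errors(D, y):
    """Classical OLS standard errors: sqrt of the diagonal of s^2 (D'D)^{-1}."""
    beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta_hat
    s2 = resid @ resid / (n - D.shape[1])
    return np.sqrt(s2 * np.diag(np.linalg.inv(D.T @ D)))

X = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 4.0 * X + u                          # hypothetical true model

# Nearly redundant regressor: a linear function of X plus tiny noise.
Z_near = 2.0 + 3.0 * X + 1e-4 * rng.normal(size=n)
# A genuinely distinct regressor, for comparison.
Z_far = rng.normal(size=n)

se_near = ols_std_errors(np.column_stack([np.ones(n), X, Z_near]), y)
se_far = ols_std_errors(np.column_stack([np.ones(n), X, Z_far]), y)

# The standard error on X explodes under near multicollinearity.
print(se_near[1] / se_far[1])   # a very large ratio
```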