5. Multicollinearity
Multicollinearity is the phenomenon that (some of) the explanatory variables are highly
correlated. The effect of multicollinearity is that the t-values are deflated. To demonstrate this, I
have generated artificial data Y_j, X_{1,j}, X_{2,j} for j = 1,...,n = 500 as follows. The explanatory
variables X_{1,j} have been drawn independently from the N(0,1) distribution. Next, I have drawn
random variables V_j independently from the N(0,1) distribution, and have set
X_{2,j} = X_{1,j} + 0.01·V_j. Due to this construction the explanatory variables X_{1,j} and X_{2,j}
are highly correlated. In particular, the R² of the regression of X_{2,j} on X_{1,j} is 0.999892.
Finally, I have drawn the errors U_j independently from the N(0,1) distribution, and have
generated the dependent variables by

    Y_j = X_{1,j} + X_{2,j} + 1 + U_j,   j = 1,...,n = 500.                (36)

This is model (7) with k = 3 and β_1 = β_2 = β_3 = 1.
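This data-generating process is easy to replicate. The following numpy sketch (the seed and variable names are my own choices, so the exact numbers will differ from the EasyReg output below) reproduces the qualitative effect: very large standard errors, hence deflated t-values, for the coefficients on the two collinear regressors, while the intercept is estimated precisely.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; not the data used in the text
n = 500

# DGP from the text: X1 ~ N(0,1), X2 = X1 + 0.01*V, Y = X1 + X2 + 1 + U
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)
y = x1 + x2 + 1 + rng.standard_normal(n)

# OLS with intercept: b = (X'X)^{-1} X'y
X = np.column_stack([x1, x2, np.ones(n)])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - 3)                  # error variance estimate
se = np.sqrt(s2 * np.diag(XtX_inv))   # conventional standard errors
t = b / se
print(np.round(b, 3), np.round(se, 3), np.round(t, 3))
```

Note that although b(1) and b(2) individually are wildly imprecise, their sum is pinned down accurately, because X_{1,j} + X_{2,j} ≈ 2·X_{1,j} is well identified.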
The EasyReg output involved is below.
OLS estimation results
Parameters   Estimate    (S.E.)      (H.C. S.E.)   t-value   [p-value]   H.C. t-value   [H.C. p-value]
b(1)         0.45258     (4.48811)   (4.35703)     0.101     [0.91968]   0.104          [0.91727]
b(2)         1.57385     (4.48634)   (4.35350)     0.351     [0.72573]   0.362          [0.71772]
b(3)         0.95879     (0.04483)   (0.04478)     21.386    [0.00000]   21.410         [0.00000]
Notes:
1: S.E. = Standard error
2: H.C. = Heteroskedasticity Consistent. These t-values and
standard errors are based on White's heteroskedasticity
consistent variance matrix.
3: The two-sided p-values are based on the normal approximation.
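Note 2 refers to White's heteroskedasticity-consistent variance matrix. As a rough illustration of how such standard errors are computed from the OLS residuals, here is a minimal numpy sketch (the simulated data, seed, and variable names are my own, not EasyReg's):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for illustration
n = 500
X = np.column_stack([rng.standard_normal(n), np.ones(n)])
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Conventional variance matrix: s^2 (X'X)^{-1}
s2 = e @ e / (n - X.shape[1])
se = np.sqrt(s2 * np.diag(XtX_inv))

# White's H.C. variance matrix: (X'X)^{-1} (sum_j e_j^2 x_j x_j') (X'X)^{-1}
meat = (X * (e ** 2)[:, None]).T @ X
hc_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
hc_t = b / hc_se
```

Under homoskedasticity, as in these notes' simulated data, the two sets of standard errors are close, which is exactly what the output table above shows.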