Multicollinearity
What multicollinearity is.
Let H = the set of all the X (independent) variables. Let G_k = the set of all the X variables except X_k. The formula for standard errors is then:

    s_bk = (s_y / s_Xk) * sqrt[ (1 - R²_YH) / (N - K - 1) ] * sqrt[ 1 / (1 - R²_XkGk) ]
         = (s_y / s_Xk) * sqrt[ (1 - R²_YH) / (N - K - 1) ] * sqrt[ 1 / Tol_k ]
         = (s_y / s_Xk) * sqrt[ (1 - R²_YH) / (N - K - 1) ] * sqrt[ Vif_k ]
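The formula can be checked numerically. The sketch below (using NumPy, with simulated data; the variable names and data-generating process are illustrative assumptions, not part of the handout) fits an OLS regression with two correlated predictors and confirms that the conventional standard error for one coefficient equals the product of the three terms above.

```python
import numpy as np

# Illustrative simulated data: x2 is deliberately correlated with x1.
rng = np.random.default_rng(0)
N, K = 200, 2
x1 = rng.normal(size=N)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=N)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=N)

# Conventional OLS standard error for the coefficient on x1.
X = np.column_stack([np.ones(N), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (N - K - 1)                 # residual variance
se_ols = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

# The formula: (s_y/s_x1) * sqrt((1 - R2_YH)/(N-K-1)) * sqrt(1/(1 - R2_x1G)).
R2_YH = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
Xg = np.column_stack([np.ones(N), x2])           # regress x1 on the other IVs
bg, *_ = np.linalg.lstsq(Xg, x1, rcond=None)
rg = x1 - Xg @ bg
R2_k = 1 - (rg @ rg) / ((x1 - x1.mean()) @ (x1 - x1.mean()))
se_formula = (y.std(ddof=1) / x1.std(ddof=1)) \
    * np.sqrt((1 - R2_YH) / (N - K - 1)) \
    * np.sqrt(1 / (1 - R2_k))

print(np.isclose(se_ols, se_formula))  # the two computations agree
```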
Questions: What happens to the standard errors as R²_YH increases? As N increases? As K increases? As the multiple correlation between one IV and the others increases?
From the above formulas, it is apparent that

The bigger R²_YH is, the smaller the standard error will be.

The bigger R²_XkGk is (i.e. the more highly correlated X_k is with the other IVs in the model), the bigger the standard error will be. Indeed, if X_k is perfectly correlated with the other IVs, the standard error will equal infinity. This is referred to as the problem of multicollinearity.
The problem is that, as the Xs become more highly correlated, it becomes more and more
difficult to determine which X is actually producing the effect on Y.
Also, 1 - R²_XkGk is referred to as the Tolerance of X_k. A tolerance close to 1 means there is little multicollinearity, whereas a value close to 0 suggests that multicollinearity may be a threat. The reciprocal of the tolerance is known as the Variance Inflation Factor (VIF). The VIF shows us how much the variance of the coefficient estimate is being inflated by multicollinearity. For example, if the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1. In such a case, the coefficient would have to be 3 times as large to be statistically significant.
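The arithmetic linking tolerance, VIF, and standard-error inflation can be tabulated directly. The tolerance values below are hypothetical, chosen to include the VIF = 9 case from the text:

```python
# Illustrative arithmetic: VIF = 1 / tolerance, and the standard error is
# inflated by sqrt(VIF) relative to the no-collinearity (VIF = 1) case.
rows = []
for tol in (1.0, 0.5, 1 / 9):
    vif = 1 / tol
    rows.append((tol, vif, vif ** 0.5))
    print(f"Tol = {tol:.3f}  VIF = {vif:.2f}  SE inflated {vif ** 0.5:.2f}x")
```

With a tolerance of 1/9, the VIF is 9 and the standard error is inflated by a factor of 3, matching the example above.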
Larger sample sizes decrease standard errors (because the denominator gets bigger). This
reflects the fact that larger samples will produce more precise estimates of regression
coefficients.
Adding more variables to the equation can increase the size of standard errors, especially if the extra variables do not produce increases in R². Adding more variables decreases the (N - K - 1) part of the denominator. More variables can also decrease the tolerance of the variable and hence increase the standard error. In short, adding extraneous variables to a model tends to reduce the precision of all your estimates.
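This effect is easy to demonstrate by simulation. The sketch below (simulated data; names and noise scales are illustrative assumptions) adds an extraneous variable that is highly correlated with x1 and shows that the standard error of the coefficient on x1 grows:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x1 = rng.normal(size=N)
y = 1 + 2 * x1 + rng.normal(size=N)   # x1 is the only real predictor

def se_b1(xcols):
    """OLS standard error of the coefficient on x1 (first column after the intercept)."""
    X = np.column_stack([np.ones(N)] + xcols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    s2 = r @ r / (N - X.shape[1])
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

se_small = se_b1([x1])
# Add an extraneous variable that is nearly a copy of x1.
x_extra = x1 + rng.normal(scale=0.1, size=N)
se_big = se_b1([x1, x_extra])
print(se_small < se_big)  # the extraneous collinear variable inflates the SE
```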
Causes of multicollinearity
Improper use of dummy variables (e.g. failure to exclude one category)
Including a variable that is computed from other variables in the equation (e.g. family income
= husband’s income + wife’s income, and the regression includes all 3 income measures)
In effect, including the same or almost the same variable twice (height in feet and height in
inches; or, more commonly, two different operationalizations of the same identical concept)
The above all imply some sort of error on the researcher’s part. But, it may just be that
variables really and truly are highly correlated.
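The dummy-variable cause above can be shown mechanically: including a dummy for every category alongside the intercept makes the design matrix rank-deficient (perfect multicollinearity). A minimal sketch with simulated categories:

```python
import numpy as np

# Simulated categorical variable with 3 categories (illustrative data).
rng = np.random.default_rng(2)
group = rng.integers(0, 3, size=50)
dummies = np.eye(3)[group]                      # one 0/1 column per category

X_bad = np.column_stack([np.ones(50), dummies])        # intercept + ALL dummies
X_ok = np.column_stack([np.ones(50), dummies[:, 1:]])  # one category excluded

# The dummy columns sum to the intercept column, so X_bad loses a rank.
rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)
print(rank_bad, X_bad.shape[1])  # rank 3 with 4 columns: perfectly collinear
print(rank_ok, X_ok.shape[1])    # rank 3 with 3 columns: full rank
```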
Consequences of multicollinearity
Spring '11, Richard Williams. Topics: Regression Analysis, multicollinearity, condition number.