Multicollinearity

What multicollinearity is. Let H = the set of all the X (independent) variables, and let G_k = the set of all the X variables except X_k. The formula for the standard error of the coefficient b_k is then

    s_{b_k} = \sqrt{\frac{1 - R^2_{YH}}{(1 - R^2_{X_k G_k})(N - K - 1)}} \cdot \frac{s_Y}{s_{X_k}}
            = \sqrt{\frac{1 - R^2_{YH}}{Tol_k (N - K - 1)}} \cdot \frac{s_Y}{s_{X_k}}
            = \sqrt{\frac{(1 - R^2_{YH}) \cdot VIF_k}{N - K - 1}} \cdot \frac{s_Y}{s_{X_k}}

Questions: What happens to the standard errors as R^2_{YH} increases? As N increases? As K increases? As the multiple correlation between one IV and the others increases?

From the above formula, it is apparent that:

- The bigger R^2_{YH} is, the smaller the standard error will be.
- The bigger R^2_{X_k G_k} is (i.e., the more highly correlated X_k is with the other IVs in the model), the bigger the standard error will be. Indeed, if X_k is perfectly correlated with the other IVs, the standard error becomes infinite. This is referred to as the problem of multicollinearity. The problem is that, as the Xs become more highly correlated, it becomes more and more difficult to determine which X is actually producing the effect on Y.
- The quantity 1 - R^2_{X_k G_k} is referred to as the Tolerance of X_k. A tolerance close to 1 means there is little multicollinearity, whereas a value close to 0 suggests that multicollinearity may be a threat.
- The reciprocal of the tolerance is known as the Variance Inflation Factor (VIF). The VIF shows how much the variance of the coefficient estimate is being inflated by multicollinearity. For example, if the VIF for a variable were 9, its standard error would be three times as large as it would be if its VIF were 1. In such a case, the coefficient would have to be three times as large to be statistically significant.
- Larger sample sizes decrease standard errors (because the N - K - 1 term in the denominator gets bigger). This reflects the fact that larger samples produce more precise estimates of regression coefficients.
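The formula above can be checked numerically. The following is a minimal sketch (the simulated data and variable names are invented for illustration, not taken from the notes): it computes the tolerance and VIF of one predictor by regressing it on the others, then plugs them into the standard-error formula.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3

# Simulate three predictors; x3 is deliberately correlated with x1 and x2.
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
x3 = 0.7 * x1 + 0.7 * x2 + rng.normal(scale=0.5, size=N)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 0.5 * x1 - 0.5 * x2 + 0.3 * x3 + rng.normal(size=N)

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (with an intercept)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    return 1 - resid.var() / y.var()

# Tolerance and VIF for X_k: regress X_k on the remaining predictors (G_k).
k = 2  # index of x3, the collinear predictor
others = np.delete(X, k, axis=1)
tol_k = 1 - r_squared(X[:, k], others)
vif_k = 1 / tol_k

# Standard error of b_k from the formula in the text.
r2_yh = r_squared(y, X)
se_k = (np.sqrt((1 - r2_yh) / (tol_k * (N - K - 1)))
        * y.std(ddof=1) / X[:, k].std(ddof=1))
print(f"Tol = {tol_k:.3f}, VIF = {vif_k:.2f}, SE(b_k) = {se_k:.4f}")
```

Using sample standard deviations (ddof=1) throughout, this SE agrees exactly with the classical OLS standard error computed from the coefficient covariance matrix.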
Adding more variables to the equation can increase standard errors, especially if the extra variables do not produce increases in R^2. Adding more variables decreases the N - K - 1 part of the denominator. More variables can also decrease the tolerance of a variable and hence increase its standard error. In short, adding extraneous variables to a model tends to reduce the precision of all your estimates.
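This effect can be illustrated with a small simulation (a sketch with invented data and a hypothetical helper function): adding a predictor that is highly correlated with x1 but unrelated to y lowers the tolerance of x1 and inflates its standard error.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x1 = rng.normal(size=N)
y = 2.0 + 1.0 * x1 + rng.normal(size=N)

def se_of_b1(X, y):
    """Classical OLS standard error of the first slope coefficient."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    sigma2 = resid @ resid / (len(y) - Xc.shape[1])
    return np.sqrt(sigma2 * np.linalg.inv(Xc.T @ Xc)[1, 1])

se_before = se_of_b1(x1[:, None], y)

# An extraneous predictor: highly correlated with x1 (corr ~ 0.98) but
# contributing nothing to y. The tolerance of x1 drops sharply.
junk = x1 + rng.normal(scale=0.2, size=N)
se_after = se_of_b1(np.column_stack([x1, junk]), y)
print(f"SE(b1) before: {se_before:.4f}, after: {se_after:.4f}")
# se_after is several times larger than se_before
```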
Causes of multicollinearity

- Improper use of dummy variables (e.g., failure to exclude one category)
- Including a variable that is computed from other variables in the equation (e.g., family income = husband's income + wife's income, and the regression includes all three income measures)
- In effect, including the same or almost the same variable twice (height in feet and height in inches; or, more commonly, two different operationalizations of the same concept)

The above all imply some sort of error on the researcher's part. But it may just be that the variables really and truly are highly correlated.

Consequences of multicollinearity