22_MultiCol_handout - Multicollinearity, 73-261 Econometrics

Multicollinearity
73-261 Econometrics, October 13
Reading: Wooldridge 3.4; Gujarati, Chapter 10
© CMU / Y. Kryukov

Overview (p. 2)

Topic 2: assumptions and violations, a.k.a. problems and fixes.

Problem: multicollinearity
- A3 is not violated, but we are close to it
- The X's are closely related, so the regression cannot separate their effects
- Result: inaccurate estimates, unclear significance

Fix: no easy way
- More data (the advice from theorists)
- Drop variables, or transform them

Assumption A3 (p. 3)

A3 is violated if

  γ0 + γ1 Xi,1 + γ2 Xi,2 + … + γk Xi,k ≡ 0  for all i,

i.e. some variable is a linear function of the others.

Example: the dummy trap (male + female = 1). Let the true equation be

  wage = β0 + δ0 male + δ1 female + …

Pick any real number α. Then it is also true that

  wage = (β0 − α) + (δ0 + α) male + (δ1 + α) female + …

With the right α, the coefficient on male can be any number at all, so OLS cannot decide which one to report.

[Figures, pp. 4-8: 3-D scatter plots of Y against X1 and X2, all axes from 0 to 1. For one set of three observations, exactly one flat surface fits the points; for another set of three observations, two visibly different surfaces both fit the data.]

Multicollinearity (p. 9)

Near-violation of A3:

  Xi,k = γ0 + γ1 Xi,1 + γ2 Xi,2 + … + γk−1 Xi,k−1 + εi

with εi small, i.e. the variables are nearly collinear. A3 is still satisfied.
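The exact violation in the dummy-trap example (p. 3) can be checked numerically. A minimal sketch with made-up data and coefficients, not from the lecture:

```python
import numpy as np

# Hypothetical data: a male dummy plus its complement reproduces the constant
rng = np.random.default_rng(0)
n = 100
male = rng.integers(0, 2, size=n).astype(float)
female = 1.0 - male                      # male + female = 1: the dummy trap
X = np.column_stack([np.ones(n), male, female])

print(np.linalg.matrix_rank(X))          # 2, not 3: X'X is singular

# Shifting (beta0, delta0, delta1) by (-alpha, +alpha, +alpha) leaves the
# fitted values unchanged, so no unique coefficient vector exists
beta = np.array([10.0, 2.0, 1.0])
alpha = 5.0
beta_shifted = beta + np.array([-alpha, alpha, alpha])
print(np.allclose(X @ beta, X @ beta_shifted))  # True
```

Because the design matrix has rank 2, every α gives a different coefficient vector with identical fitted values, which is exactly why OLS cannot decide which one to report.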
Because A3 still holds:
- Unbiasedness, consistency, BLUE: still there
- Theorists therefore dismiss MC as a "data problem"
- But the estimates are unstable: a small change in the data can change them a lot
- Standard errors are inflated to reflect that, so significance is hard to establish

Example: MLB salary data (p. 10)

The example from the F-test lecture (y = log salary). The constant, years, and games/yr are included but not shown; standard errors in parentheses.

  Variable             Estimate
  Batting average      0.00098  (0.00110)
  Home runs /yr        0.0144   (0.0161)
  Runs batted in /yr   0.01077  (0.00718)
  R²                   0.6278

All three variables shown are individually insignificant. We suspect they are collinear, so let's try them one at a time.

MLB example continued (p. 11)

  Variable             (I)                 (II)                (III)              (IV)
  Batting average      0.00098 (0.00110)   0.00138 (0.00110)   -                  -
  Home runs /yr        0.0144  (0.0161)    -                   0.0359 (0.0073)    -
  Runs batted in /yr   0.01077 (0.00718)   -                   -                  0.0168 (0.0032)
  R²                   0.6278              0.5989              0.6235             0.6264

Batting average is indeed irrelevant; home runs and runs batted in are not:
- each is significant when used alone, and
- R² did not decrease much when we dropped the other variables.

Consequences of MC (p. 12)

Bad:
- large standard errors (only for the collinear variables)
- understated significance
- unstable coefficient estimates
- wider confidence intervals

Not affected:
- goodness of fit: R² and the F-stat remain the same
- non-collinear variables are OK: their coefficients and s.e.'s are valid
- prediction works fine

Factors of the standard error (p. 13)

It can be shown that

  V(β̂j) = σ̂² [(X′X)⁻¹]jj = σ̂² / [ Σi (Xij − X̄j)² · (1 − Rj²) ]

where σ̂ is the s.e. of û, the same for all j: a better fit means a smaller σ̂ and hence smaller standard errors.
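A quick simulation (made-up data, not from the lecture) shows this formula at work: when a second regressor is nearly a copy of the first, the 1/(1 − Rj²) factor inflates the standard error on the first many times over, mirroring the MLB pattern of variables that are significant alone but not together.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # x2 is nearly a copy of x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

def ols_se(X, y):
    """OLS estimates and standard errors: se_j = sqrt(sigma2 * [(X'X)^-1]_jj)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return b, np.sqrt(sigma2 * np.diag(XtX_inv))

X_both = np.column_stack([np.ones(n), x1, x2])  # both collinear regressors
X_one = np.column_stack([np.ones(n), x1])       # drop x2
_, se_both = ols_se(X_both, y)
_, se_one = ols_se(X_one, y)
print(se_both[1] / se_one[1])  # s.e. on x1 is many times larger with x2 in
```

Dropping x2 shrinks the standard error on x1 dramatically, which is the same "significant when used alone" effect seen in columns (III) and (IV) of the MLB table.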
The term Σi (Xij − X̄j)² is the TSS for Xj: high variation in Xj makes its effect clearer, so the s.e. is smaller. Rj² is the R² from regressing Xj on all the other X's:

  Xj = γ0 + γ1 X1 + … + γj−1 Xj−1 + γj+1 Xj+1 + … + γk Xk + ε

If Xj is closely related to the others, Rj² is close to 1 and the s.e. blows up.

Detecting multicollinearity (p. 14)

Look for groups of variables that are:
- insignificant individually, and
- potentially related to each other.

Check their correlation: corr(Home runs, Runs batted in) = 0.89. Correlation describes only pairs of variables, though, so compute the variance inflation factor:

  VIFj = 1 / (1 − Rj²)

Rule of thumb: VIF > 10 ⇒ multicollinearity.
VIF(Home runs) = . . . ; VIF(RBIs) = . . .

Fixing multicollinearity (p. 15)

- More data (= more variation in the X's); with panel data: many individuals, many time periods
- Drop collinear variable(s): leads to bias, but reduces variance
- Aggregate the variables: total expenditure instead of categories, or build an index, e.g. z := 4*HRunsYr + RBIsYr
- Transformations: take a ratio (. . .); in time series, difference with the previous period

Other things to do about MC (p. 16)

- Experiment with excluding variables; keep an eye on the significance of the others and on R²
- Do F-tests for groups of variables
- We cannot separate the effects of collinear variables, and it is OK to say that in your report
- Test sensitivity to the data by running the regression on sub-samples
- Gujarati, Chapter 10, mentions other methods
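The VIF from p. 14 is straightforward to compute by hand. A minimal NumPy sketch, using hypothetical simulated data rather than the MLB dataset:

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the other columns plus a constant."""
    n = X.shape[0]
    xj = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Hypothetical data: two nearly collinear columns and one unrelated column
rng = np.random.default_rng(2)
n = 200
a = rng.normal(size=n)
X = np.column_stack([a,
                     0.95 * a + 0.05 * rng.normal(size=n),
                     rng.normal(size=n)])
print(vif(X, 0), vif(X, 2))  # well above 10 for the collinear pair, near 1 otherwise
```

Only the collinear columns trip the VIF > 10 rule of thumb; the unrelated column sits near the no-collinearity benchmark of 1, matching the observation that non-collinear variables are unaffected.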