4_multicollinearity - Research Methods in Economics ECO...

Research Methods in Economics ECO 4451

Introduction
There can be consequences to adding variables to a model. "Multicollinearity" is high correlation involving two or more IVs:
1. One IV influences another IV, or
2. Two or more IVs influence (or are influenced by) a third IV.

Multicollinearity Consequences
1. Can drastically alter results from one model to another
2. Can cause problems in interpreting results

Do Market Size and Wins Affect NBA Teams' Profits?

  Profit   Market Size   Wins
   33.4        3          57
   22.0        3          63
   16.0        3          46
    8.7        2          42
    5.4        2          44
    4.7        2          55
   -1.5        1          35
   -2.1        1          13
   -4.0        1          28

Example
PROFIT = β0 + β1 MKTSIZE + β2 WINS + ε
where PROFIT is annual team profit (in millions of $), MKTSIZE = 3 for a large market, 2 for medium, 1 for small, and WINS is the number of wins in one season.
NOTE: Don't use this approach for measuring market size; use a better measure like population. See the regression output for multicollinearity.

Example (cont.)
Examine the t-stat p-values and the coefficient values: models A & B vs. model C (p-values in parentheses).

  Variables      A          B          C
  Constant     17.16      16.74      17.06
              (0.009)    (0.10)     (0.03)
  MKTSIZE      13.17                 13.28
              (0.0006)              (0.02)
  WINS                    0.61       0.008
                         (0.02)     (0.98)

Point 1. The observed changes across models are due to high correlation (multicollinearity) between WINS & MKTSIZE: correlation = 0.835.

Note
1. High correlation among IVs: BAD
2. High correlation between an IV and the DV: GOOD

Exact Multicollinearity
A. Definition
1.
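The 0.835 correlation between WINS and MKTSIZE quoted in Point 1 can be checked directly from the data table. A minimal sketch in Python, using only the nine observations shown on the slide:

```python
# NBA example data from the slide: market size and wins for the 9 teams.
mktsize = [3, 3, 3, 2, 2, 2, 1, 1, 1]
wins    = [57, 63, 46, 42, 44, 55, 35, 13, 28]

def corr(x, y):
    """Pearson correlation coefficient, computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

print(round(corr(mktsize, wins), 3))  # 0.835
```

This is the high IV-to-IV correlation that drives the instability across models A, B, and C.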
Two or more IVs perfectly (or nearly perfectly) correlated.
2. What would be the values of the correlation coefficients?

Exact Multicollinearity (cont.)
B. Examples
1. POP, GDP in a model explaining housing starts: correlation = 0.99
2. What determines profits? Suppose we use two independent variables, WINS and WIN_PCT (winning percentage). See the regression output below: NO F-STATISTIC, and R² & adj. R² are NEGATIVE!

  Ordinary least squares regression
  Dep. var. = PROFIT   Mean = 9.17778   S.D. = 12.48537
  Model size: Observations = 9, Parameters = 3, Deg. Fr. = 6
  R-squared = -.258097   Adjusted R-squared = -.67746

  Variable    Coefficient    Std. Error     t-ratio    P-value
  Constant    -22.82222      (fixed parameter)
  WINS         1.01E+15      7.99E+14        1.259     0.2546
  WIN_PCT     -8.25E+16      6.55E+16       -1.259     0.2546

  Correlation between WINS & WIN_PCT = 1.00

Exact Multicollinearity (cont.)
C. Consequences
1. Software might:
a) not calculate coefficient estimates for one or more collinear IVs, OR
b) not calculate ANY coefficient estimates: a "singular matrix" message in some software, "regressors are collinear" in others, e.g. "Error #911: Independent variables are collinear. Model is unstable. (You idiot! What were you thinking? How could you do something that stupid? . . . )", OR
c) not report an F-stat and show R² < 0!
D. Detection
1. Same as the consequences
E. Solutions
1. Drop one or more perfectly collinear IVs (e.g., WINS vs. WIN_PCT)

Near Multicollinearity
A. More common than perfect multicollinearity
B. Usually called simply "multicollinearity"
C. Definition
1. A close, but not exact, relationship exists
2.
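The WINS/WIN_PCT failure can be reproduced without an econometrics package. A sketch, in which WIN_PCT is assumed for illustration to be wins out of an 82-game season (any exact linear transform of WINS behaves the same way): because one IV is an exact linear function of the other, the normal-equation system for the two slopes has a zero determinant, so OLS has no unique solution.

```python
# WINS from the NBA slide; WIN_PCT is assumed to be wins / 82, so the
# two IVs are perfectly collinear (correlation exactly 1).
wins = [57, 63, 46, 42, 44, 55, 35, 13, 28]
win_pct = [w / 82 for w in wins]

n = len(wins)
mx = sum(wins) / n
my = sum(win_pct) / n

# Demeaned sums of squares and cross-products: the 2x2 block of X'X
# that OLS must invert to get the two slope coefficients.
sxx = sum((w - mx) ** 2 for w in wins)
syy = sum((p - my) ** 2 for p in win_pct)
sxy = sum((w - mx) * (p - my) for w, p in zip(wins, win_pct))

r = sxy / (sxx * syy) ** 0.5   # correlation between the two IVs
det = sxx * syy - sxy ** 2     # determinant of the slope block of X'X

print(r)    # 1.0 (up to floating-point rounding): perfect collinearity
print(det)  # ~0: the normal equations are singular, no unique estimates
```

This is exactly why the software either refuses to estimate or prints garbage such as the huge coefficients and negative R² shown above.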
Correlation values are farther from +1 and -1 (than under exact multicollinearity).

Near Multicollinearity (cont.)
D. Consequences
1. (Suppose X2 & X3 are collinear & both belong in the model.)
2. Standard errors of β̂2 &/or β̂3 are large.
3. Possible that t-statistic p-values > 5% on the estimates of β2 &/or β3 (WINS vs. MKTSIZE). WHAT CONCLUSION ABOUT X2 & X3 WOULD YOU DRAW FROM THE P-VALUES? WOULD YOU BE CORRECT?
4. All estimates remain efficient and consistent.
5. β̂2 doesn't show the effect on Y of ONLY X2 changing (WINS vs. MKTSIZE). WHY? Look at Regression #4 on the handout.
6. Ditto for β̂3.

Near Multicollinearity
E. Detection
1. Estimation results:
a) High adj. R² (& high F) with all low t-statistics
b) Estimated coefficients change drastically when IVs are dropped or added
2. High correlation coefficients, usually .8 (absolute value) or higher
3. Coefficients sensitive to model specification
4. NOTE: 2 IVs could be (a) not highly correlated as a pair, but (b) 3 or more could be highly correlated as a group
5. Auxiliary regressions:
a) Regress each IV on all the other IVs (USE ONLY IVs)
b) Needed for the next topic: Variance Inflation Factors (VIF)
c) An example of an auxiliary regression follows
6. Auxiliary regressions example:
a) Main regression model:
(1) HOUSING = β1 + β2 INTRATE + β3 POP + β4 GDP + ε
b) Auxiliary regressions:
(1) INTRATE = A1 + A2 POP + A3 GDP + ε
(2) POP = D1 + D2 INTRATE + D3 GDP + ε
(3) GDP = G1 + G2 INTRATE + G3 POP + ε
c) After estimating the auxiliary regressions:
(1) record R² for each auxiliary regression
(2) use R² to calculate the VIF (see Regression #4 on the handout)

Variance Inflation Factors
VIF is one simple measure of the extent of multicollinearity. VIF is measured for one IV at a time (the ith IV).
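The two steps above (run the auxiliary regression, record its R²) can be sketched directly. With a single other IV, as in the WINS-on-MKTSIZE auxiliary regression from the NBA example, the auxiliary R² is just the squared correlation:

```python
# Auxiliary regression of WINS on MKTSIZE from the NBA example.
# With one regressor, R-squared equals the squared Pearson correlation.
mktsize = [3, 3, 3, 2, 2, 2, 1, 1, 1]
wins    = [57, 63, 46, 42, 44, 55, 35, 13, 28]

n = len(wins)
mx, my = sum(mktsize) / n, sum(wins) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(mktsize, wins))
sxx = sum((a - mx) ** 2 for a in mktsize)
syy = sum((b - my) ** 2 for b in wins)

r2 = sxy ** 2 / (sxx * syy)   # auxiliary R-squared
print(round(r2, 2))           # 0.7, i.e. the .70 used in the VIF example below
```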
VIF (cont.)
VIF compares two variances for that one IV. It is the ratio of:
(1) the actual variance of β̂ for that IV, to
(2) the variance of β̂ in the total absence of multicollinearity of that IV with any other IVs.

VIF (cont.)
The greater the multicollinearity of that one IV with the others, the larger its VIF, and the worse the precision of its estimate. The actual variance of β̂ for the ith IV is VIF times greater than its variance in the absence of multicollinearity.

Formula: VIF = 1 / (1 - Ri²), where Ri² is from the auxiliary regression of the ith IV on all other IVs.

VIF is greater than or equal to 1.
Rule: VIF > 10 reveals multicollinearity severe enough to cause a problem.**
Others: VIF > 30 indicates a problem.
**Chatterjee & Price, 1991, Regression Analysis by Example, 2nd ed., John Wiley and Sons.

VIF Example
Auxiliary regressions:
(1) WINS = A1 + A2 MKTSIZE + ε, R² = .70; VIF = 1/(1 - .70) = 3.33: no problem (according to the rule above)
(2) Suppose instead that R² = .91; VIF = 1/(1 - .91) = 11.11: a problem
Questions: 1. What differed between the two regressions to cause the VIF to rise in #2? 2. What does that mean?

Near Multicollinearity (cont.)
F. Solutions
1. Judgement is required in handling multicollinearity.
2. The natural response is to drop one of the collinear IVs.
a) Example: suppose X2 and X3 are collinear. Automatically drop X2? NO.
b) POP, GDP are nearly collinear: don't automatically drop POP.
c) Dropping X2 might bias the estimate of β3.
(1) Bias happens whenever the retained and dropped IVs are collinear enough that. . .
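The VIF formula is a one-liner; a sketch that reproduces both numbers from the example:

```python
def vif(r_squared):
    """Variance inflation factor from an auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r_squared)

print(round(vif(0.70), 2))  # 3.33  -> below 10: no problem by the rule
print(round(vif(0.91), 2))  # 11.11 -> above 10: a problem
```

Note how steeply the VIF rises as R² approaches 1: at R² = .70 the slope variance is inflated 3.3-fold, but at R² = .91 it is inflated more than 11-fold.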
(2) the coefficient of the retained IV reflects the effects on the DV of BOTH the retained and dropped IVs,
(3) rather than the effect of solely the retained IV.
(4) SEE THE EXAMPLE BELOW.
e) Advanced
(1) E[β̂3] = β3 + β2 (cov(X2, X3) / var(X3))

Example (POP and GDP nearly collinear; p-values in parentheses):

  Variables    With POP    Without POP
  Constant      131.75       687.90
               (0.27)        (0.07)
  INTRATE       184.75       169.66
               (0.03)        (0.03)
  POP            14.90
               (0.41)
  GDP             0.52         0.91
               (0.54)        (0.036)

The GDP coefficient of 0.91 measures the effects of BOTH POP and GDP.

Multicollinearity Solutions (cont.)
"... it is less than obvious what one should do upon finding ... a multicollinearity problem."**
**LIMDEP, Version 8.0, Econometric Modeling Guide, 2002, Vol. 1, William H. Greene, Econometric Software, p. E518.

Note
3. The usual problem: truly significant IVs appear to be insignificant.
4. When multicollinearity is not a problem:
a) Example: suppose X2 and X3 are collinear,
b) but the t-statistic p-values on both are < 5%,
c) so neither appears insignificant.
d) No problem.

Near Multicollinearity (cont.)
5. One reasonable solution
a) Background
(1) p-values are for any one specific IV: X2 or X3 or ...
(2) correlation coefficients (from a correlation matrix) involve that same IV and any other IV; the correlation coefficient is called "rho" (ρ)
(3) The action (DROP or KEEP) refers to that IV: X2 or X3 or ...

Multicollinearity Solution

  t-stat p-value    Highest |ρ| with another IV    Action
  low  (≤ .05)      high (≥ .9)                    KEEP
  low  (≤ .05)      moderate (.8 ≤ |ρ| < .9)       KEEP
  low  (≤ .05)      low (< .8)                     KEEP
  high (> .05)      high (≥ .9)                    DROP
  high (> .05)      moderate (.8 ≤ |ρ| < .9)       KEEP
  high (> .05)      low (< .8)                     DROP

Near Multicollinearity (cont.)
b) Example
(1) Suppose you are wondering about dropping or keeping an IV named EDUC.
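The "Advanced" bias formula can be verified numerically. A minimal sketch with made-up collinear regressors (all numbers hypothetical): build a noise-free Y from known β2 and β3, drop X2, and check that the slope from regressing Y on X3 alone equals β3 + β2·cov(X2, X3)/var(X3), exactly as the formula says.

```python
# Hypothetical collinear regressors (X2 roughly tracks X3) and true coefficients.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
b2, b3 = 2.0, 0.5
y = [b2 * a + b3 * b for a, b in zip(x2, x3)]  # noise-free Y for a clean check

n = len(y)
m2, m3, my = sum(x2) / n, sum(x3) / n, sum(y) / n
s23 = sum((a - m2) * (b - m3) for a, b in zip(x2, x3))  # cov(X2, X3) * n
s33 = sum((b - m3) ** 2 for b in x3)                    # var(X3) * n
s3y = sum((b - m3) * (c - my) for b, c in zip(x3, y))

slope_short = s3y / s33              # slope from regressing Y on X3 alone
bias_formula = b3 + b2 * s23 / s33   # slide's formula: beta3 + beta2 * cov/var

print(abs(slope_short - bias_formula) < 1e-9)  # True: the omitted X2 loads onto X3
```

The retained coefficient absorbs the dropped IV's effect, which is why GDP's coefficient jumps from 0.52 to 0.91 once POP is dropped in the table above.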
(2) First, look at the p-value for EDUC in your regression.
(3) Let's say the p-value = 0.12.
(4) Look at every correlation coefficient involving EDUC (EDUC vs. YRS, EDUC vs. GENDER, etc.) in the correlation matrix:

            YRS       GENDER     AGE       EDUC     INCOME
  YRS      1
  GENDER   0.02805   1
  AGE      0.13537  -0.11397   1
  EDUC     0.24785  -0.01877   0.38271   1
  INCOME   0.23341   0.00578   0.44073   0.78575   1

(5) Note the highest absolute value.
(6) Let's say it is .79, the absolute value of 0.78575 (EDUC vs. INCOME).
(7) Find the line in the table corresponding to a p-value of 0.12 and |ρ| = .79.
(8) What is the recommended action?
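Steps (4)-(5) can be automated: scan the EDUC row of the correlation matrix and take the largest absolute value. A sketch using the coefficients from the matrix above:

```python
# Correlation coefficients involving EDUC, read off the matrix on the slide.
educ_corrs = {"YRS": 0.24785, "GENDER": -0.01877,
              "AGE": 0.38271, "INCOME": 0.78575}

highest = max(abs(r) for r in educ_corrs.values())
partner = max(educ_corrs, key=lambda k: abs(educ_corrs[k]))
print(partner, round(highest, 2))  # INCOME 0.79
```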
Multicollinearity Solution: applying the table above
P-value = 0.12 (high), |ρ| = .79 (low): DROP
P-value = 0.25 (high), |ρ| = .83 (moderate): KEEP
P-value = 0.02 (low), |ρ| = .93 (high): KEEP
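The keep/drop table can be encoded directly. A sketch of the rule as read from the table (the handling of values exactly at the .8 and .9 boundaries follows the moderate band's ".8 ≤ |ρ| < .9" reading; the three scenarios are the ones posed on the slides):

```python
def recommend(p_value, highest_abs_rho):
    """Keep-or-drop rule from the multicollinearity-solution table."""
    if p_value <= 0.05:
        return "KEEP"        # significant: keep for ANY correlation value
    if highest_abs_rho >= 0.9:
        return "DROP"        # insignificant & severely collinear
    if highest_abs_rho >= 0.8:
        return "KEEP"        # insignificance may be caused by collinearity
    return "DROP"            # insignificant & not collinear

print(recommend(0.12, 0.79))  # DROP
print(recommend(0.25, 0.83))  # KEEP
print(recommend(0.02, 0.93))  # KEEP
```

The middle case is the interesting one: a high p-value with moderate collinearity is kept, because the collinearity itself may be masking a truly significant effect.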
P-value ≤ .05 (low), |ρ| = ANY VALUE: KEEP.

Near Multicollinearity (cont.)
6. When you've decided not to drop a collinear IV:
a) List the collinear IVs.
b) State that each one with a low t-stat might actually affect the DV, but
c) it appears not to affect the DV because of collinearity.
d) (Of course, it might truly NOT affect the DV.)

Exercise
Multicollinearity exercise #2

This note was uploaded on 09/09/2011 for the course ECON 4451 taught by Professor Richard Hofler during the Fall '11 term at University of Central Florida.
