398_39_solutions-instructor-manual_7-heteroscedasticity_im_ch07

Course Code: ECON 101, Spring 2013

University or Institution: LSE





Dougherty: Introduction to Econometrics 4e, Instructor's Manual

7 HETEROSCEDASTICITY

7.1 Heteroscedasticity and its implications
7.2 Detection of heteroscedasticity

7.1 The table gives data on government recurrent expenditure, G, investment, I, gross domestic product, Y, and population, P, for 30 countries in 1997 (source: 1999 International Monetary Fund Yearbook). G, I, and Y are measured in US$ billion and P in million. A researcher investigating whether government expenditure tends to crowd out investment fits the regression (standard errors in parentheses):

$$\hat{I} = \underset{(7.79)}{18.10} - \underset{(0.14)}{1.07}\,G + \underset{(0.02)}{0.36}\,Y \qquad R^2 = 0.99$$

Country               I         G         Y         P
Australia          94.5      75.5     407.9      18.5
Austria            46.0      39.2     206.0       8.1
Canada            119.3     125.1     631.2      30.3
Czech Republic     16.0      10.5      52.0      10.3
Denmark            34.2      42.9     169.3       5.3
Finland            20.2      25.0     121.5       5.1
France            255.9     347.2    1409.2      58.6
Germany           422.5     406.7    2102.7      82.1
Greece             24.0      17.7     119.9      10.5
Iceland             1.4       1.5       7.5       0.3
Ireland            14.3      10.1      73.2       3.7
Italy             190.8     189.7    1145.4      57.5
Japan            1105.9     376.3    3901.3     126.1
Korea             154.9      49.3     442.5      46.0
Malaysia           41.6      10.8      97.3      21.0
Netherlands        73.0      49.9     360.5      15.6
New Zealand        12.9       9.9      65.1       3.8
Norway             35.3      30.9     153.4       4.4
Philippines        20.1      10.7      82.2      78.5
Poland             28.7      23.4     135.6      38.7
Portugal           25.6      19.9     102.1       9.8
Russia             84.7      94.0     436.0     147.1
Singapore          35.6       9.0      95.9       3.7
Spain             109.5      86.0     532.0      39.3
Sweden             31.2      58.8     227.8       8.9
Switzerland        50.2      38.7     256.0       7.1
Thailand           48.1      15.0     153.9      60.6
Turkey             50.2      23.3     189.1      62.5
UK                210.1     230.7    1256.0      58.2
USA              1517.7    1244.1    8110.9     267.9

She sorts the observations by increasing size of Y and runs the regression again for the 11 countries with smallest Y and the 11 countries with largest Y. RSS for these regressions is 321 and 28101, respectively. Perform a Goldfeld–Quandt test for heteroscedasticity.

Answer: RSS2/RSS1 = 28101/321 = 87.5. Each subsample regression has 11 − 3 = 8 degrees of freedom. The critical value of F(8,8) at the 0.1 percent level is 12.0, so the null hypothesis of homoscedasticity is rejected at that significance level.
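The Goldfeld–Quandt calculation can be reproduced from the table with the Python sketch below. It uses only the standard library and fits each subsample by solving the OLS normal equations. It assumes the table has been transcribed correctly. The published figures are rounded to one decimal place, so the RSS values it prints may differ slightly from the 321 and 28101 quoted in the exercise. The helper names (`solve`, `ols_rss`, `goldfeld_quandt`) are illustrative, not from the text.

```python
# Goldfeld-Quandt test for Exercise 7.1 (illustrative sketch, standard library only).
# Each row is (country, I, G, Y) from the table; population P is not needed here.
DATA = [
    ("Australia", 94.5, 75.5, 407.9), ("Austria", 46.0, 39.2, 206.0),
    ("Canada", 119.3, 125.1, 631.2), ("Czech Republic", 16.0, 10.5, 52.0),
    ("Denmark", 34.2, 42.9, 169.3), ("Finland", 20.2, 25.0, 121.5),
    ("France", 255.9, 347.2, 1409.2), ("Germany", 422.5, 406.7, 2102.7),
    ("Greece", 24.0, 17.7, 119.9), ("Iceland", 1.4, 1.5, 7.5),
    ("Ireland", 14.3, 10.1, 73.2), ("Italy", 190.8, 189.7, 1145.4),
    ("Japan", 1105.9, 376.3, 3901.3), ("Korea", 154.9, 49.3, 442.5),
    ("Malaysia", 41.6, 10.8, 97.3), ("Netherlands", 73.0, 49.9, 360.5),
    ("New Zealand", 12.9, 9.9, 65.1), ("Norway", 35.3, 30.9, 153.4),
    ("Philippines", 20.1, 10.7, 82.2), ("Poland", 28.7, 23.4, 135.6),
    ("Portugal", 25.6, 19.9, 102.1), ("Russia", 84.7, 94.0, 436.0),
    ("Singapore", 35.6, 9.0, 95.9), ("Spain", 109.5, 86.0, 532.0),
    ("Sweden", 31.2, 58.8, 227.8), ("Switzerland", 50.2, 38.7, 256.0),
    ("Thailand", 48.1, 15.0, 153.9), ("Turkey", 50.2, 23.3, 189.1),
    ("UK", 210.1, 230.7, 1256.0), ("USA", 1517.7, 1244.1, 8110.9),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(rows):
    """Regress I on a constant, G and Y by OLS and return the residual sum of squares."""
    X = [[1.0, g, y] for _, _, g, y in rows]
    z = [i for _, i, _, _ in rows]
    k = len(X[0])
    xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    xtz = [sum(r[p] * zi for r, zi in zip(X, z)) for p in range(k)]
    b = solve(xtx, xtz)
    return sum((zi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, zi in zip(X, z))


def goldfeld_quandt(rows, n_sub, k=3):
    """Sort by Y, fit the first and last n_sub rows; return (RSS1, RSS2, F, df)."""
    ordered = sorted(rows, key=lambda row: row[3])
    rss1 = ols_rss(ordered[:n_sub])
    rss2 = ols_rss(ordered[-n_sub:])
    return rss1, rss2, rss2 / rss1, n_sub - k


rss1, rss2, f_stat, df = goldfeld_quandt(DATA, n_sub=11)
print(f"RSS1 = {rss1:.0f}, RSS2 = {rss2:.0f}, F({df},{df}) = {f_stat:.1f}")
```

The statistic is compared with the tabulated critical value of F(8,8), as in the answer above.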
7.2 The researcher saves the residuals from the full-sample regression in Exercise 7.1 and regresses their squares on G, Y, their squares, and their product. R² is 0.9878. Perform a White test for heteroscedasticity.

Answer: In the output below, EI2 is the squared residual from the full-sample regression in Exercise 7.1, G2 and Y2 are the squares of G and Y, and GYPROD is their product. The White test statistic is nR² = 30 × 0.9878 = 29.63. Under the null hypothesis of homoscedasticity, this is asymptotically distributed as a chi-squared statistic with five degrees of freedom, one for each regressor in the auxiliary regression other than the constant. The critical value of chi-squared with five degrees of freedom at the 0.1 percent level is 20.52, so the null hypothesis is rejected at that level.

. reg EI2 G Y G2 Y2 GYPROD

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  389.39
       Model |   229715064     5  45943012.8           Prob > F      =  0.0000
    Residual |  2831715.29    24  117988.137           R-squared     =  0.9878
-------------+------------------------------           Adj R-squared =  0.9853
       Total |   232546779    29  8018854.45           Root MSE      =  343.49

------------------------------------------------------------------------------
         EI2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           G |  -36.11647   7.400742    -4.88   0.000    -51.39085   -20.84209
           Y |   5.522015   1.188795     4.65   0.000     3.068463    7.975567
          G2 |   .5463273    .027054    20.19   0.000     .4904907     .602164
          Y2 |    .007948   .0003913    20.31   0.000     .0071403    .0087556
      GYPROD |  -.1350344   .0060013   -22.50   0.000    -.1474204   -.1226485
       _cons |   196.4158   92.97307     2.11   0.045     4.528833    388.3028
------------------------------------------------------------------------------

C. Dougherty 2011. All rights reserved.

7.3 Fit an earnings function using your EAEF data set, taking EARNINGS as the dependent variable and S, EXP, and MALE as the explanatory variables, and perform a Goldfeld–Quandt test for heteroscedasticity in the S dimension. Remember to sort the observations by S first.
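Once the data are sorted, choosing the two subsamples is mechanical. The sketch below assumes the rule of thumb that each subsample contains about 3/8 of the observations. That fraction is an assumption here: it reproduces the 11 of 30, 203 of 540 and 14 of 38 used in Exercises 7.1, 7.3 and 7.6, but not the 26 of 74 in Exercise 7.5. The function returns the row ranges to use after `sort S`. The function name is hypothetical.

```python
def gq_subsamples(n, fraction=3 / 8):
    """Return 1-based, inclusive (first, last) row ranges for the two
    Goldfeld-Quandt subsamples of a data set already sorted by the variable
    thought to drive the heteroscedasticity.

    The subsample size is fraction * n rounded half up; fraction = 3/8 is an
    assumed rule of thumb, not a requirement of the test.
    """
    n_sub = int(fraction * n + 0.5)  # round half up (Python's round() gives 202 for 202.5)
    if n_sub < 1 or 2 * n_sub > n:
        raise ValueError("subsamples must be non-empty and must not overlap")
    return (1, n_sub), (n - n_sub + 1, n)


first, last = gq_subsamples(540)
for lo, hi in (first, last):
    # Stata-style row ranges, to be used after sorting by S
    print(f"reg EARNINGS S EXP MALE in {lo}/{hi}")
# reg EARNINGS S EXP MALE in 1/203
# reg EARNINGS S EXP MALE in 338/540
```

There are four parameters (constant, S, EXP and MALE), so each subsample regression has 203 − 4 = 199 degrees of freedom.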
Answer: If the observations in EAEF Data Set 22 are ordered by S and subregressions are run using the first and last 203 observations, RSS is 11,824 for the first 203 observations and 50,609 for the last 203. The ratio is 50,609/11,824 = 4.28. Each subsample regression has 203 − 4 = 199 degrees of freedom, so we need the critical value of F(199,199), but that for F(200,200) will be virtually identical. The critical value of F(200,200) at the 0.1 percent level is 1.55, so the null hypothesis of homoscedasticity is decisively rejected: the linear specification is heteroscedastic.

. sort S
. reg EARNINGS S EXP MALE in 1/203

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   13.05
       Model |  2326.57129     3  775.523765           Prob > F      =  0.0000
    Residual |  11823.7866   199  59.4160129           R-squared     =  0.1644
-------------+------------------------------           Adj R-squared =  0.1518
       Total |  14150.3579   202  70.0512765           Root MSE      =  7.7082

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |    .936945   .4509822     2.08   0.039     .0476278    1.826262
         EXP |   .4084378   .1094286     3.73   0.000     .1926493    .6242263
        MALE |   3.641454     1.1077     3.29   0.001     1.457118    5.825791
       _cons |  -5.130995   5.283363    -0.97   0.333    -15.54956    5.287566
------------------------------------------------------------------------------

. reg EARNINGS S EXP MALE in 338/540

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   23.79
       Model |  18149.6383     3  6049.87942           Prob > F      =  0.0000
    Residual |  50608.9949   199  254.316557           R-squared     =  0.2640
-------------+------------------------------           Adj R-squared =  0.2529
       Total |  68758.6332   202  340.389273           Root MSE      =  15.947

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   4.601231   .7544586     6.10   0.000     3.113471     6.08899
         EXP |   1.149634   .3031796     3.79   0.000     .5517769    1.747491
        MALE |   11.46224   2.267416     5.06   0.000     6.990992    15.93348
       _cons |  -71.77737   14.63504    -4.90   0.000     -100.637    -42.9177
------------------------------------------------------------------------------

7.4 Fit an earnings function using your EAEF data set, using the same specification as in Exercise 7.3, and perform a White test for heteroscedasticity.

Answer: The output shows first the basic wage equation, with the residuals saved as EEARN, then the definitions of the squares and products, and then the regression of the squared residuals. MALESQ is dropped because MALE, being a dummy variable, is identical to its square, so the auxiliary regression has eight regressors besides the constant. The test statistic is nR² = 540 × 0.0691 = 37.31. The critical value of chi-squared at the 0.1 percent significance level with 8 degrees of freedom is 26.12. Hence we reject the null hypothesis of homoscedasticity.

. reg EARNINGS S EXP MALE

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   69.05
       Model |  33593.9888     3  11197.9963           Prob > F      =  0.0000
    Residual |  86924.4391   536  162.172461           R-squared     =  0.2787
-------------+------------------------------           Adj R-squared =  0.2747
       Total |  120518.428   539  223.596341           Root MSE      =  12.735

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.863229   .2275297    12.58   0.000      2.41627    3.310189
         EXP |   .5487349   .1225199     4.48   0.000     .3080568     .789413
        MALE |   6.716579   1.122657     5.98   0.000     4.511233    8.921925
       _cons |  -31.88974   4.120556    -7.74   0.000    -39.98416   -23.79532
------------------------------------------------------------------------------

. predict EEARN, resid
. g EEARNSQ = EEARN*EEARN
. g SSQ = S*S
. g EXPSQ = EXP*EXP
. g MALESQ = MALE*MALE
. g SEXP = S*EXP
. reg EEARNSQ S EXP MALE SSQ EXPSQ MALESQ SEXP MALES MALEEXP;

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  8,   531) =    4.92
       Model |  15058509.4     8  1882313.67           Prob > F      =  0.0000
    Residual |   202970073   531  382241.193           R-squared     =  0.0691
-------------+------------------------------           Adj R-squared =  0.0550
       Total |   218028583   539  404505.719           Root MSE      =  618.26

------------------------------------------------------------------------------
     EEARNSQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -187.2417   76.94705    -2.43   0.015    -338.3997   -36.08377
         EXP |  -119.4142   45.97819    -2.60   0.010    -209.7357   -29.09273
        MALE |  -819.0745   426.2914    -1.92   0.055    -1656.499    18.34991
         SSQ |   4.481694   2.513909     1.78   0.075    -.4567325    9.420121
       EXPSQ |   1.421899   1.011588     1.41   0.160    -.5653065    3.409105
      MALESQ |  (dropped)
        SEXP |   6.103315   2.353413     2.59   0.010     1.480173    10.72646
       MALES |   44.11235   23.13765     1.91   0.057    -1.340212    89.56491
     MALEEXP |   19.13805   12.23941     1.56   0.118    -4.905564    43.18165
       _cons |   1983.873   678.0395     2.93   0.004     651.9039    3315.842
------------------------------------------------------------------------------

7.5* The following regressions were fitted using the Shanghai school cost data introduced in Section 5.1 (standard errors in parentheses):

$$\widehat{COST} = \underset{(27{,}000)}{24{,}000} + \underset{(50)}{339}\,N \qquad R^2 = 0.39$$

$$\widehat{COST} = \underset{(31{,}000)}{51{,}000} - \underset{(41{,}000)}{4{,}000}\,OCC + \underset{(60)}{152}\,N + \underset{(76)}{284}\,NOCC \qquad R^2 = 0.68$$

where COST is the annual cost of running a school, N is the number of students, OCC is a dummy variable defined to be 0 for regular schools and 1 for occupational schools, and NOCC is a slope dummy variable defined as the product of N and OCC. There are 74 schools in the sample. With the data sorted by N, the regressions are fitted again for the 26 smallest and 26 largest schools, the residual sums of squares being as shown in the table.
                      26 smallest      26 largest
First regression      7.8 × 10^10      54.4 × 10^10
Second regression     6.7 × 10^10      13.8 × 10^10

Perform a Goldfeld–Quandt test for heteroscedasticity for the two models and, with reference to Figure 5.5, explain why the problem of heteroscedasticity is less severe in the second model.

Answer: For both regressions, RSS1 denotes the residual sum of squares for the 26 smallest schools and RSS2 that for the 26 largest schools. In the first regression, RSS2/RSS1 = (54.4 × 10^10)/(7.8 × 10^10) = 6.97. There are 24 degrees of freedom in each subsample (26 observations, 2 parameters estimated). The critical value of F(24,24) is approximately 3.7 at the 0.1 percent level, and so we reject the null hypothesis of homoscedasticity at that level. In the second regression, RSS2/RSS1 = (13.8 × 10^10)/(6.7 × 10^10) = 2.06. There are 22 degrees of freedom in each subsample (26 observations, 4 parameters estimated). The critical value of F(22,22) is 2.05 at the 5 percent level, and so we (just) reject the null hypothesis of homoscedasticity at that significance level.

[Figure: Shanghai schools: cost and number of students. COST plotted against N, with occupational schools and regular schools distinguished.]

Why is the problem of heteroscedasticity less severe in the second regression? The figure (Figure 5.5 in the text) reveals that the cost function is much steeper for the occupational schools than for the regular schools, reflecting their higher marginal cost. As a consequence, the two sets of observations diverge as the number of students increases, and the scatter is bound to appear heteroscedastic whether or not the disturbance term is truly heteroscedastic. The first regression takes no account of this, and the Goldfeld–Quandt test therefore indicates significant heteroscedasticity.
In the second regression this problem does not arise, because the intercept and slope dummy variables allow separate implicit regression lines for the two types of school. (However, there does seem to be some genuine heteroscedasticity.) Looking closely at the diagram, the observations for the occupational schools exhibit a classic pattern of true heteroscedasticity, and this would be confirmed by a Goldfeld–Quandt test confined to the subsample of those schools. However, the observations for the regular schools appear to be homoscedastic, and this accounts for the fact that we only just rejected the null hypothesis of homoscedasticity for the combined sample.

7.6* The file educ.dta on the website contains international cross-sectional data on aggregate expenditure on education, EDUC, gross domestic product, GDP, and population, POP, for a sample of 38 countries in 1997. EDUC and GDP are measured in US$ million and POP is measured in thousands. See Appendix B for further information. Download the data set, plot a scatter diagram of EDUC on GDP, and comment on whether the data set appears to be subject to heteroscedasticity. Sort the data set by GDP and perform a Goldfeld–Quandt test for heteroscedasticity, running regressions using the subsamples of 14 countries with the smallest and greatest GDP.

Answer: The figure plots expenditure on education, EDUC, against gross domestic product, GDP, for the 38 countries in the sample. The observations exhibit heteroscedasticity. Sorting them by GDP and regressing EDUC on GDP for the subsamples of 14 countries with smallest and greatest GDP, the residual sums of squares for the first and second subsamples, denoted RSS1 and RSS2, are 1,660,000 and 63,113,000, respectively. Each subsample has 14 − 2 = 12 degrees of freedom. Hence

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{63{,}113{,}000}{1{,}660{,}000} = 38.02.$$

The critical value of F(12,12) at the 0.1 percent level is 7.00, and so we reject the null hypothesis of homoscedasticity.
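The critical values of F quoted in these answers come from tables. Without a table, the upper-tail probability of F(d1, d2) can be computed from the regularized incomplete beta function, $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$. Below is a minimal standard-library sketch that evaluates it with the classic continued fraction (modified Lentz method). It is an illustration, not the method used in the text; in practice a library routine such as SciPy's `scipy.stats.f.sf` would normally be preferred.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered coefficient of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered coefficient
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    # use the continued fraction where it converges quickly, symmetry otherwise
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# p-values for Goldfeld-Quandt statistics reported in Exercises 7.1, 7.5 and 7.6
for label, f, d in [("7.1", 87.5, 8), ("7.5, first regression", 6.97, 24),
                    ("7.5, second regression", 2.06, 22), ("7.6", 38.02, 12)]:
    print(f"Exercise {label}: F({d},{d}) = {f}, p = {f_sf(f, d, d):.3g}")
```

As a check, `f_sf(7.00, 12, 12)` should be close to 0.001, matching the tabulated 0.1 percent critical value of F(12,12).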
[Figure: Expenditure on education and GDP. Expenditure on education ($ million) plotted against GDP ($ million).]

7.3 Remedies for heteroscedasticity

7.7 The researcher mentioned in Exercise 7.1 runs the following regressions as alternative specifications of the model (standard errors in parentheses):

$$\widehat{\frac{I}{P}} = \underset{(0.28)}{0.03}\,\frac{1}{P} - \underset{(0.16)}{0.69}\,\frac{G}{P} + \underset{(0.03)}{0.34}\,\frac{Y}{P} \qquad R^2 = 0.97 \qquad (1)$$

$$\widehat{\frac{I}{Y}} = \underset{(0.04)}{0.39} + \underset{(0.42)}{0.03}\,\frac{1}{Y} - \underset{(0.22)}{0.93}\,\frac{G}{Y} \qquad R^2 = 0.78 \qquad (2)$$

$$\widehat{\log I} = \underset{(0.26)}{-2.44} - \underset{(0.12)}{0.63}\,\log G + \underset{(0.12)}{1.60}\,\log Y \qquad R^2 = 0.98 \qquad (3)$$

In each case the regression is run again for the subsamples of observations with the 11 smallest and 11 greatest values of the sorting variable, after sorting by Y/P, G/Y, and log Y, respectively. The residual sums of squares are as shown in the table.

       11 smallest    11 largest
(1)       1.43          12.63
(2)       0.0223         0.0155
(3)       0.573          0.155

Perform a Goldfeld–Quandt test for each model specification and discuss the merits of each specification. Is there evidence that investment is an inverse function of government expenditure?

Answer: In the first specification, RSS2/RSS1 = 12.63/1.43 = 8.83. Since the critical value of F(8,8) at the 1 percent level is 6.03, the null hypothesis of homoscedasticity is rejected at that significance level. For the other two specifications, RSS1 is greater than RSS2, so one should test for heteroscedasticity in the inverse direction (the standard deviation of the disturbance term decreasing with the sorting variable), using RSS1/RSS2 as the test statistic. For the second specification, RSS1/RSS2 = 0.0223/0.0155 = 1.44, and so the null hypothesis of homoscedasticity is not rejected at the 5 percent level, the critical value of F(8,8) being 3.44. For the third specification, RSS1/RSS2 = 0.573/0.155 = 3.70, and so the null hypothesis of homoscedasticity is rejected at the 5 percent level but not at the 1 percent level.
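The direction of the suspected heteroscedasticity decides which RSS goes in the numerator, so it can help to make that choice explicit. The small sketch below does this (the function name is hypothetical). It is applied to the three specifications of Exercise 7.7, using the F(8,8) critical values quoted in the answer.

```python
def gq_statistic(rss_small, rss_large):
    """Goldfeld-Quandt ratio with the larger RSS in the numerator.

    rss_small: RSS for the subsample with the smallest values of the sorting variable
    rss_large: RSS for the subsample with the largest values
    Returns (F, direction): whether the disturbance variance appears to
    increase or decrease with the sorting variable.
    """
    if rss_large >= rss_small:
        return rss_large / rss_small, "increasing"
    return rss_small / rss_large, "decreasing"


SPECS = {"(1) I/P": (1.43, 12.63), "(2) I/Y": (0.0223, 0.0155), "(3) log I": (0.573, 0.155)}
CRIT_F88 = {"5%": 3.44, "1%": 6.03}  # critical values of F(8,8) quoted in the answer

for name, (rss_small, rss_large) in SPECS.items():
    f, direction = gq_statistic(rss_small, rss_large)
    rejected = [level for level, crit in CRIT_F88.items() if f > crit]
    print(f"{name}: F(8,8) = {f:.2f} ({direction}); reject at {', '.join(rejected) or 'neither level'}")
# (1) I/P: F(8,8) = 8.83 (increasing); reject at 5%, 1%
# (2) I/Y: F(8,8) = 1.44 (decreasing); reject at neither level
# (3) log I: F(8,8) = 3.70 (decreasing); reject at 5%
```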
The second specification, which appears to be free from heteroscedasticity, does indeed suggest that the share of investment in GDP is a negative function of the share of government expenditure in GDP, the t statistic for G/Y being −4.23. The third specification, which shows signs of being subject to heteroscedasticity, tells much the same story, the elasticity of I with respect to G being estimated at −0.63, holding Y constant. The t statistic, −0.63/0.12 = −5.25, is so large that the effect is probably significant, even allowing for heteroscedasticity.

7.8 Using your EAEF data set, repeat Exercises 7.3 and 7.4 with LGEARN as the dependent variable. Is there evidence that this is a preferable specification?

Answer: For the Goldfeld–Quandt test, RSS is 44.25 for the first 203 observations and 60.47 for the last 203. The ratio is 60.47/44.25 = 1.37. As in Exercise 7.3, we need the critical value of F(199,199), but that for F(200,200) will be virtually identical. The critical value of F(200,200) is 1.39 at the 1 percent level and approximately 1.26 at the 5 percent level, so the null hypothesis of homoscedasticity is rejected at the 5 percent level but not at the 1 percent level. The logarithmic specification therefore still appears to be subject to some heteroscedasticity, though much less than the linear specification, for which the ratio was 4.28.

For the White test, the test statistic is 540 × 0.0154 = 8.32, and the auxiliary regression has six regressors besides the constant. The critical value of chi-squared at the 5 percent level with 6 degrees of freedom is 12.59, so the null hypothesis of homoscedasticity is not rejected. This suggests that the White test is less powerful than the Goldfeld–Quandt test when the latter is appropriate. On both tests, then, there is evidence that the logarithmic specification is preferable to the linear one.

. reg LGEARN S EXP MALE in 1/203

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   20.23
       Model |  13.4919792     3   4.4973264           Prob > F      =  0.0000
    Residual |  44.2493309   199  .222358447           R-squared     =  0.2337
-------------+------------------------------           Adj R-squared =  0.2221
       Total |  57.7413101   202   .28584807           Root MSE      =  .47155

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |    .065024   .0275889     2.36   0.019     .0106199    .1194282
         EXP |   .0338008   .0066943     5.05   0.000     .0205999    .0470017
        MALE |   .2533485   .0677637     3.74   0.000     .1197213    .3869756
       _cons |   1.077628   .3232105     3.33   0.001     .4402712    1.714985
------------------------------------------------------------------------------

. reg LGEARN S EXP MALE in 338/540

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   29.06
       Model |  26.4944741     3  8.83149135           Prob > F      =  0.0000
    Residual |  60.4705676   199  .303872199           R-squared     =  0.3047
-------------+------------------------------           Adj R-squared =  0.2942
       Total |  86.9650417   202  .430520008           Root MSE      =  .55125

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1825832   .0260792     7.00   0.000     .1311563    .2340102
         EXP |   .0458042   .0104799     4.37   0.000     .0251382    .0664701
        MALE |   .4107624   .0783771     5.24   0.000     .2562061    .5653186
       _cons |  -.8087749   .5058854    -1.60   0.111    -1.806359     .188809
------------------------------------------------------------------------------

. reg LGEARN S EXP MALE

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   97.19
       Model |  75.3997118     3  25.1332373           Prob > F      =  0.0000
    Residual |  138.610676   536  .258602007           R-squared     =  0.3523
-------------+------------------------------           Adj R-squared =  0.3487
       Total |  214.010387   539  .397050811           Root MSE      =  .50853

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1317248   .0090859    14.50   0.000     .1138765     .149573
         EXP |   .0348221   .0048925     7.12   0.000     .0252112     .044433
        MALE |   .3048496   .0448306     6.80   0.000     .2167845    .3929148
       _cons |   .2449455   .1645445     1.49   0.137    -.0782856    .5681765
------------------------------------------------------------------------------

. predict ELGEARN, resid
. g ELGEARN2 = ELGEARN*ELGEARN
. reg ELGEARN2 S EXP MALE S2 EXP2 SEXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  6,   533) =    1.39
       Model |  1.75925279     6  .293208799           Prob > F      =  0.2170
    Residual |  112.523163   533  .211112877           R-squared     =  0.0154
-------------+------------------------------           Adj R-squared =  0.0043
       Total |  114.282416   539  .212026746           Root MSE      =  .45947

------------------------------------------------------------------------------
    ELGEARN2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -.0502755   .0568298    -0.88   0.377    -.1619134    .0613624
         EXP |  -.0375013   .0338021    -1.11   0.268    -.1039029    .0289003
        MALE |  -.0103873   .0408412    -0.25   0.799    -.0906169    .0698422
          S2 |   .0004303   .0018432     0.23   0.815    -.0031905    .0040511
        EXP2 |  -.0003143   .0007424    -0.42   0.672    -.0017727     .001144
        SEXP |   .0030162   .0017068     1.77   0.078    -.0003367    .0063691
       _cons |   .9063623   .5036147     1.80   0.072    -.0829509    1.895675
------------------------------------------------------------------------------

7.9* Repeat Exercise 7.6, using the Goldfeld–Quandt test to investigate whether scaling by population or by GDP, or running the regression in logarithmic form, would eliminate the heteroscedasticity. Compare the results of regressions using the entire sample and the alternative specifications.
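The answer that follows works with three transformed data sets. The minimal sketch below shows how the variables could be built from (EDUC, GDP, POP), using the variable names that appear in the answer. The `Country` class and the numbers in the example are hypothetical, and natural logarithms are assumed for LGEE and LGGDP.

```python
import math
from dataclasses import dataclass


@dataclass
class Country:
    educ: float  # expenditure on education, US$ million
    gdp: float   # gross domestic product, US$ million
    pop: float   # population, thousands


def specifications(c):
    """Transformed variables for the three alternative specifications."""
    return {
        # scaled by POP: EDUCPOP = b1*POPREC + b2*GDPPOP + u/POP, no intercept; sort by GDPPOP
        "EDUCPOP": c.educ / c.pop,
        "POPREC": 1.0 / c.pop,
        "GDPPOP": c.gdp / c.pop,
        # scaled by GDP: EDUCGDP = b2 + b1*GDPREC + u/GDP, b2 acting as intercept; sort by GDPREC
        "EDUCGDP": c.educ / c.gdp,
        "GDPREC": 1.0 / c.gdp,
        # logarithmic (natural logs assumed): regress LGEE on LGGDP; sort by GDP
        "LGEE": math.log(c.educ),
        "LGGDP": math.log(c.gdp),
    }


# Hypothetical country, for illustration only (not taken from educ.dta)
v = specifications(Country(educ=2000.0, gdp=40000.0, pop=8000.0))
print(v["EDUCPOP"], v["GDPPOP"], v["EDUCGDP"])  # 0.25 5.0 0.05
```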
Answer: Dividing through by population, POP, the model becomes

$$\frac{EDUC}{POP} = \beta_1\frac{1}{POP} + \beta_2\frac{GDP}{POP} + \frac{u}{POP},$$

with expenditure on education per capita, denoted EDUCPOP, hypothesized to be a function of gross domestic product per capita, GDPPOP, and the reciprocal of population, POPREC, with no intercept. Sorting the sample by GDPPOP and running the regression for the subsamples of 14 countries with smallest and largest GDPPOP, RSS1 = 0.006788 and RSS2 = 1.415516. Now

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{1.415516}{0.006788} = 208.5.$$

Thus the model is still subject to heteroscedasticity at the 0.1 percent level. This is evident in the figure.

[Figure: Expenditure on education per capita and GDP per capita (EDUC/POP plotted against GDP/POP).]

Dividing through instead by GDP, the model becomes

$$\frac{EDUC}{GDP} = \beta_1\frac{1}{GDP} + \beta_2 + \frac{u}{GDP},$$

with expenditure on education as a share of gross domestic product, denoted EDUCGDP, hypothesized to be a simple function of the reciprocal of gross domestic product, GDPREC. The original slope coefficient, β2, now plays the role of the intercept. Sorting the sample by GDPREC and running the regression for the subsamples of 14 countries with smallest and largest GDPREC, RSS1 = 0.00413 and RSS2 = 0.00238. Since RSS2 is less than RSS1, we test for heteroscedasticity under the hypothesis that the standard deviation of the disturbance term is inversely related to GDPREC:

$$F(12,12) = \frac{RSS_1}{RSS_2} = \frac{0.00413}{0.00238} = 1.74.$$

The critical value of F(12,12) at the 5 percent level is 2.69, so we do not reject the null hypothesis of homoscedasticity. Could one tell this from the figure? It is a little difficult to say.

[Figure: Expenditure on education as a proportion of GDP and the reciprocal of GDP (EDUC/GDP plotted against 1/GDP).]

Finally, we will consider a logarithmic specification.
If the true relationship is logarithmic, and homoscedastic, it would not be surprising that the linear model appeared heteroscedastic. Sorting the sample by GDP, RSS1 and RSS2 are 2.733 and 3.438 for the subsamples of 14 countries with smallest and greatest GDP. The F statistic is

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{3.438}{2.733} = 1.26.$$

[Figure: Expenditure on education and GDP, logarithmic (log EDUC plotted against log GDP).]

Thus again we would not reject the null hypothesis of homoscedasticity.

The third and fourth specifications both appear to be free from heteroscedasticity. How do we choose between them? We will examine the regression results, shown for the two models with the full sample:

. reg EDUCGDP GDPREC

  Source |       SS       df       MS                  Number of obs =      38
---------+------------------------------               F(  1,    36) =    5.62
   Model |  .001348142     1  .001348142               Prob > F      =  0.0233
Residual |  .008643037    36  .000240084               R-squared     =  0.1349
---------+------------------------------               Adj R-squared =  0.1109
   Total |  .009991179    37  .000270032               Root MSE      =  .01549

------------------------------------------------------------------------------
 EDUCGDP |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  GDPREC |  -234.0823   98.78309     -2.370   0.023      -434.4236   -33.74086
   _cons |   .0484593   .0036696     13.205   0.000       .0410169    .0559016
------------------------------------------------------------------------------

. reg LGEE LGGDP

  Source |       SS       df       MS                  Number of obs =      38
---------+------------------------------               F(  1,    36) =  246.20
   Model |  51.9905508     1  51.9905508               Prob > F      =  0.0000
Residual |   7.6023197    36  .211175547               R-squared     =  0.8724
---------+------------------------------               Adj R-squared =  0.8689
   Total |  59.5928705    37  1.61061812               Root MSE      =  .45954

------------------------------------------------------------------------------
    LGEE |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   LGGDP |   1.160594   .0739673     15.691   0.000       1.010582    1.310607
   _cons |  -5.025204   .8152239     -6.164   0.000      -6.678554   -3.371853
------------------------------------------------------------------------------

In equation form, the first of these regressions is

$$\widehat{\frac{EDUC}{GDP}} = \underset{(0.004)}{0.048} - \underset{(98.8)}{234.1}\,\frac{1}{GDP} \qquad R^2 = 0.13$$

Multiplying through by GDP, it may be rewritten

$$\widehat{EDUC} = -234.1 + 0.048\,GDP.$$

It implies that, at the margin, expenditure on education accounts for 4.8 percent of gross domestic product. The constant does not have any sensible interpretation. We will compare this with the output from an OLS regression that makes no attempt to eliminate heteroscedasticity:

. reg EDUC GDP

  Source |       SS       df       MS                  Number of obs =      38
---------+------------------------------               F(  1,    36) =  509.80
   Model |  1.0571e+09     1  1.0571e+09               Prob > F      =  0.0000
Residual |  74645819.2    36  2073494.98               R-squared     =  0.9340
---------+------------------------------               Adj R-squared =  0.9322
   Total |  1.1317e+09    37  30586911.0               Root MSE      =  1440.0

------------------------------------------------------------------------------
    EDUC |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     GDP |   .0480656   .0021288     22.579   0.000       .0437482     .052383
   _cons |  -160.4669    311.699     -0.515   0.610      -792.6219     471.688
------------------------------------------------------------------------------

The slope coefficient, 0.048, is identical to three decimal places (0.0481 here, against 0.0485 in the scaled regression). This is not entirely a surprise. Heteroscedasticity does not give rise to bias, so there should be no systematic difference between the estimate from an OLS regression and that from a specification that eliminates heteroscedasticity. Even so, it is a surprise that the estimates are so close. Generally there would be some random difference, and the OLS estimate would tend to be less accurate.
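A simulation can illustrate the claim that OLS remains unbiased but is less efficient than the scaled regression. The sketch below assumes a model loosely patterned on this one, y = β1 + β2x + u, with the standard deviation of u proportional to x. The parameter values and regressor values are hypothetical, not estimates from educ.dta. Regressing y/x on 1/x, as in the regression of EDUCGDP on GDPREC, makes the intercept an estimator of β2.

```python
import random
import statistics


def simple_ols(x, y):
    """OLS fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def simulate(reps=2000, beta1=-200.0, beta2=0.05, rel_sd=0.01, seed=1):
    """Monte Carlo comparison of two estimators of beta2 (hypothetical set-up).

    Model: y = beta1 + beta2*x + u, with sd(u) = rel_sd * x.
    OLS:    regress y on x; the slope estimates beta2.
    Scaled: regress y/x on 1/x; since y/x = beta2 + beta1*(1/x) + u/x and
            sd(u/x) = rel_sd is constant, the intercept estimates beta2.
    """
    rng = random.Random(seed)
    x = [1000.0 * i for i in range(1, 39)]  # 38 hypothetical regressor values
    w = [1.0 / xi for xi in x]
    ols_est, scaled_est = [], []
    for _ in range(reps):
        y = [beta1 + beta2 * xi + rng.gauss(0.0, rel_sd * xi) for xi in x]
        ols_est.append(simple_ols(x, y)[1])
        scaled_est.append(simple_ols(w, [yi / xi for xi, yi in zip(x, y)])[0])
    return ols_est, scaled_est


ols_est, scaled_est = simulate()
for name, est in (("OLS", ols_est), ("scaled", scaled_est)):
    print(f"{name:>6}: mean = {statistics.fmean(est):.4f}, sd = {statistics.stdev(est):.5f}")
```

Both sets of estimates should centre on β2 = 0.05. With these settings, the theoretical variance of the OLS slope is roughly 3.5 times that of the scaled estimator.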
In this case, the main difference is in the estimated standard error. That for the OLS regression (0.0021) is actually smaller than that for the regression of EDUCGDP on GDPREC (0.0037), but it is misleading. It is incorrectly calculated, and since OLS is inefficient we know that the true standard error of the OLS estimate is actually larger.

The logarithmic regression in equation form is

$$\widehat{\log EDUC} = \underset{(0.82)}{-5.03} + \underset{(0.07)}{1.16}\,\log GDP \qquad R^2 = 0.87$$

implying that the elasticity of expenditure on education with respect to gross domestic product is 1.16. In substance the interpretations of the models are similar, since both imply that the proportion of GDP allocated to education increases slowly with GDP. However, the elasticity specification seems a little more informative and probably serves as a better starting point for further exploration. For example, it would be natural to add the logarithm of population to see if population had an independent effect.

7.10* It was reported above that the heteroscedasticity-consistent estimate of the standard error of the coefficient of GDP in equation (7.13) was 0.18. Explain why the corresponding standard error in equation (7.15) ought to be lower and comment on the fact that it is not.

Answer: (7.15), unlike (7.13), appears to be free from heteroscedasticity and therefore should provide more efficient estimates of the coefficients, reflected in lower standard errors when these are computed correctly. However, the sample may be too small for the heteroscedasticity-consistent estimator to be a good guide.

7.11* A health economist plans to evaluate whether screening patients on arrival or spending extra money on cleaning is more effective in reducing the incidence of infections by the MRSA bacterium in hospitals. She hypothesizes the following model:

$$MRSA_i = \beta_1 + \beta_2 S_i + \beta_3 C_i + u_i$$

where, in hospital i, MRSA is the number of infections per thousand patients, S is expenditure per patient on screening, and C is expenditure per patient on cleaning.
$u_i$ is a disturbance term that satisfies the usual regression model assumptions. In particular, $u_i$ is drawn from a distribution with mean zero and constant variance $\sigma_u^2$. The researcher would like to fit the relationship using a sample of hospitals. Unfortunately, data for individual hospitals are not available. Instead she has to use regional data to fit

$$\overline{MRSA}_j = \beta_1 + \beta_2\bar{S}_j + \beta_3\bar{C}_j + \bar{u}_j$$

where $\overline{MRSA}_j$, $\bar{S}_j$, $\bar{C}_j$, and $\bar{u}_j$ are the averages of MRSA, S, C, and u for the hospitals in region j. There were different numbers of hospitals in the regions, there being $n_j$ hospitals in region j.

Show that the variance of $\bar{u}_j$ is equal to $\sigma_u^2/n_j$ and that an OLS regression using the grouped regional data to fit the relationship will be subject to heteroscedasticity. Assuming that the researcher knows the value of $n_j$ for each region, explain how she could re-specify the regression model to make it homoscedastic. State the revised specification and demonstrate mathematically that it is homoscedastic. Give an intuitive explanation of why the revised specification should tend to produce improved estimates of the parameters.

Answer: Let $u_{jk}$ denote the disturbance term for hospital k in region j. Then

$$\operatorname{var}(\bar{u}_j) = \operatorname{var}\left(\frac{1}{n_j}\sum_{k=1}^{n_j}u_{jk}\right) = \frac{1}{n_j^2}\operatorname{var}\left(\sum_{k=1}^{n_j}u_{jk}\right) = \frac{1}{n_j^2}\sum_{k=1}^{n_j}\operatorname{var}(u_{jk}),$$

since the covariance terms are all 0. Hence

$$\operatorname{var}(\bar{u}_j) = \frac{1}{n_j^2}\,n_j\sigma_u^2 = \frac{\sigma_u^2}{n_j}.$$

Since $n_j$ differs from region to region, so does the variance of the disturbance term, and an OLS regression using the regional data will be subject to heteroscedasticity. To eliminate the heteroscedasticity, multiply observation j by $\sqrt{n_j}$. The regression becomes

$$\sqrt{n_j}\,\overline{MRSA}_j = \beta_1\sqrt{n_j} + \beta_2\sqrt{n_j}\,\bar{S}_j + \beta_3\sqrt{n_j}\,\bar{C}_j + \sqrt{n_j}\,\bar{u}_j,$$

with no intercept, $\beta_1$ now being the coefficient of $\sqrt{n_j}$. The variance of the disturbance term is now

$$\operatorname{var}\left(\sqrt{n_j}\,\bar{u}_j\right) = n_j\operatorname{var}(\bar{u}_j) = n_j\frac{\sigma_u^2}{n_j} = \sigma_u^2$$

and is thus the same for all observations. The expression for $\operatorname{var}(\bar{u}_j)$ shows that the larger the group, the more reliable its observation should be (the closer it should tend to be to the population relationship). The scaling gives greater weight to the more reliable observations, and the resulting estimators should be more efficient.
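A short simulation can check both results: var(ū_j) = σ_u²/n_j, and multiplying by √n_j removes the dependence on n_j. The minimal sketch below assumes normally distributed disturbances with σ_u = 2 and illustrative group sizes of 1, 4 and 25. The exercise itself requires only zero mean, constant variance and zero covariances, and none of these values come from it.

```python
import math
import random


def group_mean_variances(group_sizes, sigma=2.0, reps=20000, seed=7):
    """Simulate the regional average u-bar_j of n_j independent disturbances.

    Normal disturbances with mean 0 and sd sigma are an assumption of this sketch.
    Returns {n_j: (var(u-bar_j), var(sqrt(n_j) * u-bar_j))}, each variance
    computed about the known mean of zero.
    """
    rng = random.Random(seed)
    results = {}
    for n in group_sizes:
        means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for _ in range(reps)]
        scaled = [math.sqrt(n) * m for m in means]
        results[n] = (sum(m * m for m in means) / reps,
                      sum(s * s for s in scaled) / reps)
    return results


sigma = 2.0
for n, (v_bar, v_scaled) in group_mean_variances([1, 4, 25], sigma=sigma).items():
    print(f"n_j = {n:2d}: var(u-bar) = {v_bar:.3f} (theory {sigma ** 2 / n:.3f}); "
          f"var(sqrt(n_j) u-bar) = {v_scaled:.3f} (theory {sigma ** 2:.3f})")
```

The first variance should fall roughly in proportion to 1/n_j. The second should stay close to σ_u² = 4 for every group size, which is what makes the √n_j-scaled regression homoscedastic.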