Dougherty: Introduction to Econometrics 4e
Instructor's Manual
7 HETEROSCEDASTICITY
7.1 Heteroscedasticity and its implications
7.2 Detection of heteroscedasticity
7.1
The table gives data on government recurrent expenditure, G, investment, I, gross domestic
product, Y, and population, P, for 30 countries in 1997 (source: 1999 International Monetary
Fund Yearbook). G, I, and Y are measured in US$ billion and P in million. A researcher
investigating whether government expenditure tends to crowd out investment fits the regression
(standard errors in parentheses):
$$\hat{I} = \underset{(7.79)}{18.10} - \underset{(0.14)}{1.07}\,G + \underset{(0.02)}{0.36}\,Y \qquad R^2 = 0.99$$

Country                 I        G        Y        P
Australia            94.5     75.5    407.9     18.5
Austria              46.0     39.2    206.0      8.1
Canada              119.3    125.1    631.2     30.3
Czech Republic       16.0     10.5     52.0     10.3
Denmark              34.2     42.9    169.3      5.3
Finland              20.2     25.0    121.5      5.1
France              255.9    347.2   1409.2     58.6
Germany             422.5    406.7   2102.7     82.1
Greece               24.0     17.7    119.9     10.5
Iceland               1.4      1.5      7.5      0.3
Ireland              14.3     10.1     73.2      3.7
Italy               190.8    189.7   1145.4     57.5
Japan              1105.9    376.3   3901.3    126.1
Korea               154.9     49.3    442.5     46.0
Malaysia             41.6     10.8     97.3     21.0
Netherlands          73.0     49.9    360.5     15.6
New Zealand          12.9      9.9     65.1      3.8
Norway               35.3     30.9    153.4      4.4
Philippines          20.1     10.7     82.2     78.5
Poland               28.7     23.4    135.6     38.7
Portugal             25.6     19.9    102.1      9.8
Russia               84.7     94.0    436.0    147.1
Singapore            35.6      9.0     95.9      3.7
Spain               109.5     86.0    532.0     39.3
Sweden               31.2     58.8    227.8      8.9
Switzerland          50.2     38.7    256.0      7.1
Thailand             48.1     15.0    153.9     60.6
Turkey               50.2     23.3    189.1     62.5
UK                  210.1    230.7   1256.0     58.2
USA                1517.7   1244.1   8110.9    267.9

She sorts the observations by increasing size of Y and runs the regression again for the 11
countries with smallest Y and the 11 countries with largest Y. RSS for these regressions is 321
and 28101, respectively. Perform a Goldfeld-Quandt test for heteroscedasticity.
Answer: RSS2/RSS1 = 28101/321 = 87.5. Each subsample regression has 11 - 3 = 8 degrees of
freedom. The critical value of F(8,8) at the 0.1 percent level is
12.0, so the null hypothesis of homoscedasticity is rejected at that significance level.
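
The arithmetic of the test can be sketched in a few lines of Python. This is only an illustration: the function name `goldfeld_quandt` and its argument names are hypothetical, and the critical value 12.0 is the one quoted in the answer, not something the code computes.

```python
def goldfeld_quandt(rss_small, rss_large, n_sub, k):
    """Return (F statistic, degrees of freedom) for a Goldfeld-Quandt test.

    rss_small, rss_large: RSS from the subsample regressions with the
    smallest and largest values of the sorting variable.
    n_sub: observations in each subsample; k: parameters estimated.
    """
    df = n_sub - k
    # Both subsamples have the same degrees of freedom, so the df terms
    # cancel and the statistic is simply the ratio of the two RSS values.
    f_stat = rss_large / rss_small
    return f_stat, df


# Exercise 7.1: 11 observations per subsample, 3 parameters (constant, G, Y).
f_stat, df = goldfeld_quandt(rss_small=321, rss_large=28101, n_sub=11, k=3)
CRITICAL_F_8_8_01PCT = 12.0  # value quoted in the answer above
print(round(f_stat, 1), df, f_stat > CRITICAL_F_8_8_01PCT)
```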
7.2
The researcher saves the residuals from the full-sample regression in Exercise 7.1 and regresses
their squares on G, Y, their squares, and their product. R2 is 0.9878. Perform a White test for
heteroscedasticity.
Answer: In the output below EI2 is the squared residual from the full-sample regression in
Exercise 7.1, G2 and Y2 are the squares of G and Y, and GYPROD is their product. The White
test statistic is nR2 = 30*0.9878 = 29.63. Under the null hypothesis of homoscedasticity, this is
distributed as a chi-squared statistic with five degrees of freedom. The null hypothesis is rejected at the 0.1 percent
level, critical value 20.52.
C. Dougherty 2011. All rights reserved.
. reg EI2 G Y G2 Y2 GYPROD

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  389.39
       Model |   229715064     5  45943012.8           Prob > F      =  0.0000
    Residual |  2831715.29    24  117988.137           R-squared     =  0.9878
-------------+------------------------------           Adj R-squared =  0.9853
       Total |   232546779    29  8018854.45           Root MSE      =  343.49

------------------------------------------------------------------------------
         EI2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           G |  -36.11647   7.400742    -4.88   0.000    -51.39085   -20.84209
           Y |   5.522015   1.188795     4.65   0.000     3.068463    7.975567
          G2 |   .5463273    .027054    20.19   0.000     .4904907     .602164
          Y2 |    .007948   .0003913    20.31   0.000     .0071403    .0087556
      GYPROD |  -.1350344   .0060013   -22.50   0.000    -.1474204   -.1226485
       _cons |   196.4158   92.97307     2.11   0.045     4.528833    388.3028
------------------------------------------------------------------------------
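
As a rough check on the White test arithmetic in Exercise 7.2, the statistic can be computed from the number of observations and the R-squared of the auxiliary regression. The function name `white_test_statistic` is hypothetical; the critical value is the one quoted in the answer.

```python
def white_test_statistic(n_obs, r_squared):
    """White test statistic: n times R-squared from the auxiliary regression
    of the squared residuals on the regressors, their squares and products."""
    return n_obs * r_squared


# Exercise 7.2: 30 observations, R-squared 0.9878, 5 auxiliary regressors.
stat = white_test_statistic(30, 0.9878)
CRITICAL_CHI2_5_01PCT = 20.52  # chi-squared(5), 0.1 percent level, as quoted
print(round(stat, 2), stat > CRITICAL_CHI2_5_01PCT)
```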
7.3
Fit an earnings function using your EAEF data set, taking EARNINGS as the dependent variable
and S, EXP, and MALE as the explanatory variables, and perform a Goldfeld-Quandt test for
heteroscedasticity in the S dimension. Remember to sort the observations by S first.
Answer: If the observations in EAEF Data Set 22 are ordered by S and subregressions are run using
the first and last 203 observations, RSS is 11,824 for the first 203 observations and 50,609 for the
last 203. The ratio is 4.28. Each subregression has 203 - 4 = 199 degrees of freedom, so we need
the critical value of F(199,199), but that for F(200,200) will
be virtually identical. The critical value of F(200,200) at the 0.1 percent level is 1.55, so the linear
specification is definitely heteroscedastic.
. sort S
. reg EARNINGS S EXP MALE in 1/203

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   13.05
       Model |  2326.57129     3  775.523765           Prob > F      =  0.0000
    Residual |  11823.7866   199  59.4160129           R-squared     =  0.1644
-------------+------------------------------           Adj R-squared =  0.1518
       Total |  14150.3579   202  70.0512765           Root MSE      =  7.7082

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |    .936945   .4509822     2.08   0.039     .0476278    1.826262
         EXP |   .4084378   .1094286     3.73   0.000     .1926493    .6242263
        MALE |   3.641454     1.1077     3.29   0.001     1.457118    5.825791
       _cons |  -5.130995   5.283363    -0.97   0.333    -15.54956    5.287566
------------------------------------------------------------------------------

. reg EARNINGS S EXP MALE in 338/540

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   23.79
       Model |  18149.6383     3  6049.87942           Prob > F      =  0.0000
    Residual |  50608.9949   199  254.316557           R-squared     =  0.2640
-------------+------------------------------           Adj R-squared =  0.2529
       Total |  68758.6332   202  340.389273           Root MSE      =  15.947

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   4.601231   .7544586     6.10   0.000     3.113471     6.08899
         EXP |   1.149634   .3031796     3.79   0.000     .5517769    1.747491
        MALE |   11.46224   2.267416     5.06   0.000     6.990992    15.93348
       _cons |  -71.77737   14.63504    -4.90   0.000     -100.637    -42.9177
------------------------------------------------------------------------------
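
The observation ranges 1/203 and 338/540 used in the two subregressions follow from the sample size and the subsample size. A minimal sketch of that bookkeeping is given below; the helper name `gq_subsample_ranges` is hypothetical, and it assumes the data are already sorted and that ranges are 1-based and inclusive, as in the `in` qualifiers above.

```python
def gq_subsample_ranges(n, n_sub):
    """Return the first and last observation ranges (1-based, inclusive)
    for a Goldfeld-Quandt test, plus the number of middle observations omitted."""
    first = (1, n_sub)
    last = (n - n_sub + 1, n)
    omitted = n - 2 * n_sub
    return first, last, omitted


# 540 observations in the data set, 203 in each subsample.
first, last, omitted = gq_subsample_ranges(540, 203)
print(first, last, omitted)
```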
7.4
Fit an earnings function using your EAEF data set, using the same specification as in Exercise
7.3, and perform a White test for heteroscedasticity.
Answer: The output shows first the basic wage equation, with the residuals saved as EEARN,
then the definitions of the squares and products, and then the regression of the squared residuals.
The test statistic is 540*0.0691 = 37.31. MALESQ is dropped because the square of a dummy
variable is identical to the variable itself, leaving 8 regressors. The critical value of chi-squared at the 0.1 percent
significance level with 8 degrees of freedom is 26.12. Hence we reject the null hypothesis of
homoscedasticity.
. reg EARNINGS S EXP MALE

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   69.05
       Model |  33593.9888     3  11197.9963           Prob > F      =  0.0000
    Residual |  86924.4391   536  162.172461           R-squared     =  0.2787
-------------+------------------------------           Adj R-squared =  0.2747
       Total |  120518.428   539  223.596341           Root MSE      =  12.735

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.863229   .2275297    12.58   0.000      2.41627    3.310189
         EXP |   .5487349   .1225199     4.48   0.000     .3080568     .789413
        MALE |   6.716579   1.122657     5.98   0.000     4.511233    8.921925
       _cons |  -31.88974   4.120556    -7.74   0.000    -39.98416   -23.79532
------------------------------------------------------------------------------

. predict EEARN, resid
. g EEARNSQ = EEARN*EEARN
. g SSQ = S*S
. g EXPSQ = EXP*EXP
. g MALESQ = MALE*MALE
. g SEXP = S*EXP
. reg EEARNSQ S EXP MALE SSQ EXPSQ MALESQ SEXP MALES MALEEXP;

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  8,   531) =    4.92
       Model |  15058509.4     8  1882313.67           Prob > F      =  0.0000
    Residual |   202970073   531  382241.193           R-squared     =  0.0691
-------------+------------------------------           Adj R-squared =  0.0550
       Total |   218028583   539  404505.719           Root MSE      =  618.26

------------------------------------------------------------------------------
     EEARNSQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -187.2417   76.94705    -2.43   0.015    -338.3997   -36.08377
         EXP |  -119.4142   45.97819    -2.60   0.010    -209.7357   -29.09273
        MALE |  -819.0745   426.2914    -1.92   0.055    -1656.499    18.34991
         SSQ |   4.481694   2.513909     1.78   0.075    -.4567325    9.420121
       EXPSQ |   1.421899   1.011588     1.41   0.160    -.5653065    3.409105
      MALESQ |  (dropped)
        SEXP |   6.103315   2.353413     2.59   0.010     1.480173    10.72646
       MALES |   44.11235   23.13765     1.91   0.057    -1.340212    89.56491
     MALEEXP |   19.13805   12.23941     1.56   0.118    -4.905564    43.18165
       _cons |   1983.873   678.0395     2.93   0.004     651.9039    3315.842
------------------------------------------------------------------------------
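
The degrees of freedom for the White test equal the number of distinct regressors in the auxiliary regression. The sketch below enumerates them for this specification, assuming (as the dropped MALESQ term in the output suggests) that the square of a 0/1 dummy duplicates the dummy and so contributes nothing. The function name `white_regressors` and the labels it builds are hypothetical, not Stata variable names.

```python
from itertools import combinations


def white_regressors(continuous, dummies):
    """List the distinct auxiliary regressors for a White test:
    levels, squares of continuous variables, and pairwise products."""
    variables = list(continuous) + list(dummies)
    levels = list(variables)
    # The square of a 0/1 dummy equals the dummy itself, so only the
    # continuous variables contribute squared terms.
    squares = [f"{v}^2" for v in continuous]
    products = [f"{a}*{b}" for a, b in combinations(variables, 2)]
    return levels + squares + products


regressors = white_regressors(continuous=["S", "EXP"], dummies=["MALE"])
stat = 540 * 0.0691  # n times R-squared from the auxiliary regression
print(len(regressors), regressors)
print(round(stat, 2))
```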
7.5*
The following regressions were fitted using the Shanghai school cost data introduced in Section
5.1 (standard errors in parentheses):

$$\widehat{COST} = \underset{(27{,}000)}{24{,}000} + \underset{(50)}{339}\,N \qquad R^2 = 0.39$$

$$\widehat{COST} = \underset{(31{,}000)}{51{,}000} - \underset{(41{,}000)}{4{,}000}\,OCC + \underset{(60)}{152}\,N + \underset{(76)}{284}\,NOCC \qquad R^2 = 0.68$$

where COST is the annual cost of running a school, N is the number of students, OCC is a
dummy variable defined to be 0 for regular schools and 1 for occupational schools, and NOCC
is a slope dummy variable defined as the product of N and OCC. There are 74 schools in the
sample. With the data sorted by N, the regressions are fitted again for the 26 smallest and 26
largest schools, the residual sum of squares being as shown in the table.

                       26 smallest     26 largest
First regression       7.8 x 10^10     54.4 x 10^10
Second regression      6.7 x 10^10     13.8 x 10^10

Perform a Goldfeld-Quandt test for heteroscedasticity for the two models and, with reference to
Figure 5.5, explain why the problem of heteroscedasticity is less severe in the second model.
Answer: For both regressions RSS will be denoted RSS1 for the 26 smallest schools and RSS2 for
the 26 largest schools. In the first regression, RSS2/RSS1 = (54.4 x 10^10)/(7.8 x 10^10) = 6.97. There
are 24 degrees of freedom in each subsample (26 observations, 2 parameters estimated). The
critical value of F(24,24) is approximately 3.7 at the 0.1 percent level, and so we reject the null
hypothesis of homoscedasticity at that level. In the second regression, RSS2/RSS1 =
(13.8 x 10^10)/(6.7 x 10^10) = 2.06. There are 22 degrees of freedom in each subsample (26
observations, 4 parameters estimated). The critical value of F(22,22) is 2.05 at the 5 percent
level, and so we (just) reject the null hypothesis of homoscedasticity at that significance level.
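
For reference, the two test statistics can be set out side by side. This is only a restatement of the figures quoted in the answer (subsample RSS values and critical values), not an additional result.

```latex
\begin{align*}
\text{First regression:}\quad
  & F(24,24) = \frac{RSS_2}{RSS_1}
    = \frac{54.4 \times 10^{10}}{7.8 \times 10^{10}} = 6.97 > 3.7
    \quad (0.1\%\ \text{critical value, approximate}) \\
\text{Second regression:}\quad
  & F(22,22) = \frac{RSS_2}{RSS_1}
    = \frac{13.8 \times 10^{10}}{6.7 \times 10^{10}} = 2.06 > 2.05
    \quad (5\%\ \text{critical value})
\end{align*}
```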
[Figure: Shanghai schools: cost and number of students. COST plotted against N, with occupational schools and regular schools shown separately.]
Why is the problem of heteroscedasticity less severe in the second regression? The figure
(Figure 5.5 in the text) reveals that the cost function is much steeper for the occupational
schools than for the regular schools, reflecting their higher marginal cost. As a consequence the
two sets of observations diverge as the number of students increases and the scatter is bound to
appear heteroscedastic, irrespective of whether the disturbance term is truly heteroscedastic or
not. The first regression takes no account of this and the Goldfeld-Quandt test therefore
indicates significant heteroscedasticity. In the second regression this problem does not arise
because the intercept and slope dummy variables allow separate implicit regression lines for the
two types of school. (However, there does seem to be some genuine heteroscedasticity.)
Looking closely at the diagram, the observations for the occupational schools exhibit a
classic pattern of true heteroscedasticity, and this would be confirmed by a Goldfeld-Quandt
test confined to the subsample of those schools. However, the observations for the regular
schools appear to be homoscedastic and this accounts for the fact that we only just rejected the
null hypothesis of homoscedasticity for the combined sample.
7.6*
The file educ.dta on the website contains international cross-sectional data on aggregate
expenditure on education, EDUC, gross domestic product, GDP, and population, POP, for a
sample of 38 countries in 1997. EDUC and GDP are measured in US$ million and POP is
measured in thousands. See Appendix B for further information. Download the data set, plot a
scatter diagram of EDUC on GDP, and comment on whether the data set appears to be subject
to heteroscedasticity. Sort the data set by GDP and perform a Goldfeld-Quandt test for
heteroscedasticity, running regressions using the subsamples of 14 countries with the smallest
and greatest GDP.
Answer: The figure plots expenditure on education, EDUC, against gross domestic product, GDP,
for the 38 countries in the sample. The observations exhibit heteroscedasticity. Sorting them by
GDP and regressing EDUC on GDP for the subsamples of 14 countries with smallest and
greatest GDP, the residual sums of squares for the first and second subsamples, denoted RSS1
and RSS2, are 1,660,000 and 63,113,000, respectively. Hence

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{63{,}113{,}000}{1{,}660{,}000} = 38.02.$$

The critical value of F(12,12) at the 0.1 percent level is 7.00, and so we reject the null
hypothesis of homoscedasticity.
[Figure: Expenditure on education and GDP. Expenditure on education ($ million) plotted against GDP ($ million).]
7.3 Remedies for heteroscedasticity
7.7
The researcher mentioned in Exercise 7.1 runs the following regressions as alternative
specifications of the model (standard errors in parentheses):
$$\frac{\hat{I}}{P} = \underset{(0.28)}{0.03}\,\frac{1}{P} - \underset{(0.16)}{0.69}\,\frac{G}{P} + \underset{(0.03)}{0.34}\,\frac{Y}{P} \qquad R^2 = 0.97 \qquad (1)$$

$$\frac{\hat{I}}{Y} = \underset{(0.04)}{0.39} + \underset{(0.42)}{0.03}\,\frac{1}{Y} - \underset{(0.22)}{0.93}\,\frac{G}{Y} \qquad R^2 = 0.78 \qquad (2)$$

$$\log \hat{I} = \underset{(0.26)}{2.44} - \underset{(0.12)}{0.63}\,\log G + \underset{(0.12)}{1.60}\,\log Y \qquad R^2 = 0.98 \qquad (3)$$

In each case the regression is run again for the subsamples of observations with the 11 smallest
and 11 greatest values of the sorting variable, after sorting by Y/P, G/Y, and log Y, respectively.
The residual sums of squares are as shown in the table.

        11 smallest     11 largest
(1)     1.43            12.63
(2)     0.0223          0.0155
(3)     0.573           0.155

Perform a Goldfeld-Quandt test for each model specification and discuss the merits of each
specification. Is there evidence that investment is an inverse function of government
expenditure?
Answer: In the first specification, RSS2/RSS1 is 8.83. Since the critical value of F(8,8) at the 1
percent level is 6.03, the null hypothesis of homoscedasticity would be rejected at that
significance level. For the other two specifications, RSS1 is greater than RSS2 and so one should
perform the Goldfeld-Quandt test for inverse heteroscedasticity, using RSS1/RSS2. For the second
specification, RSS1/RSS2 is
1.44, and so the null hypothesis of homoscedasticity is not rejected at the 5 percent level, the
critical value of F(8,8) being 3.44. For the third specification, RSS1/RSS2 is 3.70, and so the null
hypothesis of homoscedasticity is rejected at the 5 percent level but not the 1 percent level. The
second specification, which appears to be free from heteroscedasticity, does indeed suggest that
the share of investment in GDP is a negative function of the share of government expenditure in
GDP, the t statistic for G/Y being -4.23. The third specification, which shows signs of being
subject to heteroscedasticity, tells much the same story, the elasticity of I with respect to G
being estimated at -0.63, holding Y constant. The t statistic is so large that the effect is probably
significant, even allowing for heteroscedasticity.
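
The choice between the standard and inverse forms of the test depends only on which subsample has the larger RSS. A minimal sketch using the RSS values from the table in the exercise follows; the function name `gq_either_direction` and the labels "standard" and "inverse" are hypothetical.

```python
def gq_either_direction(rss_first, rss_last):
    """Return (F statistic, direction), putting the larger RSS in the numerator.

    rss_first: RSS for the subsample with the smallest values of the sorting
    variable; rss_last: RSS for the subsample with the largest values.
    """
    if rss_last >= rss_first:
        return rss_last / rss_first, "standard"
    return rss_first / rss_last, "inverse"


# RSS values (11 smallest, 11 largest) for specifications (1) to (3).
rss_table = {"(1)": (1.43, 12.63), "(2)": (0.0223, 0.0155), "(3)": (0.573, 0.155)}
results = {spec: gq_either_direction(*rss) for spec, rss in rss_table.items()}
for spec, (f_stat, direction) in results.items():
    print(spec, round(f_stat, 2), direction)
```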
7.8
Using your EAEF data set, repeat Exercises 7.3 and 7.4 with LGEARN as the dependent
variable. Is there evidence that this is a preferable specification?
Answer: For the Goldfeld-Quandt test, RSS is 44.25 for the first 203 observations and 60.47 for
the last 203. The ratio is 1.37. We need the critical values of F(199,199), but those for F(200,200)
will be virtually identical. The critical value of F(200,200) is 1.26 at the 5 percent level and 1.39
at the 1 percent level, so the null hypothesis of homoscedasticity is rejected at the 5 percent level
but not at the 1 percent level. The logarithmic specification therefore still shows some evidence of
heteroscedasticity, but far less than the linear specification, where the ratio was 4.28.
For the White test, the test statistic is 540*0.0154 = 8.32. The critical value of chi-squared at the
5 percent level with 6 degrees of freedom is 12.59, so the null hypothesis of homoscedasticity is not
rejected. Clearly the White test is less powerful than the Goldfeld-Quandt test when the latter is
appropriate.
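
To make the comparison with the critical values explicit, the sketch below reports the most stringent significance level at which each statistic rejects. The function name `strongest_rejection` is hypothetical. The 1 percent and 0.1 percent critical values of F(200,200) and the chi-squared value are those quoted in this chapter; the 5 percent value of 1.26 is taken from standard F tables and is an assumption here.

```python
def strongest_rejection(stat, critical_values):
    """Return the smallest significance level (in percent) at which the
    statistic exceeds its critical value, or None if it never does.

    critical_values maps significance level in percent -> critical value.
    """
    rejected = [level for level, cv in critical_values.items() if stat > cv]
    return min(rejected) if rejected else None


# Goldfeld-Quandt test for the logarithmic specification.
gq_stat = 60.47 / 44.25
gq_level = strongest_rejection(gq_stat, {5: 1.26, 1: 1.39, 0.1: 1.55})

# White test for the logarithmic specification, chi-squared with 6 df.
white_stat = 540 * 0.0154
white_level = strongest_rejection(white_stat, {5: 12.59})

print(round(gq_stat, 2), gq_level)
print(round(white_stat, 2), white_level)
```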
. reg LGEARN S EXP MALE in 1/203

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   20.23
       Model |  13.4919792     3   4.4973264           Prob > F      =  0.0000
    Residual |  44.2493309   199  .222358447           R-squared     =  0.2337
-------------+------------------------------           Adj R-squared =  0.2221
       Total |  57.7413101   202   .28584807           Root MSE      =  .47155

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |    .065024   .0275889     2.36   0.019     .0106199    .1194282
         EXP |   .0338008   .0066943     5.05   0.000     .0205999    .0470017
        MALE |   .2533485   .0677637     3.74   0.000     .1197213    .3869756
       _cons |   1.077628   .3232105     3.33   0.001     .4402712    1.714985
------------------------------------------------------------------------------

. reg LGEARN S EXP MALE in 338/540

      Source |       SS       df       MS              Number of obs =     203
-------------+------------------------------           F(  3,   199) =   29.06
       Model |  26.4944741     3  8.83149135           Prob > F      =  0.0000
    Residual |  60.4705676   199  .303872199           R-squared     =  0.3047
-------------+------------------------------           Adj R-squared =  0.2942
       Total |  86.9650417   202  .430520008           Root MSE      =  .55125

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1825832   .0260792     7.00   0.000     .1311563    .2340102
         EXP |   .0458042   .0104799     4.37   0.000     .0251382    .0664701
        MALE |   .4107624   .0783771     5.24   0.000     .2562061    .5653186
       _cons |  -.8087749   .5058854    -1.60   0.111    -1.806359     .188809
------------------------------------------------------------------------------
. reg LGEARN S EXP MALE

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   97.19
       Model |  75.3997118     3  25.1332373           Prob > F      =  0.0000
    Residual |  138.610676   536  .258602007           R-squared     =  0.3523
-------------+------------------------------           Adj R-squared =  0.3487
       Total |  214.010387   539  .397050811           Root MSE      =  .50853

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1317248   .0090859    14.50   0.000     .1138765     .149573
         EXP |   .0348221   .0048925     7.12   0.000     .0252112     .044433
        MALE |   .3048496   .0448306     6.80   0.000     .2167845    .3929148
       _cons |   .2449455   .1645445     1.49   0.137    -.0782856    .5681765
------------------------------------------------------------------------------

. predict ELGEARN, resid
. g ELGEARN2 = ELGEARN*ELGEARN
. reg ELGEARN2 S EXP MALE S2 EXP2 SEXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  6,   533) =    1.39
       Model |  1.75925279     6  .293208799           Prob > F      =  0.2170
    Residual |  112.523163   533  .211112877           R-squared     =  0.0154
-------------+------------------------------           Adj R-squared =  0.0043
       Total |  114.282416   539  .212026746           Root MSE      =  .45947

------------------------------------------------------------------------------
    ELGEARN2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |  -.0502755   .0568298    -0.88   0.377    -.1619134    .0613624
         EXP |  -.0375013   .0338021    -1.11   0.268    -.1039029    .0289003
        MALE |  -.0103873   .0408412    -0.25   0.799    -.0906169    .0698422
          S2 |   .0004303   .0018432     0.23   0.815    -.0031905    .0040511
        EXP2 |  -.0003143   .0007424    -0.42   0.672    -.0017727     .001144
        SEXP |   .0030162   .0017068     1.77   0.078    -.0003367    .0063691
       _cons |   .9063623   .5036147     1.80   0.072    -.0829509    1.895675
------------------------------------------------------------------------------
7.9*
Repeat Exercise 7.6, using the Goldfeld-Quandt test to investigate whether scaling by
population or by GDP, or whether running the regression in logarithmic form, would eliminate
the heteroscedasticity. Compare the results of regressions using the entire sample and the
alternative specifications.
Answer: Dividing through by population, POP, the model becomes

$$\frac{EDUC}{POP} = \beta_1 \frac{1}{POP} + \beta_2 \frac{GDP}{POP} + \frac{u}{POP},$$

with expenditure on education per capita, denoted EDUCPOP, hypothesized to be a function of
gross domestic product per capita, GDPPOP, and the reciprocal of population, POPREC, with
no intercept. Sorting the sample by GDPPOP and running the regression for the subsamples of
14 countries with smallest and largest GDPPOP, RSS1 = 0.006788 and RSS2 = 1.415516. Now

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{1.415516}{0.006788} = 208.5.$$

Thus the model is still subject to heteroscedasticity at the 0.1 percent level. This is evident in the
figure.
[Figure: Expenditure on education per capita and GDP per capita. EDUC/POP plotted against GDP/POP.]
Dividing through instead by GDP, the model becomes

$$\frac{EDUC}{GDP} = \beta_1 \frac{1}{GDP} + \beta_2 + \frac{u}{GDP},$$

with expenditure on education as a share of gross domestic product, denoted EDUCGDP,
hypothesized to be a simple function of the reciprocal of gross domestic product, GDPREC,
with $\beta_2$ now appearing as the intercept. Sorting the sample by GDPREC and running the regression for the
subsamples of 14 countries with smallest and largest GDPREC, RSS1 = 0.00413 and RSS2 =
0.00238. Since RSS2 is less than RSS1, we test for heteroscedasticity under the hypothesis that
the standard deviation of the disturbance term is inversely related to GDPREC:

$$F(12,12) = \frac{RSS_1}{RSS_2} = \frac{0.00413}{0.00238} = 1.74.$$

The critical value of F(12,12) at the 5 percent level is 2.69, so we do not reject the null
hypothesis of homoscedasticity. Could one tell this from the figure? It is a little difficult to say.
[Figure: Expenditure on education as a proportion of GDP and the reciprocal of GDP. EDUC/GDP plotted against 1/GDP.]
Finally, we will consider a logarithmic specification. If the true relationship is logarithmic,
and homoscedastic, it would not be surprising that the linear model appeared heteroscedastic.
Sorting the sample by GDP, RSS1 and RSS2 are 2.733 and 3.438 for the subsamples of 14
countries with smallest and greatest GDP. The F statistic is

$$F(12,12) = \frac{RSS_2}{RSS_1} = \frac{3.438}{2.733} = 1.26.$$

Thus again we would not reject the null hypothesis of homoscedasticity.

[Figure: Expenditure on education and GDP, logarithmic. log EDUC plotted against log GDP.]

The third and fourth specifications both appear to be free from heteroscedasticity. How do we
choose between them? We will examine the regression results, shown for the two models with
the full sample:
. reg EDUCGDP GDPREC

      Source |       SS       df       MS              Number of obs =      38
-------------+------------------------------           F(  1,    36) =    5.62
       Model |  .001348142     1  .001348142           Prob > F      =  0.0233
    Residual |  .008643037    36  .000240084           R-squared     =  0.1349
-------------+------------------------------           Adj R-squared =  0.1109
       Total |  .009991179    37  .000270032           Root MSE      =  .01549

------------------------------------------------------------------------------
     EDUCGDP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      GDPREC |  -234.0823   98.78309   -2.370   0.023    -434.4236   -33.74086
       _cons |   .0484593   .0036696   13.205   0.000     .0410169    .0559016
------------------------------------------------------------------------------

. reg LGEE LGGDP

      Source |       SS       df       MS              Number of obs =      38
-------------+------------------------------           F(  1,    36) =  246.20
       Model |  51.9905508     1  51.9905508           Prob > F      =  0.0000
    Residual |   7.6023197    36  .211175547           R-squared     =  0.8724
-------------+------------------------------           Adj R-squared =  0.8689
       Total |  59.5928705    37  1.61061812           Root MSE      =  .45954

------------------------------------------------------------------------------
        LGEE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       LGGDP |   1.160594   .0739673   15.691   0.000     1.010582    1.310607
       _cons |  -5.025204   .8152239   -6.164   0.000    -6.678554   -3.371853
------------------------------------------------------------------------------
In equation form, the first regression is
$$\frac{\widehat{EDUC}}{GDP} = \underset{(0.004)}{0.048} - \underset{(98.8)}{234.1}\,\frac{1}{GDP} \qquad R^2 = 0.13$$

Multiplying through by GDP, it may be rewritten

$$\widehat{EDUC} = -234.1 + 0.048\,GDP$$
It implies that expenditure on education accounts for 4.8 percent of gross domestic product at
the margin. The constant does not have any sensible interpretation. We will compare this with
the output from an OLS regression that makes no attempt to eliminate heteroscedasticity:
. reg EDUC GDP

      Source |       SS       df       MS              Number of obs =      38
-------------+------------------------------           F(  1,    36) =  509.80
       Model |  1.0571e+09     1  1.0571e+09           Prob > F      =  0.0000
    Residual |  74645819.2    36  2073494.98           R-squared     =  0.9340
-------------+------------------------------           Adj R-squared =  0.9322
       Total |  1.1317e+09    37  30586911.0           Root MSE      =  1440.0

------------------------------------------------------------------------------
        EDUC |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         GDP |   .0480656   .0021288   22.579   0.000     .0437482     .052383
       _cons |  -160.4669    311.699   -0.515   0.610    -792.6219     471.688
------------------------------------------------------------------------------
The slope coefficient, 0.048, is identical to three decimal places. This is not entirely a surprise,
since heteroscedasticity does not give rise to bias and so there should be no systematic
difference between the estimate from an OLS regression and that from a specification that
eliminates heteroscedasticity. Of course, it is a surprise that the estimates are so close. Generally
there would be some random difference, and of course the OLS estimate would tend to be less
accurate. In this case, the main difference is in the estimated standard error. That for the OLS
regression is actually smaller than that for the regression of EDUCGDP on GDPREC, but it is
misleading. It is incorrectly calculated and we know that, since OLS is inefficient, the true
standard error for the OLS estimate is actually larger.
The logarithmic regression in equation form is
$$\log \widehat{EDUC} = \underset{(0.82)}{-5.03} + \underset{(0.07)}{1.16}\,\log GDP \qquad R^2 = 0.87$$

implying that the elasticity of expenditure on education with regard to gross domestic product is
1.16. In substance the interpretations of the models are similar, since both imply that the
proportion of GDP allocated to education increases slowly with GDP, but the elasticity
specification seems a little more informative and probably serves as a better starting point for
further exploration. For example, it would be natural to add the logarithm of population to see if
population had an independent effect.
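
To see how slowly the share of GDP allocated to education rises under the logarithmic specification, one can evaluate the fitted equation at two levels of GDP. The sketch below uses the coefficients from the `reg LGEE LGGDP` output above and assumes that LGEE and LGGDP are natural logarithms of EDUC and GDP measured in US$ million. The function name `implied_share` and the two GDP levels chosen are illustrative only.

```python
import math


def implied_share(gdp, intercept=-5.025204, elasticity=1.160594):
    """Share EDUC/GDP implied by log EDUC = intercept + elasticity * log GDP.

    Assumes natural logarithms; gdp is in US$ million.
    """
    educ = math.exp(intercept + elasticity * math.log(gdp))
    return educ / gdp


# Two illustrative GDP levels within the range shown in the scatter diagram.
share_low = implied_share(10_000)
share_high = implied_share(500_000)
print(round(share_low, 3), round(share_high, 3))
```

Because the elasticity is only a little above 1, a fifty-fold increase in GDP less than doubles the implied share.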
7.10* It was reported above that the heteroscedasticity-consistent estimate of the standard error of the
coefficient of GDP in equation (7.13) was 0.18. Explain why the corresponding standard error
in equation (7.15) ought to be lower and comment on the fact that it is not.
Answer: (7.15), unlike (7.13), appears to be free from heteroscedasticity and therefore should
provide more efficient estimates of the coefficients, reflected in lower standard errors when
computed correctly. However, the sample may be too small for the heteroscedasticity-consistent
estimator to be a good guide.
7.11* A health economist plans to evaluate whether screening patients on arrival or spending extra
money on cleaning is more effective in reducing the incidence of infections by the MRSA
bacterium in hospitals. She hypothesizes the following model:
$$MRSA_i = \beta_1 + \beta_2 S_i + \beta_3 C_i + u_i$$

where, in hospital i, MRSA is the number of infections per thousand patients, S is expenditure
per patient on screening, and C is expenditure per patient on cleaning. $u_i$ is a disturbance term
that satisfies the usual regression model assumptions. In particular, $u_i$ is drawn from a
distribution with mean zero and constant variance $\sigma^2$. The researcher would like to fit the
relationship using a sample of hospitals. Unfortunately, data for individual hospitals are not
available. Instead she has to use regional data to fit

$$\overline{MRSA}_j = \beta_1 + \beta_2 \bar{S}_j + \beta_3 \bar{C}_j + \bar{u}_j$$

where $\overline{MRSA}_j$, $\bar{S}_j$, $\bar{C}_j$, and $\bar{u}_j$ are the averages of MRSA, S, C, and u for the hospitals in
region j. There were different numbers of hospitals in the regions, there being $n_j$ hospitals in
region j.
Show that the variance of $\bar{u}_j$ is equal to $\sigma^2 / n_j$ and that an OLS regression using the grouped
regional data to fit the relationship will be subject to heteroscedasticity.
Assuming that the researcher knows the value of nj for each region, explain how she could
re-specify the regression model to make it homoscedastic. State the revised specification and
demonstrate mathematically that it is homoscedastic. Give an intuitive explanation of why the
revised specification should tend to produce improved estimates of the parameters.
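Answer:

$$\operatorname{var}(\bar{u}_j) = \operatorname{var}\left(\frac{1}{n_j}\sum_{k=1}^{n_j} u_{jk}\right) = \left(\frac{1}{n_j}\right)^2 \operatorname{var}\left(\sum_{k=1}^{n_j} u_{jk}\right) = \left(\frac{1}{n_j}\right)^2 \sum_{k=1}^{n_j} \operatorname{var}(u_{jk})$$

since the covariance terms are all 0. Hence

$$\operatorname{var}(\bar{u}_j) = \left(\frac{1}{n_j}\right)^2 n_j \sigma^2 = \frac{\sigma^2}{n_j}.$$

Since $n_j$ differs from region to region, the variance of the disturbance term is not constant and
the grouped regression is subject to heteroscedasticity.
To eliminate the heteroscedasticity, multiply observation j by $\sqrt{n_j}$. The regression becomes

$$\sqrt{n_j}\,\overline{MRSA}_j = \beta_1 \sqrt{n_j} + \beta_2 \sqrt{n_j}\,\bar{S}_j + \beta_3 \sqrt{n_j}\,\bar{C}_j + \sqrt{n_j}\,\bar{u}_j$$

The variance of the disturbance term is now

$$\operatorname{var}\left(\sqrt{n_j}\,\bar{u}_j\right) = \left(\sqrt{n_j}\right)^2 \operatorname{var}(\bar{u}_j) = n_j \frac{\sigma^2}{n_j} = \sigma^2$$

and is thus the same for all observations.
From the expression for $\operatorname{var}(\bar{u}_j)$, we see that, the larger the group, the more reliable should
be its observation (the closer its observation should tend to be to the population relationship).
The scaling gives greater weight to the more reliable observations and the resulting estimators
should be more efficient.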
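
The effect of the scaling can be illustrated numerically. The sketch below uses exact fractions to confirm that the scaled disturbance has the same variance whatever the group size, and shows how one regional observation would be transformed. All names (`var_group_mean`, `var_scaled_disturbance`, `scale_observation`) and the numerical values are hypothetical illustrations, not data from the exercise.

```python
import math
from fractions import Fraction


def var_group_mean(sigma2, n_j):
    """Variance of the mean of n_j independent disturbances, each with variance sigma2."""
    return Fraction(sigma2) / n_j


def var_scaled_disturbance(sigma2, n_j):
    """Variance of sqrt(n_j) times the group-mean disturbance: n_j * sigma2 / n_j."""
    return n_j * var_group_mean(sigma2, n_j)


def scale_observation(n_j, mrsa_bar, s_bar, c_bar):
    """Multiply one regional observation by sqrt(n_j).

    Returns the scaled dependent variable followed by the three scaled
    regressors; the constant term becomes sqrt(n_j) itself, so the
    transformed regression has no separate intercept.
    """
    w = math.sqrt(n_j)
    return w * mrsa_bar, w, w * s_bar, w * c_bar


sigma2 = 2  # hypothetical disturbance variance
group_sizes = [4, 9, 25]  # hypothetical numbers of hospitals per region
unscaled = [var_group_mean(sigma2, n) for n in group_sizes]
scaled = [var_scaled_disturbance(sigma2, n) for n in group_sizes]
print(unscaled)  # differs by region: heteroscedastic
print(scaled)    # constant across regions: homoscedastic
print(scale_observation(4, 1.5, 10.0, 20.0))
```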