(The next 4 questions are based on the following information.)
In this problem we consider an analysis of the number of active physicians in a city as a function of the
city's population and the region of the United States that the city is in. The sample consists of data on
141 cities in the United States, and the variables are defined as follows:
• pop = city population (in thousands)
• doctors = number of professionally active physicians in the city
• The four regions are East, Central, South, and West. Dummy variables are defined for three of them;
  West is the baseline region, for which all three dummies equal 0.
    east = 1 if the city is in the East region, 0 otherwise
    central = 1 if the city is in the Central region, 0 otherwise
    south = 1 if the city is in the South region, 0 otherwise
R^2 = .9551
R^2 (Adjusted) = .9538
SSE = 56,950,000
Residual SD = 647.1

             Coefficients   Standard Error   t Stat   P-Value
Intercept        -255           130.2         -1.96    0.052
pop               2.3           0.043         53.5     0.000
east              -36           174.7         -0.21    0.836
central          -327           164.1         -1.99    0.048
south             -83           152.8         -0.54    0.590
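1. What is the predicted number of doctors in a city with a population of 500,000 in the West region?
(a) 568
(b) 823
(c) 895
(d) 1150
Answer: (c)
Population is measured in thousands, so pop = 500. West is the baseline region, so all three dummies are 0:
predicted doctors = -255 + 2.3*500 - 36*0 - 327*0 - 83*0 = 895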
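The arithmetic above can be checked with a short sketch. The coefficients are copied from the regression output; the function name `predict_doctors` and the dictionary layout are illustrative assumptions, not part of the original problem.

```python
# Fitted coefficients copied from the regression output in this problem.
INTERCEPT = -255.0
POP_SLOPE = 2.3
# Shift in the intercept for each region; West is the baseline (all dummies 0).
REGION_SHIFT = {"west": 0.0, "east": -36.0, "central": -327.0, "south": -83.0}


def predict_doctors(pop_thousands, region):
    """Predicted number of doctors for a city (hypothetical helper).

    pop_thousands: city population in thousands (500,000 people -> 500).
    region: one of "west", "east", "central", "south".
    """
    key = region.lower()
    if key not in REGION_SHIFT:
        raise ValueError(f"unknown region: {region!r}")
    return INTERCEPT + POP_SLOPE * pop_thousands + REGION_SHIFT[key]


# Question 1: a city of 500,000 people in the West region.
print(round(predict_doctors(500, "West")))
```

Passing the population in thousands matters: entering 500000 instead of 500 would overstate the prediction by roughly a factor of a thousand.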
2. What is the equation for the regression line predicting number of doctors from population for the
South region?
(a) Doctors = -338 + 2.3 pop
(b) Doctors = -255 + 83 pop
(c) Doctors = -172 + 2.3 pop
(d) Doctors = -255 + 2.3 pop
Answer: (a)
Predicted Doctors = -255 + 2.3 pop - 36*0 - 327*0 - 83*1 = -338 + 2.3 pop
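The same substitution works for every region: setting one dummy to 1 shifts the intercept and leaves the slope untouched. The sketch below lists all four lines; the helper name `region_line` is a hypothetical choice for illustration.

```python
# Fitted coefficients copied from the regression output in this problem.
INTERCEPT = -255.0
POP_SLOPE = 2.3
# Dummy coefficients; West is the baseline region, so its shift is 0.
REGION_SHIFT = {"west": 0.0, "east": -36.0, "central": -327.0, "south": -83.0}


def region_line(region):
    """Return (intercept, slope) of the fitted line for one region."""
    return INTERCEPT + REGION_SHIFT[region], POP_SLOPE


for region in REGION_SHIFT:
    b0, b1 = region_line(region)
    print(f"{region:8s} Doctors = {b0:.0f} + {b1} pop")
```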
3. Based on this model, in which region is the slope of the regression line relating doctors to population
the steepest?
(a) The West region has the steepest regression line.
(b) The Central region has the steepest regression line.
(c) The East region has the steepest regression line.
(d) The regression line has the same slope in all four regions.

Answer: (d)
The slope for the variable "pop" is the same for all four regions; this is a central assumption of a
model with a continuous predictor and a set of dummy variables. The dummy variables only shift the
intercept. Only if we add interaction terms can the slope be different in each region.
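To make the distinction concrete, the two model forms can be written side by side. The symbols (the betas and the error term) are notation assumed here for illustration; the interaction model was not fitted in this problem.

```latex
% Additive model (the one fitted here): one common slope \beta_1 for pop.
\begin{align*}
\text{doctors} &= \beta_0 + \beta_1\,\text{pop} + \beta_2\,\text{east}
                + \beta_3\,\text{central} + \beta_4\,\text{south} + \varepsilon
\end{align*}

% Model with interaction terms (not fitted here): the slope varies by region.
\begin{align*}
\text{doctors} &= \beta_0 + \beta_1\,\text{pop} + \beta_2\,\text{east}
                + \beta_3\,\text{central} + \beta_4\,\text{south} \\
               &\quad + \beta_5\,(\text{pop}\times\text{east})
                + \beta_6\,(\text{pop}\times\text{central})
                + \beta_7\,(\text{pop}\times\text{south}) + \varepsilon
\end{align*}

% Slopes under the interaction model:
%   West: \beta_1, East: \beta_1 + \beta_5,
%   Central: \beta_1 + \beta_6, South: \beta_1 + \beta_7.
```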
4. To test whether the whole model is at all useful, we perform a hypothesis test of whether the population
coefficients for the four independent variables (pop, east, central, and south) are all equal to 0. What is
the test statistic, approximate critical value, and conclusion for this hypothesis test? (Use alpha = .05)
(a) test statistic: F = 723
    approximate critical value: F* = 2.45
    Conclusion: Reject H0
(b) test statistic: t = 53.5
    approximate critical value: t* = 1.98
    Conclusion: Reject H0
(c) test statistic: F = 2862
    approximate critical value: F* = 3.92
    Conclusion: Don't Reject H0
(d) test statistic: t = -0.54
    approximate critical value: t* = 1.98
    Conclusion: Don't Reject H0
Answer: (a)
Test statistic: F = (R^2 / k) / [(1 - R^2) / (n - k - 1)] = (.9551 / 4) / (.0449 / (141 - 4 - 1)) = 723
We want the critical F from the F-table, using 4 and 136 df: the closest tabled entry is F(4, 120) = 2.447.
Because the test statistic is larger than the critical value, we reject the null hypothesis.
(btw, you won't need to use the F table on the exams)
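The F computation can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted in the problem; the function name `overall_f` is a hypothetical choice, and the critical value is the tabled figure from the solution rather than one computed here.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic computed from R^2.

    n: number of observations; k: number of predictors (excluding the intercept).
    """
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


# Values from the regression output: R^2 = .9551, n = 141 cities, k = 4 predictors.
f_stat = overall_f(0.9551, n=141, k=4)

# Tabled critical value quoted in the solution (alpha = .05, closest df to 4 and 136).
F_CRIT = 2.447

print(round(f_stat))
print("Reject H0" if f_stat > F_CRIT else "Don't reject H0")
```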
