Stat 500
Midterm 2 — 12 November 2009
page 0 of 11
Please put your name on the back of your answer book. Do NOT put it on the front. Thanks.
Do not start until I tell you to.
• The exam is closed book, closed notes. Use only the formula sheet and tables I provide today. You may use a calculator.
• Write your answers in your blue book. Ask if you need a second (or third) blue book.
• You have 2:15 hours (135 minutes) to complete the exam. Stop working when the end of the exam is announced.
• Points are indicated for each question. There are 130 total points.
• Important reminders:
  – budget your time. Some parts of each question should be easy; others may be hard. Make sure you do all parts you can.
  – notice that some parts do not require any computations.
  – show your work neatly so you can receive partial credit.
• Good luck!
1. 50 points. Health of factory workers. The following data were collected in a study of the health
of paint sprayers in an auto assembly plant. Two of the variables that were measured on each
of the 103 workers in the study were H, the haemoglobin concentration, and L, the lymphocyte
count. These are measures of two different components of the blood.
The following quantities may help you answer the questions:
The observed intercept and slope in the regression $H_i = \beta_0 + \beta_1 L_i + \varepsilon_i$ are $b_0 = -55.6$, $b_1 = 1.98$.
The estimated s.d. of observations around the line is $s_e = 4.95$.
The error SS for the “intercept only” model $H_i = \mu + \varepsilon_i$ is 8050.2.
The error SS for the regression $H_i = \beta_0 + \beta_1 L_i + \varepsilon_i$ is 2474.2.
The error SS for the regression $H_i = \beta_0 + \beta_1 L_i + \beta_2 L_i^2 + \varepsilon_i$ is 2470.8.
The error SS for the regression $H_i = \beta_0 + \beta_1 L_i + \beta_2 L_i^2 + \beta_3 L_i^3 + \varepsilon_i$ is 2396.4.
The error SS for the loess regression $H_i = f(L_i) + \varepsilon_i$ is 2391.4 with 97.5 d.f.
The mean lymphocyte count is 30.9.
The sum of squares of lymphocyte counts about their mean, $\sum (x_i - \bar{x})^2$, is 1428.
The correlation coefficient between H and L is 0.838.
(a) What statistic is the most appropriate to describe the association between haemoglobin
concentration and lymphocyte count? You may answer with one of the values I’ve provided,
or some other statistic. Briefly explain why you chose your statistic.
No matter how you answered the previous question, the investigators want you to fit the regression: $H_i = \beta_0 + \beta_1 L_i + \varepsilon_i$.
(b) Calculate the s.e. of $b_1$.
(c) Test $H_0$: $\beta_1 = 0$. Report your test statistic and two-sided p-value.
Note: If you were not able to do the previous question and need a s.e. for your test, use
s.e.= 0.49.
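One possible working for parts (b) and (c) is sketched below in Python. It assumes the standard simple linear regression results, s.e.(b1) = s_e / sqrt(sum of (x_i - xbar)^2) and t = b1 / s.e.(b1) on n - 2 d.f. The helper names `t_pdf` and `two_sided_p` are illustrative only, and the p-value is obtained by a crude numerical integration of the t density rather than from the tables provided with the exam.

```python
import math

# Summary quantities given in the question.
n = 103        # number of workers
b1 = 1.98      # estimated slope
s_e = 4.95     # estimated s.d. around the line
sxx = 1428.0   # sum of (x_i - xbar)^2 for lymphocyte count

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Approximate 2 * P(T > |t|) with the midpoint rule.

    The substitution x = |t| / s maps the tail (|t|, infinity)
    onto s in (0, 1), so the integral has finite limits.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += t_pdf(t / s, df) * t / (s * s)
    return 2 * total * h

se_b1 = s_e / math.sqrt(sxx)     # part (b)
df_error = n - 2
t_stat = b1 / se_b1              # part (c)
p_value = two_sided_p(t_stat, df_error)

print(f"s.e.(b1) = {se_b1:.3f}")
print(f"t = {t_stat:.2f} on {df_error} d.f., two-sided p = {p_value:.2g}")
```

In an exam setting the same conclusion would come from comparing the t statistic with the tabulated critical value for about 100 d.f.; the integration here only stands in for the tables.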
(d) The usual ANOVA table for this regression has rows and columns labelled:

Source   d.f.   SS   MS   F
Model    ??     ??   ??   ??
Error    ??     ??   ??
Total    ??     ??

Calculate as many of the missing entries as you can from the available data and what you
know about the study.
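As a sketch of how the table in part (d) could be filled in, the Python snippet below assumes that the total SS equals the error SS of the “intercept only” model and that the model SS is the reduction in error SS achieved by the straight-line fit. The variable names are illustrative.

```python
n = 103             # number of workers
ss_total = 8050.2   # error SS of the intercept-only model
ss_error = 2474.2   # error SS of the straight-line regression

# Degrees of freedom: one slope parameter, n - 2 for error, n - 1 in total.
df_model, df_error, df_total = 1, n - 2, n - 1

ss_model = ss_total - ss_error
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error

print(f"{'Source':<8}{'d.f.':>6}{'SS':>10}{'MS':>10}{'F':>9}")
print(f"{'Model':<8}{df_model:>6}{ss_model:>10.1f}{ms_model:>10.1f}{f_stat:>9.1f}")
print(f"{'Error':<8}{df_error:>6}{ss_error:>10.1f}{ms_error:>10.2f}")
print(f"{'Total':<8}{df_total:>6}{ss_total:>10.1f}")
```

The square root of the error MS should agree with the stated s_e = 4.95, which gives a quick check on the arithmetic.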
(e) The investigators use the fitted regression to predict average haemoglobin concentration at three possible lymphocyte counts: $L_i = 26$, $L_i = 32$, and $L_i = 35$. Which prediction is the most precise? Explain your choice.
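Part (e) can be reasoned about without computation, but a short numerical sketch may make the argument concrete. It assumes the usual formula for the s.e. of an estimated mean response at L0, namely s_e * sqrt(1/n + (L0 - Lbar)^2 / sum of (x_i - xbar)^2); the function name `se_mean` is illustrative.

```python
import math

n = 103        # number of workers
s_e = 4.95     # estimated s.d. around the line
l_bar = 30.9   # mean lymphocyte count
sxx = 1428.0   # sum of (x_i - xbar)^2 for lymphocyte count

def se_mean(l0):
    """s.e. of the estimated mean haemoglobin at lymphocyte count l0."""
    return s_e * math.sqrt(1 / n + (l0 - l_bar) ** 2 / sxx)

se = {l0: se_mean(l0) for l0 in (26, 32, 35)}
most_precise = min(se, key=se.get)

for l0, value in se.items():
    print(f"L = {l0}: s.e. of estimated mean = {value:.3f}")
print(f"Smallest s.e. at L = {most_precise}")
```

The s.e. grows with the squared distance of L0 from the mean lymphocyte count, so the ordering can be read off from the distances alone.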
(f) Here are a residual plot and a normal quantile-quantile plot for the fitted regression. List