MS&E 226
Solutions for Take-Home Midterm Examination
“Small” Data
PROBLEM 1.
A medical researcher wants to investigate the efficacy of a nutritional supplement
designed to help athletes gain muscle.
The researcher recruits athletes $i = 1, \dots, n$ (where $n$ is even), none of whom are taking the supplement at the beginning of the experiment. Once the experiment starts, some athletes start taking the supplement, while others do not. If athlete $i$ takes the supplement, this is denoted by $X_i = 1$; if athlete $i$ does not take the supplement, this is denoted by $X_i = 0$. For each athlete $i$, the researcher measures the change in the lean body mass of the athlete from the beginning of the experiment to 3 months later; this difference is denoted by $Y_i$.
The researcher also knows that muscle gain depends on the number of hours the athlete exercises per week, which we denote by $Z_i$ for athlete $i$. Indeed, in the population, given $X$ and $Z$, the corresponding $Y$ is distributed as:

$$Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon, \tag{1}$$

where $\varepsilon$ is normal$(0, 1)$ and independent across athletes.
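To make the data-generating process concrete, here is a minimal R sketch of model (1). The helper name `simulate_y`, the seed, and the illustrative sample are assumptions for illustration, not part of the exam: the function takes $X$, $Z$, and the three coefficients and adds independent normal$(0, 1)$ noise.

```r
# Hypothetical helper (not from the exam): draw Y from model (1)
# given vectors x and z and coefficients beta = c(beta0, beta1, beta2).
simulate_y <- function(x, z, beta) {
  stopifnot(length(x) == length(z), length(beta) == 3)
  eps <- rnorm(length(x), mean = 0, sd = 1)  # epsilon ~ normal(0, 1), independent across athletes
  beta[1] + beta[2] * x + beta[3] * z + eps
}

# Illustration only: arbitrary seed; reset with set.seed(1) before part (b).
set.seed(2)
n <- 1e5
z <- rnorm(n, mean = 5, sd = 1)     # hours of exercise per week
x <- as.numeric(z > 5)              # supplement use depends on exercise level
y <- simulate_y(x, z, beta = c(0, 1, 1))

# If Z were observed, regressing Y on both X and Z targets beta1 directly.
coef(lm(y ~ x + z))
```

Even though $X$ and $Z$ are strongly correlated here, the `x` coefficient from the regression that also includes $Z$ should be close to $\beta_1 = 1$. This is the comparison the researcher cannot run, because she does not observe $Z$.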
Unfortunately, the $Z_i$'s are not available to the researcher. Her primary goal is to establish whether an athlete will gain more muscle on average if she uses the supplement, even if she does not change her exercise routine.
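"Not changing her exercise routine" amounts to holding $Z$ fixed. Reading (1) as the conditional distribution of $Y$ given $X$ and $Z$, so that $\varepsilon$ has mean zero given $X$ and $Z$, the comparison at a fixed exercise level $z$ works out as:

$$
\mathbb{E}[Y \mid X = 1, Z = z] - \mathbb{E}[Y \mid X = 0, Z = z]
= (\beta_0 + \beta_1 + \beta_2 z) - (\beta_0 + \beta_2 z)
= \beta_1 \quad \text{for every } z.
$$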
a) Interpret the coefficient $\beta_1$, and explain why this is the quantity of interest to the researcher.
Since the researcher does not have the $Z_i$'s, she decides to go ahead with a simple regression of $Y$ on $X$, with an intercept term; i.e., she estimates the linear model

$$Y_i \approx \hat\beta_0 + \hat\beta_1 X_i$$

using ordinary least squares (OLS).
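Because $X_i$ is binary, this OLS fit has a closed form. Write $\bar Y_1, \bar Z_1, \bar\varepsilon_1$ for averages over the athletes with $X_i = 1$ and $\bar Y_0, \bar Z_0, \bar\varepsilon_0$ for those with $X_i = 0$, assuming both groups are nonempty. Then $\hat\beta_0 = \bar Y_0$ and $\hat\beta_1 = \bar Y_1 - \bar Y_0$, and averaging (1) within each group gives:

$$
\hat\beta_1 = \bar Y_1 - \bar Y_0
= \beta_1 + \beta_2 (\bar Z_1 - \bar Z_0) + (\bar\varepsilon_1 - \bar\varepsilon_0),
\qquad
\mathbb{E}[\hat\beta_1 \mid X, Z] = \beta_1 + \beta_2 (\bar Z_1 - \bar Z_0).
$$

So, conditional on the $X_i$ and $Z_i$, $\hat\beta_1$ is unbiased for $\beta_1$ exactly when $\beta_2 (\bar Z_1 - \bar Z_0) = 0$.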
Our goal is to develop some understanding of whether this is a good idea.
Before you begin the next part in R, set the seed of the random number generator with the
following code:
set.seed(1)
b) Assume that $\beta_0 = 0$, $\beta_1 = 1$, $\beta_2 = 1$.
First, assume the $Z_i$ are i.i.d. normal$(5, 1)$ random variables; and also assume that whenever $Z_i > 5$, then $X_i = 1$; and when $Z_i < 5$, then $X_i = 0$. Repeat the following 1000 times:
• Draw $n = 500$ random samples of $Z$, $X$, and $Y$ according to the preceding description.
• Run OLS of $Y$ against $X$, and record $\hat\beta_1$.
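A minimal R sketch of this simulation follows. The variable names are illustrative, and the order of the random draws (all $Z$ first, then the noise) is one reasonable choice, not necessarily the reference solution's; with a different draw order, individual values can differ even under `set.seed(1)`.

```r
set.seed(1)  # seed required by the exam before part (b)

beta <- c(0, 1, 1)   # (beta0, beta1, beta2) from part (b)
n <- 500             # athletes per simulated sample
n_reps <- 1000       # number of repetitions

beta1_hat <- numeric(n_reps)
for (r in seq_len(n_reps)) {
  z <- rnorm(n, mean = 5, sd = 1)                       # Z_i i.i.d. normal(5, 1)
  x <- as.numeric(z > 5)                                # X_i = 1 when Z_i > 5, else 0 (ties have probability 0)
  y <- beta[1] + beta[2] * x + beta[3] * z + rnorm(n)   # model (1) with eps ~ normal(0, 1)
  fit <- lm(y ~ x)                                      # OLS of Y on X; lm includes an intercept by default
  beta1_hat[r] <- coef(fit)[["x"]]                      # record the estimated slope
}

summary(beta1_hat)
```

Since $X_i = 1$ exactly when $Z_i > 5$, supplement users exercise more on average: for $Z \sim$ normal$(5, 1)$, $\mathbb{E}[Z \mid Z > 5] - \mathbb{E}[Z \mid Z < 5] = 2\sqrt{2/\pi} \approx 1.60$. With $\beta_2 = 1$, the recorded slopes should therefore center near $1 + 2\sqrt{2/\pi} \approx 2.60$ rather than near $\beta_1 = 1$.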