PUBH 7430
Assignment 1
Solutions
J. Wolfson
September 29, 2011
On this assignment, as in all homeworks, UNEDITED COMPUTER OUTPUT IS NOT ACCEPTABLE.
Tables and plots prepared from statistical output should be appropriate for inclusion
in a scientific report, with all statistics rounded to a reasonable number of
significant digits.
Background
Each year the U.S. Naval Postgraduate School sets aside a “Discovery Day” during which the
general public is invited into their laboratories. This dataset is from October 21, 1995, when
visitors could test their reaction times and hand-eye coordination in the Human Systems
Integration Laboratory.
The variable of interest, “anticipatory timing”, was measured by
a Bassin timer, which measures a person’s ability to estimate the speed of a moving light
and its arrival at a designated point. The timer consists of a 10 foot row of lights which is
controlled by a variable speed potentiometer. The lights are switched on sequentially from
one end to the other so that light “travels” at 5 miles per hour down the timer. Each visitor
was instructed to anticipate the “arrival” of the light at one end of the timer and at that time
to swing a plastic bat across a light beam at the same end of the timer. An automatic timing
device measured the difference between the breaking of the beam and the actual arrival of
the light. In the original data, a negative time value for a trial indicated that the bat broke
the beam before the light actually arrived; in the version provided to you, all times have
been transformed to positive values, so that the values reflect the magnitude of how far off
the participant was in timing. Each of 113 visitors completed the trial five times. Age and
gender were also recorded, since the researchers were interested in age and gender differences
in reaction times. Visitors tended to come in family groups, but that information was not
recorded.
You can find these data in the file timetrial.dat on the class webpage. These data are
organized in wide format, with one row per person and one column for each of the five trials.
Depending on what software you use for graphing, they may need to be reshaped to long
format, i.e. to have one row per trial instead.
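The wide-to-long reshaping described above can be sketched in Python with pandas. This is only an illustration: the column names (id, age, gender, trial1 through trial5) and the helper wide_to_long are assumptions, since the layout of timetrial.dat is not given here, and the toy values are made up and are not taken from the dataset.

```python
import pandas as pd


def wide_to_long(wide, id_cols, trial_cols):
    """Reshape one-row-per-person data into one-row-per-trial data."""
    long = wide.melt(id_vars=id_cols, value_vars=trial_cols,
                     var_name="trial", value_name="time")
    # Turn labels such as "trial3" into the integer 3.
    long["trial"] = long["trial"].str.extract(r"(\d+)", expand=False).astype(int)
    # Order rows by person (first id column), then by trial number.
    return long.sort_values(id_cols[:1] + ["trial"]).reset_index(drop=True)


# Toy stand-in for the wide file: two made-up people, NOT real data.
# All column names here are hypothetical.
toy = pd.DataFrame({
    "id": [1, 2],
    "age": [30, 12],
    "gender": ["F", "M"],
    "trial1": [0.10, 0.20],
    "trial2": [0.08, 0.15],
    "trial3": [0.05, 0.18],
    "trial4": [0.07, 0.11],
    "trial5": [0.04, 0.09],
})

trial_cols = [f"trial{k}" for k in range(1, 6)]
long = wide_to_long(toy, ["id", "age", "gender"], trial_cols)
print(long.shape)  # 2 people x 5 trials = 10 rows; 5 columns
```

Applied to the full dataset, the same reshaping would give 113 × 5 = 565 rows, one per trial.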
1. Notation. Consider the notation presented in Lecture 3, where outcomes are denoted
by $Y$ and covariates/predictors by $X$. Suppose we are interested in modeling the
response times in this dataset as a function of age, gender, and trial number, via the
simple linear model
$$Y = X\beta + \epsilon$$
For this model as applied to the timetrial data:
(a) Let $Y_i$ denote the response vector for each individual. What is the length of
$Y_i$? 5
(b) What is the length of $Y$? $113 \times 5 = 565$
(c) Let $X_i$ denote the vector of covariates/predictors for each individual. What is
the dimension of $X_i$? $5 \times 3$ (one row per trial, with columns for age, gender,
and trial number), or $5 \times 4$ if you include the column for the intercept.
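The dimensions in (a) through (c) can be checked with a short NumPy sketch. It uses placeholder zeros instead of the real covariates and responses, so only the shapes are meaningful. The helper design_matrix and the numeric coding of gender are assumptions made for illustration.

```python
import numpy as np

n_people, n_trials = 113, 5  # counts stated in the Background section

# Placeholder covariates (all zeros): used only to check shapes.
age = np.zeros(n_people)
gender = np.zeros(n_people)          # assumed numeric (0/1) coding
trial = np.arange(1, n_trials + 1)   # trial numbers 1..5


def design_matrix(i, intercept=False):
    """Return X_i for person i: one row per trial."""
    cols = [np.full(n_trials, age[i]), np.full(n_trials, gender[i]), trial]
    if intercept:
        cols.insert(0, np.ones(n_trials))
    return np.column_stack(cols)


Y_i = np.zeros(n_trials)                                 # one person's responses
Y = np.concatenate([np.zeros(n_trials) for _ in range(n_people)])
X = np.vstack([design_matrix(i, intercept=True) for i in range(n_people)])

print(Y_i.shape)                               # (5,)
print(Y.shape)                                 # (565,)
print(design_matrix(0).shape)                  # (5, 3)
print(design_matrix(0, intercept=True).shape)  # (5, 4)
print(X.shape)                                 # (565, 4)
```

Stacking the 113 individual matrices gives the full design matrix, whose 565 rows match the length of $Y$.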
