Spring 2007  STA 4702/5701  Midterm Exam #1 Solutions
Name:
Directions: Please read all problems carefully. The exam is worth a total of 200 points. The
points for each problem are shown to the left. Good luck!
1. (15) For each of the data sets described below, indicate whether the data are nonrandom
missing, missing at random, or missing completely at random by writing NRM, MAR, or MCAR in
the space provided.
(a) A survey was taken by telephone by paid staff members to estimate the proportion of
US voters who would consider voting for Hillary Clinton in the upcoming presidential
election. The survey was taken mostly during regular working hours. The proportion of
respondents who were women aged 25 to 35 was 85%. The estimated proportion in the
general US population for voters of this demographic is around 48%.
solution
MAR. For age and sex, the probability of being missing (missingness) doesn’t
depend on the answer. Data to reconcile the sample proportions with the population
proportions can be simulated based on the respondents who did answer.
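One simple way to reconcile sample and population proportions is post-stratification
weighting, which is a swapped-in technique here rather than the simulation approach the
solution mentions. The sketch below uses the 85% and 48% shares from the problem; the group
labels and the "yes" rates within each group are hypothetical values chosen only for
illustration.

```python
# Post-stratification sketch for problem 1(a).
# Shares are taken from the problem statement.
sample_share = {"women_25_35": 0.85, "everyone_else": 0.15}
population_share = {"women_25_35": 0.48, "everyone_else": 0.52}

# HYPOTHETICAL proportion of respondents in each group answering "yes".
# These numbers are not from the exam; they only illustrate the arithmetic.
hypothetical_yes_rate = {"women_25_35": 0.60, "everyone_else": 0.40}

# Weight for each group = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Naive estimate: groups count in proportion to their share of the sample.
unweighted = sum(sample_share[g] * hypothetical_yes_rate[g] for g in sample_share)

# Weighted estimate: groups count in proportion to their share of the population.
weighted = sum(
    sample_share[g] * weights[g] * hypothetical_yes_rate[g] for g in sample_share
)

print(f"weights:    {weights}")
print(f"unweighted: {unweighted:.3f}")
print(f"weighted:   {weighted:.3f}")
```

The weighting is valid only if, within each age and sex group, respondents and
nonrespondents answer alike, which is the MAR assumption stated in the solution.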
(b) Researchers were measuring toluene concentrations in a random sample of small springs
in an area near a petroleum factory. The instrument they were using had a detection
limit of 0.001 ppb, so measurements below this concentration were censored by the
instrument, i.e., reported as “BDL” (below detection limit). Larger springs tended to
have a higher probability of high concentrations and so a lower probability of being BDL,
but to have the same probability of being sampled as smaller springs. That is, the size
proportions in the sample appeared to be representative of the size distribution in the
area.
solution
NRM. This missingness depends on both Y (concentration) and X (size of spring).
(c) A scientist was running test tubes of DNA samples to look for the occurrence of a
rare genotype in a population of turtles in the Galapagos Islands. The test tubes were
arranged in order of sample number. The sample was a random sample from 5 islands.
While preparing the test tubes for analysis, her sleeve caught on the corner of one of the
trays and 50 tubes out of 300 were knocked over and lost.
solution
MCAR. The missingness does not depend on either X or Y.
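The three mechanisms in problem 1 can be contrasted with a small simulation. This is only a
sketch under assumed settings: the uniform covariate, the noise level, the missingness
probabilities, and the cutoff of 0.2 are all hypothetical, and only the 50/300 loss rate
comes from part (c). It compares the mean of the observed Y values with the full-data mean,
and also a mean computed within strata of X, which is observed for every unit.

```python
import random

random.seed(0)
n = 20000
x = [random.random() for _ in range(n)]        # covariate X, always observed
y = [xi + random.gauss(0, 0.3) for xi in x]    # response Y, related to X
full_mean = sum(y) / n

def observed_mean(keep):
    """Mean of Y over the units whose Y was observed."""
    vals = [yi for yi, k in zip(y, keep) if k]
    return sum(vals) / len(vals)

def stratified_mean(keep, bins=10):
    """Average the observed Y within bins of X, weighting by full-sample bin size."""
    total = 0.0
    for b in range(bins):
        idx = [i for i in range(n) if min(int(x[i] * bins), bins - 1) == b]
        obs = [y[i] for i in idx if keep[i]]
        total += len(idx) / n * (sum(obs) / len(obs))
    return total

# MCAR: every Y is lost with the same probability (50 of 300, as in part (c)).
mcar = [random.random() > 50 / 300 for _ in range(n)]
# MAR: the chance of losing Y depends only on the observed covariate X.
mar = [random.random() > 0.8 * xi for xi in x]
# NRM: Y is lost exactly when Y itself is small (censoring, as in part (b)).
nrm = [yi >= 0.2 for yi in y]

print(f"full-data mean     {full_mean:.3f}")
for name, keep in [("MCAR", mcar), ("MAR", mar), ("NRM", nrm)]:
    print(f"{name:4s} naive {observed_mean(keep):.3f}  "
          f"stratified on X {stratified_mean(keep):.3f}")
```

Under these assumptions the naive mean should stay close to the full-data mean for MCAR,
drift for MAR until X is accounted for, and remain biased for NRM even after stratifying on
X, because the missingness depends on the unobserved Y itself.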
2. Professors at a certain public university are rated by their students on a scale of 1 to 10, with
1 being ‘horrible’ and 10 being ‘fantastic’. A faculty member who consistently receives low
scores believes that the ratings are determined by how difficult the course is. She believes
that students will rate professors lower if the material is difficult. Scores of difficulty range
from 0 to 5, where 5 is the most difficult. A regression model was fit to 60 average faculty
scores for different courses, using the average difficulty rating received as the predictor.
The SAS output is shown below:
                        Analysis of Variance

                              Sum of         Mean
Source              DF       Squares       Square    F Value    Pr > F
Model                1      55.33573     55.33573      82.96    <.0001
Error               58      38.68592      0.66700
Corrected Total     59      94.02165

Root MSE            0.81670    R-Square    0.5885
Dependent Mean      3.30884    Adj R-Sq    0.5814
Coeff Var          24.68234
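The summary statistics in this output all follow from the sums of squares, the degrees of
freedom, and the dependent mean. The sketch below recomputes them from the printed values
using the standard definitions; the variable names are illustrative, and small differences
in the last digit can arise because the printed inputs are already rounded.

```python
import math

# Values copied from the Analysis of Variance table.
ss_model, ss_error = 55.33573, 38.68592
df_model, df_error = 1, 58
dependent_mean = 3.30884

ss_total = ss_model + ss_error                  # Corrected Total sum of squares
ms_model = ss_model / df_model                  # Mean Square for Model
ms_error = ss_error / df_error                  # Mean Square for Error (MSE)
f_value = ms_model / ms_error                   # F = MS(Model) / MS(Error)
root_mse = math.sqrt(ms_error)                  # Root MSE
r_square = ss_model / ss_total                  # R-square = SS(Model) / SS(Total)
df_total = df_model + df_error
adj_r_square = 1 - (1 - r_square) * df_total / df_error   # adjusted R-square
coeff_var = 100 * root_mse / dependent_mean     # Coeff Var, in percent

print(f"SS total   {ss_total:.5f}")
print(f"MS error   {ms_error:.5f}")
print(f"F value    {f_value:.2f}")
print(f"Root MSE   {root_mse:.5f}")
print(f"R-square   {r_square:.4f}")
print(f"Adj R-sq   {adj_r_square:.4f}")
print(f"Coeff Var  {coeff_var:.4f}")
```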
                        Parameter Estimates

                              Parameter     Standard
Variable     Label     DF      Estimate        Error    t Value