Midterm 1 Practice Problems 2 With Solutions
(1) Regress a Wreck
A statistician is trying to learn what factors affect the price of a used car.
Her Y variable is the price of the car. She is considering four possible predictor variables: $X_1$, the original value of the car; $X_2$, the mileage on the car; $X_3$, the number of repairs that have been done on the car; and $X_4$, the number of seat belts in the car.
(a)
For each of the four possible predictor variables the statistician has obtained the correlation of
Y and X, and the covariance of Y and X.
$$
\begin{aligned}
\mathrm{Cor}(Y, X_1) &= .795, & \mathrm{Cov}(Y, X_1) &= 3{,}688{,}147\\
\mathrm{Cor}(Y, X_2) &= -.789, & \mathrm{Cov}(Y, X_2) &= -149.155\\
\mathrm{Cor}(Y, X_3) &= -.539, & \mathrm{Cov}(Y, X_3) &= -1186.4\\
\mathrm{Cor}(Y, X_4) &= -.004, & \mathrm{Cov}(Y, X_4) &= -7.6
\end{aligned}
$$
Say what a plot of Y vs X should look like in each case.
Solution:
Variable $X_1$ (original value) has a strong positive correlation with Y, so the plot should show a clear upward trend. Variable $X_4$ (number of seat belts) has a correlation with Y that is close to 0, so the plot should be nearly flat, i.e. it should show no clear relationship between X and Y. Variable $X_2$ (mileage) has the stronger of the two negative correlations (closer to $-1$), so its plot should show the stronger of the two downward trends: the points should be less spread out about the line than in the plot for $X_3$, number of repairs.
(b)
Rank the variables $X_1, X_2, X_3, X_4$ in terms of how good a job you expect them to do of predicting Y based on the values given in part (a) (NOT on your common sense opinion!). Order them from best predictor to worst predictor and briefly explain your reasoning.
Solution:
The strength of the relationship is determined by the absolute value of the correlation. (Note: The covariance is not good for comparing strengths of relationships because the units of the variables affect what counts as a "big" covariance!) The sign is irrelevant to the strength of the relationship; it only determines the direction of the relationship. Here original value, $X_1$, has the highest correlation in absolute value at .795, followed by mileage, $X_2$, at $-.789$; repairs, $X_3$, at $-.539$; and seat belts, $X_4$, at $-.004$. The stronger the relationship, the better a predictor the variable will be. Therefore original value will be the best predictor, followed by mileage, number of repairs, and number of seat belts.
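The ranking rule above (sort by absolute correlation, not by covariance) can be sketched in a few lines. This is a minimal illustration using only the numbers from part (a); the dictionary layout and the helper name `rank_by` are hypothetical choices for the example.

```python
# Correlations and covariances of price (Y) with each predictor, from part (a).
stats = {
    "X1 (original value)": {"cor": 0.795, "cov": 3_688_147},
    "X2 (mileage)": {"cor": -0.789, "cov": -149.155},
    "X3 (repairs)": {"cor": -0.539, "cov": -1186.4},
    "X4 (seat belts)": {"cor": -0.004, "cov": -7.6},
}


def rank_by(key):
    """Hypothetical helper: order predictors by decreasing absolute value of `key`."""
    return sorted(stats, key=lambda name: abs(stats[name][key]), reverse=True)


by_correlation = rank_by("cor")  # the correct basis for comparing strength
by_covariance = rank_by("cov")   # shown only to illustrate the pitfall

print("By |correlation|:", by_correlation)
print("By |covariance|: ", by_covariance)
```

Ranking by absolute covariance would wrongly place repairs ahead of mileage, because the covariance also reflects the units in which each X is measured. The correlation is unit-free, so it gives the order X1, X2, X3, X4 stated in the solution.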
(c)
To simplify matters, the statistician has fit two regressions, one of price (Y) on mileage ($X_2$) and one of price (Y) on the number of repairs the car has had ($X_3$). Printouts for these regressions are given on the following page. Give three numbers from the printouts that tell you which predictor, $X_2$ or $X_3$, is doing a better job, and briefly explain why each number tells you it is doing a better job. Do the numbers confirm your prediction from part (b)? (Note: Comparing R-squared from printout 1 to R-squared from printout 2 counts as one number. You should give three pairs/comparisons.)
Solution:
Mileage, $X_2$, is the better predictor. We can see this using almost any number from the printout that follows. For instance, $R^2 = 63.7\%$ for mileage and only 29.1% for repairs, and