Inference for Regression

14.1 (a) See also the solution to Exercise 3.19. The correlation is r = 0.994, and linear regression gives ŷ = −3.660 + 1.1969x. The scatterplot below shows a strong, positive, linear relationship, which is confirmed by r. (b) β represents how much we can expect the humerus length to increase when femur length increases by 1 cm; b (the estimate of β) is 1.1969, and the estimate of α is a = −3.6596. (c) The residuals are −0.8226, −0.3668, 3.0425, −0.9420, and −0.9110; the sum is −0.0001 (but carrying a different number of digits might change this). Squaring and summing the residuals gives 11.79, so that s = √(11.79/3) = 1.982.

[Figure: scatterplot of humerus length (cm) against femur length (cm)]
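The arithmetic in part (c) of Exercise 14.1 can be checked with a short sketch. It uses only the five residuals quoted in the solution and the usual n − 2 degrees of freedom; the function name is illustrative, not part of the original solution.

```python
import math

# Residuals as quoted in Exercise 14.1(c); n = 5 fossil specimens.
residuals = [-0.8226, -0.3668, 3.0425, -0.9420, -0.9110]


def regression_standard_error(resids):
    """Return (sum of squared residuals, s), with s based on n - 2 df."""
    sse = sum(e * e for e in resids)
    return sse, math.sqrt(sse / (len(resids) - 2))


sse, s = regression_standard_error(residuals)
print(f"sum of residuals = {sum(residuals):.4f}")
print(f"sum of squares = {sse:.2f}, s = {s:.3f}")
```

The sum of the residuals is zero only up to rounding, which is why its last digit depends on how many digits are carried.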
14.2 (a) HEIGHT = 71.950 + 0.38333(AGE). The intercept is 71.950 and the slope is 0.38333. (b) The estimate for α is the intercept of the least-squares line, that is, 71.950. The estimate for β is the slope of the least-squares line, that is, 0.38333. (c) The residuals are 0.25012, −0.34984, −0.49983, 0.35018, 0.20019, 0.05020. The formula for s yields s = √(0.6/4) = √0.15 = 0.3873.

14.3 (a) HEIGHT = 11.547 + 0.84042(ARMSPAN). (b) The least-squares line is an appropriate model for the data because the residual plot shows no obvious pattern. (c) a = 11.547 estimates the true intercept, α; b = 0.84042 estimates the true slope, β. (d) s = [illegible] estimates σ.

14.4 (a) See Exercise 3.71 for scatterplot. r = 0.999 and the equation of the least-squares line is ŷ = 1.766 + 0.080284x. The scatterplot shows a strong linear relationship, which is confirmed by r. (b) The residuals are [seven values illegible]; the sum is [illegible] (essentially 0). (c) a = 1.766 is the estimate of α; b = 0.080284 is the estimate of β; s = √(Σ residual²/(n − 2)) = 0.0091 is the estimate of σ.

14.5 Answers will vary.

14.6 (a) ŷ = −3.6596 + 1.1969x. (b) t = b/SE_b = 1.1969/0.0751 = 15.9374. (c) df = 3; since t > 12.92, we know that P < 0.0005. (d) There is very strong evidence that β > 0, that is, that the line is useful for predicting the length of the humerus given the length of the femur. (e) For df = 3, the critical value for a 99% confidence interval is t* = 5.841. The interval is 1.1969 ± (5.841)(0.0751), or 1.1969 ± 0.439, that is, 0.7579 to 1.6359.

14.7 (a) H0: β = 0 (there is no association between number of jet skis in use and number of fatalities). Ha:
β > 0 (there is a positive association between number of jet skis in use and number of fatalities). (b) The conditions are satisfied except for having independent observations. We will proceed with caution. (c) LinRegTTest (TI-83) reports that t = 7.26 with df = 8. The P-value is [illegible]. With the earlier caveat, there is sufficient evidence to reject H0 and conclude that there is an association between the number of jet skis in use and the number of fatalities. As the number of jet skis in use increases, the number of fatalities increases. (d) The confidence interval takes the form b ± t*SE_b. With b = [illegible] and SE_b = 0.00000913, the 98% confidence interval is approximately ([illegible], [illegible]).

14.8 (a) Regression of deaths on wine consumption gives b = −22.969, SE_b = 3.557, and t = −6.46. With df = 17, we see that P < 0.0005, so we have strong evidence that β < 0 and hence that the correlation is negative. (b) For a 95% confidence interval with df = 17, t* = 2.110. The 95% confidence interval for β is −22.969 ± (2.110)(3.557), or −22.969 ± 7.505, that is, −30.474 to −15.464.

14.9 Regression of fuel consumption on speed gives b = −0.01466, SE_b = 0.02334, and t = −0.63. With df = 13, we see that P > 2(0.25) = 0.50 (software reports 0.541), so we have no evidence to suggest a straight-line relationship. While the relationship between these two variables is very strong, it is definitely not linear. See also the solution to Exercise 3.11.

[Figure: scatterplot of fuel used (liters/100 km) against speed (km/hr)]
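The slope tests and intervals in Exercises 14.6, 14.8, and 14.9 all use the same two steps: t = b/SE_b, then b ± t*SE_b. The sketch below repeats that arithmetic with the values quoted in those solutions. The critical values t* are taken from the text (Table C) rather than computed, and the function name is illustrative.

```python
def slope_inference(b, se_b, t_star=None):
    """Return the t statistic b/SE_b and, if t_star is given, b ± t_star * SE_b."""
    t_stat = b / se_b
    if t_star is None:
        return t_stat, None
    margin = t_star * se_b
    return t_stat, (b - margin, b + margin)


# Exercise 14.6: 99% interval, df = 3
t_146, ci_146 = slope_inference(1.1969, 0.0751, t_star=5.841)
# Exercise 14.8: 95% interval, df = 17
t_148, ci_148 = slope_inference(-22.969, 3.557, t_star=2.110)
# Exercise 14.9: test statistic only
t_149, _ = slope_inference(-0.01466, 0.02334)

print(f"14.6: t = {t_146:.4f}, interval = {ci_146[0]:.4f} to {ci_146[1]:.4f}")
print(f"14.8: t = {t_148:.2f}, interval = {ci_148[0]:.3f} to {ci_148[1]:.3f}")
print(f"14.9: t = {t_149:.2f}")
```

Small differences from the printed endpoints in the last digit come from rounding the margin of error before adding and subtracting it.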
14.10 (a) r² is very close to 1, which means that nearly all the variation in steps per second is accounted for by foot speed. Also, the P-value for β is small. (b) β (the slope) is this rate; the estimate is listed as the coefficient of "Speed," 0.080284. Using a t(5) distribution: 0.080284 ± (4.032)(0.0016) = 0.07383 to 0.08674.

14.11 (a) The plot (below) shows a strong positive linear relationship. (b) β (the slope) is this rate; the estimate is listed as the coefficient of "year": 9.31868. (c) df = 11; t* = 2.201; 9.31868 ± (2.201)(0.3099) = 8.6366 to 10.0008.

[Figure: scatterplot for Exercise 14.11, with year on the horizontal axis]
14.12 (a) One residual (51.32) may be a high outlier, but the stemplot does not show any other deviations from normality.

[Stemplot of residuals]

(b) The scatter of the data points about the regression line (see Figure 14.1) varies to a certain extent as we move along the line, but the variation is not serious, as a residual plot shows. The other conditions can be assumed to be satisfied. (c) A prediction interval would be wider. For a fixed confidence level, the margin of error is always larger when we are predicting a single observation (a variable quantity) than when we are estimating the mean response. (d) We are 95% confident that when x (crying intensity) = 25, the corresponding value of y (IQ) will be between 91.85 and 165.33.

14.13 (a) The major difficulty is that the observations are not independent. The number of powerboat registrations for any year is related to the number of registrations for the previous year. The other conditions can be assumed to be satisfied. (b) The confidence interval is (41.43, 49.59). The prediction interval is (33.35, 57.66). The confidence interval is more precise (i.e., narrower) since it is based on the mean of the observations, and the prediction interval is calculated for a single observation.

14.14 The number of points is so small that it is hard to judge much from the stemplot. The scatterplot of residuals vs. year does not suggest any problems. The regression in Exercise 14.11 should be fairly reliable.

−0 | 6
−0 | 55
−0 | 32
−0 |
 0 | 011
 0 | 22
 0 | 44
 0 | 7

[Figure: residuals plotted against year]
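The claim in Exercises 14.12(c) and 14.13(b) that a prediction interval is always wider than the confidence interval for the mean response follows from the standard error formulas for simple linear regression. They are written out here as a reminder; the notation x* for the chosen value of the explanatory variable is an assumption, since the solutions do not name it.

```latex
SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}},
\qquad
SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}
```

The extra 1 under the second square root accounts for the variation of a single observation about the mean response, so for the same t* the prediction margin is larger at every x*.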
14.15 The scatterplot (below) shows a positive association. The regression line is ŷ = 113.2 + 26.88x; the linear relationship with body mass accounts for r² = 74.8% of the variation in metabolic rate. See also the solution to Exercise 3.12.

Minitab output (on the next page) reports b = 26.879 and SE_b = 3.786; with df = 17, the critical value is t* = 1.740, so the 90% confidence interval for β is 26.879 ± (1.740)(3.786) = 20.29 to 33.47 cal/kg. For each additional kilogram of mass, metabolic rate increases by about 20 to 33 calories.

The residuals are listed on the next page (in order, down the columns). A stemplot (on the next page) suggests that the distribution of residuals is right-skewed, and the largest residual may be an outlier. A scatterplot (on the next page) of the residuals against the explanatory variable gives some hint that the variation about the line is not constant (in violation of the regression assumptions). However, the three highest residuals account for most of that impression (as well as the skewness of the distribution), so these three individuals may need to be examined further.

[Figure: scatterplot of metabolic rate (cal/day) against lean body mass (kg)]
The regression equation is
Rate = 113 + 26.9 Mass

Predictor   Coef     Stdev    t-ratio   p
Constant    113.2    179.6    0.63      0.537
Mass        26.879   3.786    7.10      0.000

s = 133.1   R-sq = 74.8%   R-sq(adj) = 73.3%
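The entries in this output are tied together by simple arithmetic, which makes them easy to cross-check. The sketch below recomputes the t-ratio for Mass, the 90% interval from Exercise 14.15, and R-sq. The identity r² = t²/(t² + df) holds for simple linear regression; it is a standard result used here as a check, not something the solution itself states.

```python
# Values read from the Minitab output for Exercise 14.15.
coef, stdev, df = 26.879, 3.786, 17
t_star = 1.740  # 90% critical value for df = 17, from Table C

t_ratio = coef / stdev                         # t-ratio column
margin = t_star * stdev
ci = (coef - margin, coef + margin)            # 90% interval for the slope
r_sq = t_ratio ** 2 / (t_ratio ** 2 + df)      # simple-regression identity

print(f"t-ratio = {t_ratio:.2f}")
print(f"90% CI = {ci[0]:.2f} to {ci[1]:.2f}")
print(f"R-sq = {100 * r_sq:.1f}%")
```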
Residuals
12.36 —2.32 —1 5
—132.33 —S9.35 —1 332
”38.43 4113.16 —0 BS
—1'55.24 —12F1.32 —1] 422111
—20.23 11.52 0 1112
125.93 +3966 0 6
-25.21 —10.5ti 1
23.23 358.84 1 29
13.93 65.23 2
191.85' 2
3
3 5
[Figure: residuals plotted against lean body mass (kg)]

14.16 From the computer regression output for the years 1977-1994 (see below), we note that the regression equation is ŷ = −35.2 + 0.113x and s = 5.399. The critical value for a 90% confidence interval with df = 16 is t* = 1.746. The 90% confidence interval for mean response (mean number of manatees killed) at x = 700 is (40.60, 46.81).

The regression equation is
WTH = − 35.2 + 0.113 REG

Predictor   Coef      StDev     T       P
Constant    −35.179   7.696     −4.57   0.000
REG         0.11269   0.01252   9.00    0.000

S = 5.399   R-Sq = 83.5%   R-Sq(adj) = 82.5%

Fit     StDev Fit   90.0% CI          90.0% PI
43.70   1.78        (40.60, 46.81)    (33.78, 53.63)

14.17 (a) Stemplots and boxplots of the data show that both armspan and height are approximately normally distributed, with height slightly skewed right. It is reasonable to assume that the data are independent observations from normal populations; that, for given armspan, heights would be approximately normally distributed; and that the standard deviation σ of heights is the same for all values of armspan. (b) 95% of the time, the prediction interval corresponding to armspan = 75 inches will capture the true height. (c) The 95% confidence interval corresponding to armspan = 75 inches predicts the mean height for all those individuals with armspan = 75 inches. The prediction interval establishes a range for the prediction of one student with armspan = 75, while a confidence interval establishes a range of heights for the mean height of all students with armspan = 75.
Since averages have less variation than individual observations, the confidence interval will be shorter.

14.18 Plot below. The regression equation is ŷ = 560.65 − 3.0771x; this and the plot (below) show that generally, the longer a child remains at the table, the fewer calories he or she will consume. Software (output below) reports that SE_b = 0.8498; to compute this by hand, first note that the estimated standard deviation is s = 23.40 calories and that √Σ(x − x̄)² can be found by multiplying the standard deviation of x (Time) by √19. For df = 18, t* = 2.101, so the 95% confidence interval is b ± t*SE_b = −3.0771 ± (2.101)(0.8498) = −4.8625 to −1.2917 calories per minute.

[Figure: scatterplot of calories consumed against time spent at table (minutes)]
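The hand computation described in Exercise 14.18 can be sketched as follows. The solution does not quote the standard deviation of Time, so the sketch backs it out from the reported SE_b purely for illustration. That derived value and the function name are assumptions, not figures from the text.

```python
import math


def slope_standard_error(s, sd_x, n):
    """SE_b = s / sqrt(sum (x - x_bar)^2), using sqrt(sum (x - x_bar)^2) = sd_x * sqrt(n - 1)."""
    return s / (sd_x * math.sqrt(n - 1))


s, n = 23.40, 20                      # s from the solution; df = 18 implies n = 20
b, se_b_reported, t_star = -3.0771, 0.8498, 2.101

# Not given in the solution: the standard deviation of Time implied by SE_b.
sd_time = s / (se_b_reported * math.sqrt(n - 1))

se_b = slope_standard_error(s, sd_time, n)
margin = t_star * se_b
ci = (b - margin, b + margin)

print(f"implied sd of Time = {sd_time:.3f} minutes")
print(f"SE_b = {se_b:.4f}")
print(f"95% CI = {ci[0]:.4f} to {ci[1]:.4f} calories per minute")
```

With the raw Time values in hand, `sd_time` would instead come straight from the data (for example with `statistics.stdev`), and the rest of the calculation would be unchanged.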
The regression equation is
Calories = 561 − 3.08 Time

Predictor   Coef      Stdev    t-ratio   p
Constant    560.65    29.37    19.09     0.000
Time        −3.0771   0.8498   −3.62     0.002

s = 23.40   R-sq = 42.1%   R-sq(adj) = 38.9%

14.19 (a) Stumps (the explanatory variable) should be on the horizontal axis; the plot shows a positive linear association. See also the solution to Exercise 3.45. (b) The regression line is ŷ = −1.286 + 11.894x. Regression on stump counts explains 83.9% of the variation in the number of beetle larvae. (c) Our hypotheses are H0: β = 0 versus Ha: β ≠ 0, and the test statistic is t = 10.47 (df = 21). The output shows P = 0.000, so we know that P < 0.0005 (as we can confirm from Table C); we have strong evidence that beaver stump counts help explain beetle larvae counts.

[Figure: scatterplot of beetle larvae counts against stump counts]

14.20 (a) The mean is x̄ = [illegible], and the standard deviation is s = [illegible]. For a standardized set of values, the mean and standard deviation should be (up to rounding error) 0 and 1, respectively.* (b) The stemplot (below) doesn't look particularly symmetric, but is not strikingly nonnormal (for such a small sample). In a set of 23 observations from a standard normal distribution, we expect most (95%) to be between −2 and 2, so −1.99 is quite reasonable. (c) The plot of residuals versus stump counts (below) gives no cause for concern.

[Stemplot of standardized residuals]
[Figure: standardized residuals plotted against stumps]
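The note to instructors for Exercise 14.20 points out that standardized residuals are not just residuals divided by their standard deviation. As a minimal sketch, assuming the usual definition e_i / (s·√(1 − h_i)) with leverage h_i = 1/n + (x_i − x̄)²/Σ(x − x̄)², the code below works through the three-point data set (1, 1), (2, 1), (3, 2) mentioned in that note. Whether the software behind the solution uses exactly this definition is an assumption, though it does reproduce the values the note quotes.

```python
import math
from statistics import mean, stdev


def standardized_residuals(xs, ys):
    """Least-squares residuals divided by s * sqrt(1 - leverage)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]


z = standardized_residuals([1, 2, 3], [1, 1, 2])
print([round(v, 4) for v in z])
print(f"mean = {mean(z):.4f}, standard deviation = {stdev(z):.4f}")
```

The raw residuals here are 1/6, −1/3, 1/6 and sum to zero, yet the standardized values have mean 1/3, because each residual is scaled by its own leverage-dependent standard error.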
*Note to instructors: Most students do not need to know, but you should, that the process of finding standardized residuals is more complicated than simply finding the standard deviation of the set of residuals and then dividing each residual by that standard deviation. (The mean of the residuals will necessarily be 0.) Standardized residuals will generally have mean and standard deviation close to, but not exactly equal to, 0 and 1 (respectively). For example, with this data set, even working with unrounded standardized residuals, the mean is [illegible] (not 0) and the standard deviation is [illegible] (not 1). A simpler example is the data set (1, 1), (2, 1), (3, 2), for which the standardized residuals are 1, −1, 1, with mean 1/3 and standard deviation √(4/3) ≈ 1.1547.

14.21 (a) Below. U.S. returns (the explanatory variable) should be on the horizontal axis. Since both variables are measured in the same units, the same scale is used on both axes. See also the solution to Exercise 3.56. (b) t = b/SE_b = 0.6181/0.2369 = 2.609. df = 25; since 2.485 < t < 2.787, we know that 0.01 < P < 0.02 (multiply the upper-tail probabilities by 2 since the alternative hypothesis is two-sided). Thus, we have fairly strong evidence for a linear relationship, that is, that the slope is nonzero. (c) When x = 15%, ŷ = 5.683% + 0.6181x = 14.95%. For estimating an individual y value, we use the prediction interval: −19.65% to 49.56%. (d) The width of the prediction interval is one indication that this prediction is not practically useful; another indication is that knowing the U.S. return accounts for only about r² = 21.4% of the variation in overseas returns.

[Figure: scatterplot of overseas % return against U.S. % return]
14.22 (a) The plot of residuals (below, left) suggests that variability about the line may not be constant for all values of x; it seems to increase from left to right. (b) The stemplot (below, right) suggests that the distribution of residuals is right-skewed. The outlier is from 1986, when the overseas return was much higher than our regression predicts.

[Figure: residuals plotted against U.S. return (left); stemplot of residuals (right)]
14.23 (a) Scatterplot below. Regression gives ŷ = 166.5 − 1.099x; the linear relationship explains about r² = 20.9% of the variation in yield. (b) The t statistic for testing H0: β = 0 vs. Ha: β < 0 is t = −1.92; with df = 14, the P-value is 0.0375 (Table C tells us that 0.025 < P < 0.05). We have some evidence that weeds influence corn yields, but it is not strong enough to meet the usual standards of statistical significance. (c) The small value of r² and the lack of significance of the t test indicate that this regression has little predictive use. When x = 6, ŷ = 159.9 bu/acre; the 95% confidence interval (given by Minitab, below) is 154.4 to 165.3 bu/acre. (Up to rounding error, this agrees with the "hand-computed" value, with t* = 2.145 and SE_μ̂ = 2.54: 159.9 ± (2.145)(2.54).) The width of this interval is another indication that the model has little practical use.

[Figure: scatterplot of corn yield (bushels/acre) against weeds per meter]
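The hand computation in Exercise 14.23(c) extends to the prediction interval that Minitab prints for the same fit. As a sketch, the standard error for predicting one observation can be recovered as √(s² + SE_μ̂²), using s = 7.977 and the fit's standard error 2.54 from the Minitab output for this exercise. The helper name is illustrative, and t* = 2.145 is the Table C value quoted in the solution.

```python
import math

fit, se_fit, s, t_star = 159.89, 2.54, 7.977, 2.145  # df = 14


def interval(center, se, t_star):
    """Return center ± t_star * se as a (low, high) pair."""
    margin = t_star * se
    return center - margin, center + margin


ci = interval(fit, se_fit, t_star)                 # mean yield at x = 6
se_pred = math.sqrt(s ** 2 + se_fit ** 2)          # one new observation
pi = interval(fit, se_pred, t_star)

print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% PI = {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is more than three times as wide as the confidence interval here, because s is much larger than the standard error of the fitted mean.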
The regression equation is
Corn = 166 − 1.10 Weeds

Predictor   Coef      Stdev    t-ratio   P
Constant    166.483   2.725    61.11     0.000
Weeds       −1.0987   0.5712   −1.92     0.075

s = 7.977   R-sq = 20.9%   R-sq(adj) = 15.3%

(output continues)

Fit      Stdev.Fit   95.0% C.I.          95.0% P.I.
159.89   2.54        (154.44, 165.34)    (141.93, 177.85)

14.24 df = 21, so t* = 1.721; the 90% confidence interval is b ± t*SE_b = −9.6949 ± (1.721)(1.8887) = −12.9454 to −6.4444 bpm per minute. With 90% confidence, we can say that for each 1-minute increase in swimming time, pulse rate drops by 6 to 13 bpm.

14.25 (a) The regression line is ŷ = 479.9 − 9.6949x; with x = 34.3 minutes, this agrees with the output (up to rounding). The prediction interval is appropriate for estimating one value (as opposed to the mean of many values): 135.79 to 159.01 bpm. (b) Using df = 21, we find t* = 1.721; this would give the interval 147.40 ± (1.721)(1.97) = 144.01 to 150.79, which agrees with the computer output (up to rounding error).

14.26 (a) Perch #143 lies slightly above the overall linear pattern, but does not appear to be too far out of place on the width versus length plot (see next page). Since both variables are measured in centimeters, the same scale is used on both axes. (b) Regression (see Minitab output below) gives ŷ = −0.6522 + 0.1823x cm. (c) When x = 27, ŷ = 4.27 cm. We have df = 54, so we use 50 degrees of freedom in Table C, which gives t* = 2.009. The 95% confidence interval for μ_ŷ is 4.27 ± (2.009)(0.0552) = 4.16 to 4.38 cm. (d) A stemplot of the residuals (below, left) reveals a high outlier (which came from perch #143) and a less extreme low value (from fish #149, which is unusually slender for its width). A plot of residuals versus length (below, right) suggests that there may be more variability in width for larger lengths (although much of this impression may be due to the two extreme residuals and the fact that we have fewer observations for small fish). These two issues might give us reason to be hesitant about using inference procedures.
[Figure: stemplot of residuals (left); residuals plotted against length (cm) (right)]

The regression equation is
width = − 0.652 + 0.182 length

Predictor   Coef       Stdev      t-ratio   p
Constant    −0.6522    0.1751     −3.72     0.000
length      0.182322   0.005642   32.32     0.000

s = 0.3988   R-sq = 95.1%   R-sq(adj) = 95.0%

(output continues)

Fit      Stdev.Fit   95.0% C.I.          95.0% P.I.
4.2705   0.0552      (4.1598, 4.3812)    (3.4633, 5.0777)

14.27 (a) The plot (below, left) shows a fairly strong curved pattern (weight increases with
length). Two points are circled; these fish, #143 and #149, the two fish noted in the previous problem, stray the most from the curve, but probably would not be considered outliers. (b) We would expect weight to increase roughly linearly with volume (if we double volume, we double weight; if we triple the volume, we triple the weight, etc.). When all dimensions (length, width, and height) are doubled, the volume of an object increases by a factor of 8 = 2³. Similarly, if we triple all dimensions, volume (and, approximately, weight) increases by a factor of 27 = 3³. It then makes sense that the cube root (i.e., the one-third power) of the weight increases at an approximately linear rate with length. (c) The second plot (below, right) also shows a strong positive association, but this is much more linear. There are no particular outliers. (d) The correlations reflect the increased linearity of the second plot: Using the original weight variable, r² ≈ 0.92; with weight^(1/3), r² = 0.985. (e) Regression gives ŷ = −0.3283 + 0.2330x; ŷ = 5.9623 when x = 27 cm. We have df = 54, so we use 50 degrees of freedom in Table C, which gives t* = 2.009. The 95% confidence interval for μ_ŷ is 5.9623 ± (2.009)(0.0382) = 5.8856 to 6.0390 g^(1/3). (f) The stemplot shows no gross violations of the assumptions except for the high outlier for fish #143. As we saw in the previous problem, the scatterplot suggests that variability in weight may be greater for larger lengths. Neither of these violations is too severe, but together they might be a cause for concern. However, dropping fish #143 (which changes the regression line only slightly, to ŷ = −0.2925 + 0.2313x) seems to alleviate both these problems (to some degree, at least), so regression should be safe.
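The scaling argument in Exercise 14.27(b) can be made concrete with a few lines of code. This is only an illustration of the reasoning: the base weight of 100 g is a hypothetical value, not taken from the perch data, and weight is assumed to be exactly proportional to volume.

```python
def weight_after_scaling(weight, k):
    """Weight after every linear dimension is multiplied by k (weight taken as proportional to volume)."""
    return weight * k ** 3


base_weight = 100.0  # hypothetical weight in grams, for illustration only

for k in (1, 2, 3):
    w = weight_after_scaling(base_weight, k)
    cube_root_ratio = w ** (1 / 3) / base_weight ** (1 / 3)
    print(f"scale {k}: weight x{w / base_weight:.0f}, cube root of weight x{cube_root_ratio:.3f}")
```

Weight grows by factors of 8 and 27, while its cube root grows by factors of 2 and 3, in step with length. That is why the cube-root transformation straightens the plot.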
[Figure: weight against length (cm) (left); cube root of weight against length (cm) (right)]
[Figure: stemplot of residuals (left); residuals plotted against length (cm) (right)]

The regression equation is
cuberoot = − 0.328 + 0.233 length

Predictor   Coef       Stdev      t-ratio   P
Constant    −0.3283    0.1211     −2.71     0.009
length      0.232988   0.003902   59.72     0.000

S = 0.2757   R-Sq = 98.5%   R-Sq(adj) = 98.5%

(output continues)

Fit      Stdev.Fit   95.0% C.I.          95.0% P.I.
5.9623   0.0382      (5.8857, 6.0389)    (5.4041, 6.5205)
...
