Chapter 1 Solutions
Looking at Data—Distributions
1.1. Most students will prefer to work in seconds, to avoid having to work with decimals or
fractions.
1.2. Who? The individuals in the data set are students in a statistics class. What? There are
eight variables: ID (a label, with no units); Exam1, Exam2, Homework, Final, and Project
(in units of “points,” scaled from 0 to 100); TotalPoints (in points, computed from the other
scores, on a scale of 0 to 900); and Grade (A, B, C, D, and E). Why? The primary purpose
of the data is to assign grades to the students in this class, and (presumably) the variables
are appropriate for this purpose. (The data might also be useful for other purposes.)
1.3. Exam1 = 79, Exam2 = 88, Final = 88.
1.4. For this student, TotalPoints = 2 · 86 + 2 · 82 + 3 · 77 + 2 · 90 + 80 = 827, so the grade is B.
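The arithmetic in 1.4 can be checked with a short script. This is only a sketch: the weights and scores are copied from the sum above, and because this solution does not say which course component each term belongs to, the terms are kept as unnamed (weight, score) pairs.

```python
# (weight, score) pairs copied from the sum in solution 1.4.
# Which course component each pair belongs to is not stated here,
# so no component names are attached (that would be an assumption).
terms = [(2, 86), (2, 82), (3, 77), (2, 90), (1, 80)]

# Weighted total: multiply each score by its weight and add.
total_points = sum(weight * score for weight, score in terms)

print(total_points)  # 827
```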
1.5. The cases are apartments. There are five variables: rent (quantitative), cable (categorical),
pets (categorical), bedrooms (quantitative), distance to campus (quantitative).
1.6. (a) To find injuries per worker, divide the rates in Example 1.6 by 100,000 (or, redo the
computations without multiplying by 100,000). For wage and salary workers, there are
0.000034 fatal injuries per worker. For self-employed workers, there are 0.000099 fatal
injuries per worker. (b) These rates are 1/10 the size of those in Example 1.6, or 10,000
times larger than those in part (a): 0.34 fatal injuries per 10,000 wage/salary workers, and
0.99 fatal injuries per 10,000 self-employed workers. (c) The rates in Example 1.6 would
probably be more easily understood by most people, because numbers like 3.4 and 9.9 feel
more “familiar.” (It might be even better to give rates per million workers: 34 and 99.)
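The unit conversions in 1.6 all amount to rescaling one rate. The sketch below assumes the Example 1.6 rates are 3.4 and 9.9 per 100,000 workers (the values quoted in part (c)); the function name `rescale` is made up for illustration. Exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Rates per 100,000 workers, as quoted in part (c) of solution 1.6.
rates_per_100k = {
    "wage/salary": Fraction("3.4"),
    "self-employed": Fraction("9.9"),
}

def rescale(rate_per_100k, per):
    """Convert a rate per 100,000 workers into a rate per `per` workers."""
    return rate_per_100k * per / 100_000

for group, rate in rates_per_100k.items():
    per_worker = float(rescale(rate, 1))
    per_10k = float(rescale(rate, 10_000))
    per_million = float(rescale(rate, 1_000_000))
    print(group, per_worker, per_10k, per_million)
```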
1.7. Shown are two possible stemplots; the first uses split
stems (described on page 11 of the text). The scores are
slightly left-skewed; most range from 70 to the low 90s.

    5 | 58
    6 | 0
    6 | 58
    7 | 0023
    7 | 5558
    8 | 00003
    8 | 5557
    9 | 0002233
    9 | 8

    5 | 58
    6 | 058
    7 | 00235558
    8 | 000035557
    9 | 00022338

1.8. Preferences will vary. However, the stemplot in Figure 1.8 shows a bit more detail, which
is useful for comparing the two distributions.
1.9. (a) The stemplot of the altered data is shown below. (b) Blank stems
should always be retained (except at the beginning or end of the stemplot),
because the gap in the distribution is an important piece of information about
the data.

    1 | 6
    2 |
    2 | 5568
    3 | 34
    3 | 55678
    4 | 012233
    4 | 8
    5 | 1

1.10. Student preferences will vary. The stemplot has the advantage of showing each individual
score. Note that this histogram has the same shape as the second stemplot in the solution to
Exercise 1.7.

[Histogram: Frequency vs. First exam scores]

1.11. Student preferences may vary, but the larger classes in this histogram hide a lot of
detail.

[Histogram: Frequency vs. First exam scores]

1.12. This histogram shows more details about the distribution (perhaps more detail than
is useful). Note that this histogram has the same shape as the first stemplot in the solution
to Exercise 1.7.

[Histogram: Frequency vs. First exam scores]

1.13. Using either a stemplot or histogram, we see that the distribution is left-skewed, centered
near 80, and spread from 55 to 98. (Of course, a histogram would not show the exact values
of the maximum and minimum.)
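As an illustration of how the stemplots in 1.7 are built, here is a small sketch. The 30 scores are read off the stemplots in that solution; the function `stemplot` and its `split` flag are hypothetical names chosen for this example, not anything from the text or a library.

```python
from collections import defaultdict

# Scores read off the stemplots in solution 1.7 (30 values).
scores = [55, 58, 60, 65, 68, 70, 70, 72, 73, 75, 75, 75, 78,
          80, 80, 80, 80, 83, 85, 85, 85, 87,
          90, 90, 90, 92, 92, 93, 93, 98]

def stemplot(values, split=False):
    """Return stemplot rows as (stem, leaves) pairs; stems are tens digits.

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    Blank rows between the first and last non-empty rows are retained.
    """
    rows = defaultdict(str)
    parts = 2 if split else 1
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = leaf // 5 if split else 0
        rows[stem * parts + half] += str(leaf)
    return [(key // parts, rows.get(key, ""))
            for key in range(min(rows), max(rows) + 1)]

for stem, leaves in stemplot(scores, split=True):
    print(f"{stem} | {leaves}")
```

Calling `stemplot(scores)` without `split=True` gives the five-row version.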
1.14. (a) The cases are the individual employees. (b) The first four (employee identification
number, last name, first name, and middle initial) are labels. Department and education level
are categorical variables; number of years with the company, salary, and age are quantitative
variables. (c) Column headings in student spreadsheets will vary, as will sample cases.
1.15. A Web search for “city rankings” or “best cities” will yield lots of ideas, such as crime
rates, income, cost of living, entertainment and cultural activities, taxes, climate, and school
system quality. (Students should be encouraged to think carefully about how some of these
might be quantitatively measured.)
1.16. Recall that categorical variables place individuals into groups or categories, while
quantitative variables “take numerical values for which arithmetic operations. . . make sense.”
Variables (a), (d), and (e)—age, amount spent on food, and height—are quantitative. The
answers to the other three questions—about dancing, musical instruments, and broccoli—are
categorical variables.
1.18. Student answers will vary. A Web search for “college ranking methodology” gives
some ideas; in a recent year, U.S. News and World Report used “16 measures of academic
excellence,” including academic reputation (measured by surveying college and university
administrators), retention rate, graduation rate, class sizes, faculty salaries, student-faculty
ratio, percentage of faculty with highest degree in their fields, quality of entering students
(ACT/SAT scores, high school class rank, enrollment-to-admission ratio), financial resources,
and the percentage of alumni who give to the school.

1.19. For example, blue is by far the most popular choice; 70% of respondents chose 3 of the
10 options (blue, green, and purple).

[Bar graph: Percent vs. Favorite color]

1.20. For example, opinions about least-favorite color are somewhat more varied than favorite
colors. Interestingly, purple is liked and disliked by about the same fractions of people.

[Bar graph: Percent vs. Least favorite color]
1.21. (a) There were 232 total respondents. The table that follows gives the percents; for
example, 10/232 ≈ 4.31%. (b) The bar graph is shown below. (c) For example, 87.5%
of the group were between 19 and 50. (d) The age-group classes do not have equal width:
The first is 18 years wide, the second is 6 years wide, the third is 11 years wide, etc.
Note: In order to produce a histogram from the given data, the bar for the first age
group would have to be three times as wide as the second bar, the third bar would have to
be wider than the second bar by a factor of 11/6, etc. Additionally, if we change a bar’s
width by a factor of x, we would need to change that bar’s height by a factor of 1/x.

    Age group (years)    Percent
    1 to 18                4.31%
    19 to 24              41.81%
    25 to 35              30.17%
    36 to 50              15.52%
    51 to 69               6.03%
    70 and over            2.16%

[Bar graph: Percent vs. Age group (years)]
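The note in 1.21 about unequal class widths can be made concrete: if a bar's area is to represent the percent, its height must be the percent divided by the class width. The sketch below uses the percents from the table; the widths for "36 to 50" and "51 to 69" are computed here with the same inclusive counting the solution uses for the first three classes (an assumption), and the open-ended "70 and over" class is left out because it has no width. `bar_height` is a name invented for this example.

```python
# (label, width in years, percent) for the closed age classes in solution 1.21.
# Widths for the last two rows are computed by inclusive counting (assumed),
# matching the 18, 6, and 11 quoted in the solution.
classes = [
    ("1 to 18", 18, 4.31),
    ("19 to 24", 6, 41.81),
    ("25 to 35", 11, 30.17),
    ("36 to 50", 15, 15.52),
    ("51 to 69", 19, 6.03),
]

def bar_height(percent, width_years):
    """Height (percent per year of age) so that bar area equals the percent."""
    return percent / width_years

for label, width, percent in classes:
    height = bar_height(percent, width)
    print(f"{label:>9}: width {width:2d} years, height {height:.2f}% per year")
```

Tripling a bar's width while keeping its percent fixed cuts its height to one third, which is the 1/x rule stated in the note.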
1.22. (a) & (b) The bar graph and pie charts are shown below. (c) A clear majority (76%)
agree or strongly agree that they browse more with the iPhone than with their previous
phone. (d) Student preferences will vary. Some might prefer the pie chart because it is more
familiar.

[Bar graph and pie chart: Response percent for Strongly agree, Mildly agree, Mildly disagree, Strongly disagree]

1.23. Ordering bars by decreasing height shows the models most affected by iPhone sales.
However, because “other phone” and “replaced nothing” are different from the other
categories, it makes sense to place those two bars last (in any order).

[Bar graph: Replacement percent vs. Previous phone model]

1.24. (a) The weights add to 254.2 million tons, and the percents add to 99.9.
(b) & (c) The bar graph and pie chart are shown below.

[Bar graph and pie chart: Percent of total waste vs. Source (Paper, paperboard; Yard trimmings; Food scraps; Plastics; Metals; Rubber, leather, textiles; Wood; Glass; Other)]

1.25. (a) & (b) Both bar graphs are shown below. (c) The ordered bars in the graph from (b)
make it easier to identify those materials that are frequently recycled and those that are not.
(d) Each percent represents part of a different whole. (For example, 2.6% of food scraps are
recycled; 23.7% of glass is recycled, etc.)

[Two bar graphs: Percent recycled vs. Material]

1.26. (a) The bar graph is shown below. (b) The graph clearly illustrates the dominance of
Google; its bar dwarfs those of the other search engines.

[Bar graph: Market share (%) vs. Search engine (Google, Yahoo, MSN, AOL, Microsoft Live, Ask, Other)]

1.27. The two bar graphs are shown below.

[Two bar graphs: Percent of all spam vs. Type of spam, one in the order Adult, Financial, Health, Leisure, Products, Scams and one in the order Products, Financial, Adult, Scams, Leisure, Health]
1.28. (a) The bar graph is below. (b) The number of Facebook users trails off rapidly after the
top seven or so. (Of course, this is due in part to the variation in the populations of these
countries. For example, that Norway has nearly half as many Facebook users as France is
remarkable, because the 2008 populations of France and Norway were about 62.3 million
and 4.8 million, respectively.)

[Bar graph: Facebook users (millions) vs. Country]

1.29. (a) Most countries had moderate (single- or double-digit) increases in Facebook usage.
Chile (2197%) is an extreme outlier, as are (maybe) Venezuela (683%) and Colombia (246%).
(b) In the stemplot, Chile and Venezuela have been omitted, and stems are split five ways.
(c) One observation is that, even without the outliers, the distribution is right-skewed.
(d) The stemplot can show some of the detail of the low part of the distribution, if the
outliers are omitted.

[Stemplot of percent increases with Chile and Venezuela omitted: stems 0, 1, and 2, each split five ways; leaf rows as extracted, top to bottom: 000, 2333, 4444, 6, 99, 33, 4]
1.30. (a) The given percentages refer to nine distinct groups (all M.B.A. degrees, all
M.Ed. degrees, and so on) rather than one single group. (b) Bar graph shown below.
Bars are ordered by height, as suggested by the text; students may forget to do
this or might arrange in the opposite order (smallest to largest).

[Bar graph: Degrees earned by women (%) vs. Graduate degree (Theology, M.B.A., M.D., Law, Other M.S., Other Ph.D., Ed.D., Other M.A., M.Ed.)]

1.31. (a) The luxury car bar graph is the first one below; bars are in decreasing order of
size (the order given in the table). (b) The intermediate car bar graph is the second one
below. For this stand-alone graph, it seemed appropriate to re-order the bars by decreasing
size. Students may leave the bars in the order given in the table; this (admittedly) might
make comparison of the two graphs simpler. (c) The third graph is one possible
choice for comparing the two types of cars: for each color, we have one bar for each car
type.

[Bar graph: Percent of luxury cars vs. Color]
[Bar graph: Percent of intermediate cars vs. Color]
[Bar graph: Percent vs. Color, with one bar each for Luxury cars and Intermediate cars]

1.32. This distribution is skewed to the right, meaning that Shakespeare’s plays contain many
short words (up to six letters) and fewer very long words. We would probably expect most
authors to have skewed distributions, although the exact shape and spread will vary.
1.33. Shown is the stemplot; as the text suggests, we have trimmed numbers (dropped the last
digit) and split stems. 359 mg/dl appears to be an outlier. Overall, glucose levels are not
under control: Only 4 of the 18 had levels in the desired range.

    0 | 799
    1 | 0134444
    1 | 5577
    2 | 0
    2 | 57
    3 |
    3 | 5

1.34. The back-to-back stemplot suggests that the individual-instruction group was more
consistent (their numbers have less spread) but not more successful (only two had
numbers in the desired range).

[Back-to-back stemplot: Individual (left) and Class (right), stems 0 to 3 with split stems. Individual leaf rows as extracted, top to bottom: 22, 99866655, 22222, 8. Class leaf rows: 799, 0134444, 5577, 0, 57, 5]

1.35. The distribution is roughly symmetric, centered near 7 (or “between 6 and 7”), and
spread from 2 to 13.
1.36. (a) Total emissions would almost certainly be higher for very large countries; for
example, we would expect that even with great attempts to control emissions, China (with over
1 billion people) would have higher total emissions than the smallest countries in the data
set. (b) A stemplot is shown; a histogram would also be appropriate. We see a strong right
skew with a peak from 0 to 2 metric tons per person and a smaller peak from 8 to 10. The
three highest countries (the United States, Canada, and Australia) appear to be outliers;
apart from those countries, the distribution is spread from 0 to 11 metric tons per person.

    0 | 00000000000000011111
    0 | 222233333
    0 | 445
    0 | 6677
    0 | 888999
    1 | 001
    1 |
    1 |
    1 | 67
    1 | 9
1.37. To display the distribution, use either a stemplot or a histogram. DT scores are skewed
to the right, centered near 5 or 6, spread from 0 to 18. There are no outliers. We
might also note that only 11 of these 264 women (about 4%) scored 15 or higher.

    0 | 000000000000000000000000000000000000011111111111111111111
    0 | 2222222222222222233333333333333333333333
    0 | 444444444444444444445555555555555555555
    0 | 666666666666666666667777777777777
    0 | 888888888888888999999999999999999
    1 | 000000000000111111111
    1 | 22222222222233333333333
    1 | 444444455
    1 | 66666777
    1 | 8

1.38. (a) The first histogram shows two modes: 5–5.2 and 5.6–5.8. (b) The second histogram
has peaks in locations close to those of the first, but these peaks are much less pronounced,
so they would usually not be viewed as distinct modes. (c) The results will vary with the
software used.

[Histogram: Frequency vs. Rainwater pH, axis marked from 4.2 to 7]
[Histogram: Frequency vs. Rainwater pH, axis marked from 4.14 to 6.94]

1.39. Graph (a) is studying time (Question 4); it is reasonable to expect this to be right-skewed
(many students study little or not at all; a few study longer).
Graph (d) is the histogram of student heights (Question 3): One would expect a fair
amount of variation but no particular skewness to such a distribution.
The other two graphs are (b) handedness and (c) gender—unless this was a particularly
unusual class! We would expect that right-handed students should outnumber lefties
substantially. (Roughly 10 to 15% of the population as a whole is left-handed.)
1.40. Sketches will vary. The distribution of coin years would be left-skewed because newer
coins are more common than older coins.
1.41. (a) Not only are most responses multiples of 10; many are multiples of 30 and 60. Most
people will “round” their answers when asked to give an estimate like this; in fact, the most
striking answers are ones such as 115, 170, or 230. The students who claimed 360
minutes (6 hours) and 300 minutes (5 hours) may have been exaggerating. (Some students
might also “consider suspicious” the student who claimed to study 0 minutes per night. As a
teacher, I can easily believe that such students exist, and I suspect that some of your
students might easily accept that claim as well.)

              Women |   | Men
                    | 0 | 033334
                 96 | 0 | 66679999
           22222221 | 1 | 2222222
    888888888875555 | 1 | 558
               4440 | 2 | 00344
                    | 2 |
                    | 3 | 0
                  6 | 3 |

(b) The stemplots suggest that women (claim to) study more than men.
The approximate centers are 175 minutes for women and 120 minutes for men.
1.42. The stemplot gives more information than a histogram (since all the
original numbers can be read off the stemplot), but both give the same impression. The
distribution is roughly symmetric with one value (4.88) that
is somewhat low. The center of the distribution is between 5.4 and 5.5 (the
median is 5.46, the mean is 5.448); if asked to give a single estimate for the
“true” density of the earth, something in that range would be the best answer.

    48 | 8
    49 |
    50 | 7
    51 | 0
    52 | 6799
    53 | 04469
    54 | 2467
    55 | 03578
    56 | 12358
    57 | 59
    58 | 5
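The median and mean quoted in 1.42 can be reproduced from the stemplot itself. In this sketch the 29 values are read off the stemplot for that solution (stem 48, leaf 8 means 4.88) and stored in hundredths so they are easy to compare with the plot; the variable names are chosen for illustration.

```python
from statistics import mean, median

# Density values in hundredths, read off the stemplot in solution 1.42
# (stem 48 | leaf 8 means 4.88). The stem 49 row is blank.
hundredths = [488, 507, 510, 526, 527, 529, 529,
              530, 534, 534, 536, 539, 542, 544, 546, 547,
              550, 553, 555, 557, 558,
              561, 562, 563, 565, 568, 575, 579, 585]

densities = [h / 100 for h in hundredths]

print(len(densities))             # 29
print(median(densities))          # 5.46
print(round(mean(densities), 3))  # 5.448
```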
1.43. (a) There are four variables: GPA, IQ, and self-concept are quantitative, while gender
is categorical. (OBS is not a variable, since it is not really a “characteristic” of a student.)
(b) Below. (c) The distribution is skewed to the left, with center (median) around 7.8. GPAs
are spread from 0.5 to 10.8, with only 15 below 6. (d) There is more variability among the
boys; in fact, there seems to be a subset of boys with GPAs from 0.5 to 4.9. Ignoring that
group, the two distributions have similar shapes.

     0 | 5
     1 | 8
     2 | 4
     3 | 4689
     4 | 0679
     5 | 1259
     6 | 0112249
     7 | 22333556666666788899
     8 | 0000222223347899
     9 | 002223344556668
    10 | 01678

      Female |    | Male
             |  0 | 5
             |  1 | 8
             |  2 | 4
           4 |  3 | 689
           7 |  4 | 069
         952 |  5 | 1
        4210 |  6 | 129
    98866533 |  7 | 223566666789
      997320 |  8 | 0002222348
       65300 |  9 | 2223445668
         710 | 10 | 68

1.44. Stemplot below, with split stems. The distribution is fairly symmetric (perhaps slightly
left-skewed), with center around 110 (clearly above 100). IQs range from the low 70s
to the high 130s, with a “gap” in the low 80s.

     7 | 24
     7 | 79
     8 |
     8 | 69
     9 | 0133
     9 | 6778
    10 | 0022333344
    10 | 555666777789
    11 | 0000111122223334444
    11 | 55688999
    12 | 003344
    12 | 677888
    13 | 02
    13 | 6

1.45. Stemplot below, with split stems. The distribution is
skewed to the left, with center around 59.5. Most self-concept
scores are between 35 and 73, with a few below that, and one
high score of 80 (but not really high enough to be an outlier).

    2 | 01
    2 | 8
    3 | 0
    3 | 5679
    4 | 02344
    4 | 6799
    5 | 1111223344444
    5 | 556668899
    6 | 00001233344444
    6 | 55666677777899
    7 | 0000111223
    7 |
    8 | 0

1.46. The time plot shows that women’s times decreased quite rapidly from
1972 until the mid-1980s. Since that time, they have been fairly consistent: Almost all
times since 1986 are between 141 and 147 minutes.

[Time plot: Winning time (minutes) vs. Year, 1970 to 2005]

1.47. The total for the 24 countries was 897 days, so with Suriname, it is 897 + 694 = 1591
days, and the mean is x̄ = 1591/25 = 63.64 days.
1.48. The mean score is x̄ = 821/10 = 82.1.
1.49. To find the ordered list of times, start with the 24 times in Example 1.23, and add 694 to
the end of the list. The ordered times (with the median in brackets) are
4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, [40],
42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77, 694
The outlier increases the median from 36.5 to 40 days, but the change is much less than the
outlier’s effect on the mean.
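The contrast drawn in 1.47 and 1.49 (the outlier moves the mean far more than the median) is easy to check numerically. The sketch below uses the ordered list from 1.49, with the first 24 values standing for the times before Suriname's 694 days is added.

```python
from statistics import mean, median

# The 24 ordered times (days) from solution 1.49, before adding the outlier.
times = [4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40,
         42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77]
with_outlier = times + [694]  # Suriname

print(sum(times), sum(with_outlier))        # 897 1591
print(mean(times), mean(with_outlier))      # 37.375 63.64
print(median(times), median(with_outlier))  # 36.5 40
```

One added value raises the mean by more than 26 days but the median by only 3.5 days.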
1.50. The median of the service times is 103.5 seconds. (This is the average of the 40th and
41st numbers in the sorted list, but for a set of 80 numbers, we assume that most students
will compute the median using software, which does not require that the data be sorted.)
1.51. In order, the scores are:
55, 73, 75, 80, [80], [85], 90, 92, 93, 98
The middle two scores (in brackets) are 80 and 85, so the median is M = (80 + 85)/2 = 82.5.
1.52. See the ordered list given in the previous solution.
The first quartile is Q1 = 75, the median of the first five numbers: 55, 73, [75], 80, 80.
Similarly, Q3 = 92, the median of the last five numbers: 85, 90, [92], 93, 98.
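The quartile rule used in 1.51 and 1.52 (each quartile is the median of one half of the ordered list) can be sketched as follows. The helper name `quartiles` is invented for this example; note that Python's built-in `statistics.quantiles` uses a different interpolation rule by default and need not give the same values.

```python
from statistics import median

# The ten exam scores from solution 1.51, in order.
scores = sorted([55, 73, 75, 80, 80, 85, 90, 92, 93, 98])

def quartiles(sorted_values):
    """Q1 and Q3 as medians of the lower and upper halves of a sorted list.

    For an odd count the overall median is left out of both halves.
    """
    n = len(sorted_values)
    lower = sorted_values[: n // 2]
    upper = sorted_values[(n + 1) // 2:]
    return median(lower), median(upper)

print(median(scores))     # 82.5
print(quartiles(scores))  # (75, 92)
```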
1.53. The maximum and minimum can be found by inspecting the list. The sorted list (read
across each row; the quartile and median locations fall between the 20th and 21st, the 40th
and 41st, and the 60th and 61st values) is

      1     2     2     3     4     9     9     9    11    19
     19    25    30    35    40    44    48    51    52    54
     55    56    57    59    64    67    68    73    73    75
     75    76    76    77    80    88    89    90   102   103
    104   106   115   116   118   121   126   128   137   138
    140   141   143   148   148   157   178   179   182   199
    201   203   211   225   274   277   289   290   325   367
    372   386   438   465   479   700   700   951  1148  2631

This confirms the five-number summary (1, 54.5, 103.5, 200, and 2631 seconds)
given in Example 1.26. The sum of the 80 numbers is 15,726 seconds, so the mean is
x̄ = 15,726/80 = 196.575 seconds (the value 197 in the text was rounded)...
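The five-number summary and mean in 1.53 can be confirmed directly from the 80 sorted service times listed in that solution. This is a sketch using the same "median of each half" quartile rule as above; `five_number_summary` is a name chosen for illustration.

```python
from statistics import median

# The 80 sorted service times (seconds) listed in solution 1.53.
times = [
    1, 2, 2, 3, 4, 9, 9, 9, 11, 19,
    19, 25, 30, 35, 40, 44, 48, 51, 52, 54,
    55, 56, 57, 59, 64, 67, 68, 73, 73, 75,
    75, 76, 76, 77, 80, 88, 89, 90, 102, 103,
    104, 106, 115, 116, 118, 121, 126, 128, 137, 138,
    140, 141, 143, 148, 148, 157, 178, 179, 182, 199,
    201, 203, 211, 225, 274, 277, 289, 290, 325, 367,
    372, 386, 438, 465, 479, 700, 700, 951, 1148, 2631,
]

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, with quartiles as medians of halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]

print(five_number_summary(times))  # (1, 54.5, 103.5, 200.0, 2631)
print(sum(times))                  # 15726
print(sum(times) / len(times))     # 196.575
```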


- Summer '11
- Yew-Wei
- Statistics, Standard Deviation, The Hours, Speak