10-1. A pie chart could be used, since each number in the table represents a part of the total ($35,588 million in total sales).
[Figure: bar chart of sales ($million, axis 0 to 14,000) for Instant games, Three-digit games, Four-digit games, Lotto, and Other games.]
10-2. The numbers in the table sum to 35,589. This differs from the given total because each of these numbers is rounded to the nearest million.
10-3. (a) Adding the Never married, Widowed, and Divorced groups gives 43,148 thousand. (b) At right. (c) A pie chart could be used, since each number in the table represents a part of the total (approximately 102,403,000 women).
[Figure: bar chart of adult women (thousands, axis 0 to 60,000) for Never married, Married, Widowed, and Divorced.]
10-4. This graph is misleading because the cones used to represent each interest rate are scaled both horizontally and vertically; they suffer from the pictogram defect.
10-5. The CDs used to represent each company's share are scaled both horizontally and vertically, making this a pictogram, and therefore misleading.
Graphs Good and Bad
10-6. Either a bar chart or a pie chart would be appropriate; both are shown below. The pie chart labels might also show the actual percents. An "Other method" category (with 8.3%) is needed so that the total is 100%.
[Figures: bar chart and pie chart of percent of all murders by weapon: Handgun, Other firearm, Knife, Body part, Blunt object, Other method.]
10-7. (a) The price fluctuates regularly, rising and falling over the course of each year, as the supply of oranges varies. (b) Overall, prices rise gradually over the decade.
10-8. (a) The given percents add to 77%, so 23% were in other fields. (b) Either a bar graph or a pie chart would be appropriate; both are shown below.
[Figures: bar graph and pie chart of percent of all students by field of study, including an "Other fields" category.]
10-9. (a) This is a pictogram, and therefore is misleadingly scaled. (b) Bar graph shown at right. A pie chart would not be particularly appropriate for this situation, since we do not know the total value of all exports. (However, we could make a pie chart with three sections that would show the relative values of the exports for each of these three countries.)
[Figure: bar graph of value of exports ($billion, axis 0 to 600) for United States, Germany, and Japan.]
10-10. (a) Right. (b) The plot shows a decreasing trend (fewer disturbances overall in the later years) and, more importantly, an apparent cyclic behavior. Looking at the table, the spring and summer months (April through September) generally have the most disturbances, probably for the simple reason that more people are outside during those periods.
[Figure: line graph of number of disturbances (axis 0 to 40) by period, 1968 to 1973.]
10-11. Adjust the proportions and the maximum and minimum values on the vertical axis. Two possible graphs are shown below.
[Figures: two line graphs of percent of all births born to unwed mothers by year, 1960 to 1995; one with vertical axis 0 to 60, the other with vertical axis 0 to 35.]
10-12. The time scale on the horizontal axis is erratic. Equal spacings stand in succession for 14, 10, 18, 4 and 4 years.
10-13. (a) The percent of entering students who bring a typewriter almost certainly shows a downward trend (if that percentage has not bottomed out). (b) More and more students bring computers to campus, so this should show an upward trend. (c) Students might argue for an upward trend, a downward trend, or no trend for this one, depending on their perceptions of the social climate. If one considers changes since the early 1900s, the percent has increased; in more recent years, it is not so clear.
10-14. The temperature will rise and fall with the changing seasons. That is, there will be a 12-month repeating pattern, rising in the summer months and falling in winter.
10-15. The Christmas seasonal peak in retail sales may account for the entire increase.
10-16. The official report is adjusted for the expected seasonal variation due to high school and college students entering the work force for the summer.
10-17. The cycle is about 10 years. There might be a trend, in that the peaks appear to be higher in the middle and end of the century.
10-18. Before 1999, the number of cars was greater than the number of trucks, but while car sales remain fairly steady (other than random yearly fluctuation), truck sales have steadily increased until they overtook car sales in 1999. Another noteworthy feature of these two line graphs is that they fluctuate together; for example, both dipped slightly in 1991.
[Figure: line graphs of new vehicle sales (1000s, axis 0 to 10,000) by year.]
10-19. This graph is a correct pie chart. Despite the tire design, it is essentially still a circle, broken up into pieces proportional to the respective percentages.
10-20. The bar graph shown on the right has bars ordered by decreasing height. While this is fairly typical, it is not necessary. The bar graph makes it easier to see percentages (without writing them in or next to the wedges, as was done with the pie chart). It is also easier to compare sizes of bars than wedge angles.
[Figure: bar graph of percent of all auto sales (axis 0 to 30) for General Motors, Ford, Japanese manufacturers, Chrysler, and Others.]
10-21. The line graph on the right shows a long-term increasing trend with a fair amount of fluctuation. There were sharp drops in 1981 and 1989. Note that we cannot tell from this data whether the increase is due to a greater number of aliens crossing the border, or increased patrolling and enforcement, or both.
[Figure: line graph of deportable aliens caught (1000s, axis 0 to 1600) by year, starting in 1970.]
10-22. Sketches will vary.
10-23. (a) Line graph shown on the right. (b) Without exception, average prices increased over this time period. (c) The fastest rise occurred between 1978 and 1980 (up 17.2 points, a 13.2% increase per year). Prices rose most slowly between either 1970 and 1972 (up 3 points, a 3.9% annual increase) or from 1984 to 1986 (up 5.7 points, a 3.8% annual increase). In terms of percent increase, the 1996–98 period is best; that 6.1 point rise over two years is only a 1.9% annual increase. Note: Most students will likely only consider the point increase, and so will choose 1970–72, although percent increase might be considered a better indicator.
[Figure: line graph of CPI (base = 1982–84), vertical axis 25 to 150, by year, 1970 to 1995.]
10-24. There is certainly some overlap between these groups. We do not know, for example, how many teens used both alcohol and cigarettes.
10-25. (a) Since there were 92,353 accidental deaths in total, 20,340 must have been due to causes other than those listed here. The percents due to each cause are in the table; for example, \(\frac{42{,}340}{92{,}353} \approx 45.8\%\). (b) The distribution can be displayed using either a bar graph or pie chart; both are shown below. The vertical scale on the bar graph can be either the percent of all deaths (as shown) or the number of deaths.
Cause            Percent
Motor vehicle    45.8%
Falls            12.8
Poison           11.0
Drowning          4.4
Fires             3.9
Other causes     22.0

[Figures: bar graph (vertical axis 0 to 40) and pie chart of percent of all accidental deaths by cause: Motor vehicle, Falls, Poison, Drowning, Fires, Other causes.]
10-26. (a) Line graph at right. (b) Peaks occurred in 1974, 1981, 1989, and 1995. (Some may also say 1984, although it seems more like a random blip than the peak of a cycle.) (c) The highest peak was in 1981; since then, the overall trend (ignoring the cycles) has been downward.
[Figure: line graph of average annual interest rate (axis 0 to 16) by year, starting in 1972.]
10-27. (a) Line graph at right. (b) Women's times decreased quite rapidly from 1972 until the mid-1980s. Since that time, they have been fairly consistent: All times since 1986 are between 142 and 147 minutes.
[Figure: line graph of winning time (minutes, axis 140 to 190) by year, 1972 to 1996.]
Chapter 11 Solutions
11-1. The distribution is roughly symmetric, centered at or about noon, spread from 6:30 am to 5:30 pm, and reveals no outliers.
11-2. (a) The two smallest percents are 10.9% and 11.0%. (b) Without the two smallest percents, the distribution is roughly symmetric, centered at about 13.9%, spread from 12.1% to 15.9%. (c) The distribution of young adults is less spread out than the distribution of older adults (even if we ignore the outliers in Figure 11-6).
11-3. The distribution is strongly right-skewed, trailing off rapidly from the peak at 0 through 4. It is spread from 0 to 54, with few universities awarding more than 20 doctorates to minorities.
11-4. (a) The distribution is roughly symmetric, although it might be viewed as slightly skewed to the right. (b) The center is about 15%. (39% of the stocks had a total return less than 10%, while 60% had a return less than 20%. This places the center of the distribution somewhere between 10% and 20%.) The smallest return was between −70% and −60%, while the largest was between 100% and 110%. (c) About 23% of stocks lost money (those bars have total height 1 + 1 + 1 + 1 + 3 + 5 + 11 = 23).
11-5. A stemplot would have more information (too many digits) than could easily be absorbed.
11-6. Because there is a relatively small range of numbers (from 21 to 32), it is useful to treat each number as if it were (e.g.) 24.0, and use the tens and ones digits as the stem, and the tenths digit (0) as the leaf. The two cars that get 21 mpg (the BMW and the Jaguar S/C) could be considered gas guzzlers.
[Stemplot for Exercise 11-6: stems 21 to 32 (mpg), every leaf 0; leaf rows as extracted: 00, 0, 000000, 0000, 0, 0, 0000000000, 000, 000, 0.]
11-7. The distribution can be displayed using either a histogram or a stemplot. The distribution is roughly symmetric, perhaps slightly left-skewed, spread from 12.7% to 22.9%, centered around 17% or 18%.

12 | 7
13 | 8
14 | 047777
15 | 233479
16 | 068
17 | 0456899
18 | 25777
19 | 003558899
20 | 2777
21 | 3
22 | 09
Displaying Distributions With Graphs
11-8. Histogram on the right. The intervals in this histogram are $0–$999,999, $1,000,000–$1,999,999, etc.; students might choose slightly different intervals. The salary distribution is strongly skewed to the right.
[Figure: histogram of salary (units of $100,000, horizontal axis 0 to 100; vertical axis 0 to 10).]
11-9. (a) The histogram appears on the right. The distribution is strongly skewed to the right (many short words, a few quite long words). (b) Shakespeare uses more short words (especially 3 and 4 letters) and fewer very long words than does Popular Science. I would guess that words like "carburetor" and "semiconductor" account for the distinction.
[Figure: histogram of percent of words (axis 0 to 15) by word length, 1 to 15 letters.]
11-10. Sketches will vary. The distribution of coin years would be left-skewed because newer coins are more common than older coins.
11-11. It will likely be roughly symmetric, but perhaps slightly right-skewed (because some team owners have more money to spend).
Note: The stemplot of 2000 payrolls (in $ million; stems are tens, leaves are ones) is shown below for reference. It is, in fact, slightly right-skewed, with the New York Yankees topping the list with a payroll of $113 million (possibly an outlier), followed by Atlanta, Los Angeles, and Boston. Minnesota ($15.8 million) had the lowest payroll. The skewness pulls the mean up slightly: x̄ = $56.7 million, while the median is M = $54.9 million. The fact that the distribution of payrolls is considerably less skewed than the distribution of salaries for any individual team is a consequence of the central limit theorem, discussed in Chapter 25 of the text.

 1 | 5
 2 | 457
 3 | 123566
 4 |
 5 | 114445689
 6 | 012
 7 | 28
 8 | 09
 9 | 345
10 |
11 | 3
11-12. The distribution is strongly right-skewed, with two states (New York and Florida) that could be considered outliers. (Some students might include New Jersey as an outlier, and perhaps even Illinois.)
Note: Shown are two possible approaches to making this stemplot. The first is the one that most students would likely construct: The stems are the numbers 0–14, and the leaves are the digits after the decimal. The second is constructed with split stems, an idea not discussed in the text: The tens digit (either 0 or 1) is used for the stem, and the ones digit is the leaf (the digit after the decimal is ignored). Each stem is written 5 times; the first 0 stem is for the leaves 0 and 1, the second 0 stem is for the leaves 2 and 3, etc. This makes for a more compact stemplot.
[Stemplots: (1) stems 0 to 14 with the digit after the decimal as the leaf; (2) split stems 0 and 1 with the ones digit as the leaf.]
11-13. The distribution is irregular in shape. The hot dog brands fall into two groups with a gap between them, plus a low outlier. The low outlier is the calorie count for the veal hot dog brand. Note: This distribution reminds us that not all distributions are usefully described as either symmetric or skewed. Students may be tempted to leave out stems with no leaves, but this would give a misleading picture.

10 | 7
11 |
12 |
13 | 5689
14 | 067
15 | 3
16 |
17 | 2359
18 | 2
19 | 015

11-14. (a) Table below. For example, 29.3/151.1 = 19.4%, 34.9/310.6 = 11.2%, etc. (b) Histogram below, left. Children (under 10) represent the single largest group in the population; about one out of five Americans was under 10 in 1950. There is a slight dip in the 10–19 age bracket, then the percentages trail off gradually after that. (c) Histogram below, right. The projections show a much greater proportion in the higher age brackets; there is now a gradual rise in the proportions up to ages 40–49, followed by the expected decline in the proportion of senior citizens.

Age group    1950     2075
0–9          19.4%    11.2%
10–19        14.4     11.5
20–29        15.9     11.8
30–39        15.1     12.3
40–49        12.8     12.2
50–59        10.3     12.1
60–69         7.3     11.1
70–79         3.6      8.8
80–89         1.1      6.1
90–99         0.1      2.5
100–109       0.0      0.5

[Figures: two histograms of percentage of population (axis 0 to 20) by age group, one for 1950 and one for 2075.]
11-15. Stemplot below (in the solution to Exercise 11-16). The distribution is roughly symmetric (it appears slightly left-skewed, if stems are split), and centered at 46 (a typical year). Ruth's best year was not at all unusual for him; 60 is not an outlier.
11-16. McGwire's nine home runs in each of 1993 and 1994 stand out as extraordinarily low; these were the two shortened seasons. Other than that, he has been a fairly productive and consistent home-run hitter, with a strong suggestion that he has recently gotten better (note that this cannot be determined from the stemplot). Ignoring his low outliers, one could claim that he and Ruth seem to be fairly evenly matched.

McGwire |   | Ruth
     99 | 0 |
        | 1 |
      2 | 2 | 25
   9932 | 3 | 45
     92 | 4 | 1166679
     82 | 5 | 449
      5 | 6 | 0
      0 | 7 |

11-17. Use either a histogram or stemplot. Shown is a stemplot with split stems (see comments in the solution to Exercise 11-12): The stem 1 is listed twice, once for the leaves 0–4, and once for the leaves 5–9. This has the same appearance as a histogram with intervals 5–9.99, 10–14.99, 15–19.99, etc. The distribution is right-skewed, peaking between 10 and 12.

0 | 667788889
1 | 0000011111111222334444
1 | 555667889
2 | 122
2 | 67
3 | 24
3 | 88
4 | 3
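The stemplots in this chapter all follow the same recipe, which can be sketched in a few lines of Python. This is only an illustration: the function name `stemplot` is made up here, it handles whole numbers with the tens digit as stem and the ones digit as leaf, and the Ruth values are the ones implied by the Ruth leaves in the solution to Exercise 11-16 (2|25, 3|45, 4|1166679, 5|449, 6|0).

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: stem = tens digit(s), leaf = ones digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Keep every stem between the smallest and largest, even with no leaves,
    # so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | " + "".join(str(d) for d in leaves[stem]))
    return lines

# Ruth's season totals, as implied by the stemplot leaves (assumed reading).
ruth = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]
for line in stemplot(ruth):
    print(line)
```

Printing empty stems is deliberate: as the note to the hot dog exercise points out, leaving them out would give a misleading picture.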
Chapter 12 Solutions
12-1. Half of all households make more than the median, and the other half make less.
12-2. Income distributions are skewed to the right, so mean incomes are higher than medians. Household incomes include incomes for persons living alone, which are excluded from family incomes; such incomes are likely to be lower than incomes for groups of two or more.
12-3. (a) The mean is greater than the median, since income distributions tend to be right-skewed. (b) Some respondents may have exaggerated their incomes (to impress either the pollster, or someone in the room with them as they answered the question).
12-4. (a) The five-number summary is Min = $1300, Q1 = $1800, M = $4500, Q3 = $10,600, Max = $19,300. (b) This distribution is clearly right-skewed, so the mean would be higher than the median. (In fact, the mean is about $6400.)
12-5. (a) The distribution is roughly symmetric, perhaps slightly left-skewed, but not enough to have much effect on the mean. (b) The mean is x̄ ≈ 13.818% and the median is M = 13.9%.
12-6. (a) Because there is a relatively small range of numbers (from 21 to 32), it is useful to treat each number as if it were (e.g.) 24.0, and use the tens and ones digits as the stem, and the tenths digit (0) as the leaf. (b) The five-number summary, all in units of mpg, is Min = 21, Q1 = 24, M = 28, Q3 = 28, Max = 32. The bottom quarter includes the BMW, Infiniti Q45, and Jaguar S/C (all of which get less than 24 mpg), and those that get 24 mpg: the Acura, Audi, Cadillac Catera, Jaguar Vanden Plas, Lexus GS300, and Mercedes-Benz E430. (Some students might not include those that get 24 mpg.)
[Stemplot for Exercise 12-6: stems 21 to 32 (mpg), every leaf 0; leaf rows as extracted: 00, 0, 000000, 0000, 0, 0, 0000000000, 000, 000, 0.]
12-7. The distribution is right-skewed (see the histogram in the solution to Exercise 11-8), so the mean ($3.04 million) is greater than the median ($2.57 million).
12-8. Since income distributions are right-skewed, the mean is $675,000 and the median is $330,000.
12-9. (See also the solution to Exercise 11-13.) The distribution is irregular in shape. The hot dog brands fall into two groups with a gap between them, plus a low outlier. The five-number summary, in units of calories, is Min = 107, Q1 = 139, M = 153, Q3 = 179, Max = 195. Any numerical summary would not reveal the gaps in the distribution. Note: Students may be tempted to leave out stems with no leaves, but this would give a misleading picture.

10 | 7
11 |
12 |
13 | 5689
14 | 067
15 | 3
16 |
17 | 2359
18 | 2
19 | 015
Describing Distributions with Numbers
12-10. Since the distribution is roughly symmetric with no apparent outliers, the mean and standard deviation would be suitable for this data.
12-11. (a) The minimum is (of course) in position number 1, and the maximum is in position 115. The median is in position \(\frac{115+1}{2} = 58\). The first quartile is in position \(\frac{57+1}{2} = 29\) (the median of the lowest 57 observations), and the third quartile is in position 58 + 29 = 87 (the median of observations 59 through 115). (b) The values in the five-number summary fall in the ranges: 0 to 4, 0 to 4, 5 to 9, 10 to 14, 50 to 54. The top quarter includes universities which produce about 10 or more minority engineer Ph.D.s per year.
12-12. The minimum and maximum are easily determined, and the quartiles and median can be found by adding up the percentages. For example, 3.6% + 14.8% is less than 25%, while 3.6% + 14.8% + 18.7% is more than 25%, so Q1 must equal 3. Continuing this way, we find that the five-number summary, in units of letters, is Min = 1, Q1 = 3, M = 4, Q3 = 7, Max = 15.
12-13. Either a stemplot or a histogram could be used to display the distribution. The distribution is strongly right-skewed with at least two high outliers (New York and Florida, and possibly also New Jersey and Illinois), so the five-number summary is appropriate. In units of thousands of immigrants, this is Min = 0.6, Q1 = 1.6, M = 5.15, Q3 = 17.3, Max = 123.7.
[Stemplot of immigrants by state, stems 0 to 12.]
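The position arithmetic used for the 115 universities (median at position 58, quartiles at positions 29 and 87) generalizes to any sample size. The sketch below is only an illustration; the function name `summary_positions` is invented here, and it assumes the convention used in that solution, namely that each quartile is the median of the observations strictly below or above the position of the overall median.

```python
def summary_positions(n):
    """Positions (1-based, in sorted order) of the five-number summary.

    A position ending in .5 means "average the two neighbouring values".
    """
    median = (n + 1) / 2
    below = n // 2               # number of observations below the median's position
    q1 = (below + 1) / 2         # median of the lowest `below` observations
    q3 = (n + 1) - q1            # mirror image of Q1, counted from the top
    return {"min": 1, "q1": q1, "median": median, "q3": q3, "max": n}

print(summary_positions(115))
```

For an even sample size the same function returns half-positions; for example, with 32 observations the median sits between the 16th and 17th values.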
12-14. With New York included, x̄ = 16.63 and M = 5.15 thousand immigrants. After removing New York, x̄ = 12.35 and M = 4.40 thousand immigrants. The mean changes more than the median.
12-15. The histogram reveals two peaks (one around 500 and the other around 550), so choosing a single center is not appropriate.
12-16. This must be the mean, since (by definition) 205 players would make more than (or at least as much as) the median salary. (It could be the median, but only in the extremely unlikely event that at least 67 players made exactly $2.36 million.)
12-17. (a) The mean is more useful in this situation, because of the need for a total. (b) The median is appropriate here. The word "typical" often refers to the median.
12-18. The mean is more useful in this situation. If we know that x̄ = 5 cans, then 150 (30 × 5) cans will be needed. We could make no such judgment if we knew M = 3 cans.
12-19. The SATV plot has more spread in the box, but the SATM plot has more spread overall.
[Figure: side-by-side boxplots of SATV and SATM scores, axis 450 to 600.]
12-20. (a) A stemplot is shown; a histogram could also be used. The five-number summary, in units of mpg, is Min = 16, Q1 = 18, M = 19, Q3 = 20, Max = 27. The distribution has two high outliers: the Subaru Forester and the Toyota RAV4. (b) The boxplots clearly show that SUVs generally have lower fuel efficiencies (higher consumption) than midsize cars. (The midsize box has no line in the middle since we found M = Q3 = 28 mpg.)
[Stemplot of SUV fuel efficiency: stems 16 to 27 (mpg), every leaf 0; leaf rows as extracted: 0, 000, 00000, 000000, 0000000, 0, 0.]
[Figure: side-by-side boxplots of fuel efficiency (mpg).]
12-21. The boxplots show that poultry hot dogs as a group are lower in calories than meat and beef hot dogs, which have similar distributions. Note: Comparison with the solutions to Exercises 11-13 and 12-9 reminds us that boxplots and numerical summaries don't catch the detail of a distribution.
[Figure: side-by-side boxplots of calories (axis 80 to 180) for Beef, Meat, and Poultry hot dogs.]
12-22. (a) \(\bar{x} = \frac{32.4}{6} = 5.4\) mg of phosphate per deciliter of blood. (b) The details of the computation are shown in the table. The standard deviation is \(s = \sqrt{\frac{2.06}{5}} \approx 0.6419\) mg of phosphate per deciliter of blood.

        x_i     x_i − x̄    (x_i − x̄)²
        5.6      0.2        0.04
        5.2     −0.2        0.04
        4.6     −0.8        0.64
        4.9     −0.5        0.25
        5.7      0.3        0.09
        6.4      1.0        1.00
Sum    32.4      0          2.06
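The hand computation for the six phosphate readings can be checked with a short script. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration, and the divisor n − 1 = 5 is the one used in the solution.

```python
import math

# Phosphate levels (mg per deciliter) from the table in the solution.
phosphate = [5.6, 5.2, 4.6, 4.9, 5.7, 6.4]

n = len(phosphate)
mean = sum(phosphate) / n
# Squared deviations from the mean, one per observation.
squared_devs = [(x - mean) ** 2 for x in phosphate]
# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum(squared_devs) / (n - 1))

print(round(mean, 2), round(sum(squared_devs), 2), round(s, 4))
```

The three printed values correspond to the mean, the sum of squared deviations, and the standard deviation in the table.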
12-23. Both have mean 3; set (a) has s ≈ 2.19 while set (b) has s ≈ 1.41. Stemplots show that set (a) is more spread out.
[Stemplots of sets (a) and (b), stems 0 to 6.]
12-24. (a) The new mean is \(\bar{x} = \frac{30}{6} = 5\) (increased by 2), while s ≈ 2.19 as before. (b) Adding 10 to each observation would add 10 to the mean, but would not affect the standard deviation.
12-25. For midsize cars: x̄ ≈ 26.6 and s ≈ 2.71 mpg. For SUVs: x̄ ≈ 19.5 and s ≈ 2.47 mpg. As in Exercise 12-20, we see that midsize cars are more fuel efficient. The two distributions have similar standard deviations, but note that s might not be appropriate for the SUV data, since it has outliers.
12-26. (a) 0, 0, 9, 9 (greatest spread) is the only answer. (b) 1, 1, 1, 1 (no spread) is one answer. (c) Any collection of equal numbers has variance 0, so (b) has 10 correct answers. The answer to (a) is unique.
12-27. Both sets of data have the same mean and standard deviation (x̄ = 7.50 and s = 2.03). However, the two distributions are quite different: Set A is left-skewed, while B has a high outlier.
[Stemplots of Set A and Set B, each with stems 3 to 12.]
12-28. (a) The mean will increase by $1000, as will the median. (b) Both quartiles also increase by $1000, so the distance between them is unchanged. (c) There is no change in the spread of salaries, so the standard deviation is unchanged.
12-29. Since the higher salaries will increase by more than the lower, the spread is increased when all receive a 5% raise; both measures of spread will rise. (In fact, the quartiles and standard deviation all increase by a factor of 1.05, but this may be beyond many [most?] students.)
12-30. The relatively small number of top students pull the mean above the median (right skewness again). The typical score (median) is not affected by these students or by their absence.
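The two raise scenarios (a flat $1000 for everyone versus a 5% raise for everyone) can be tried on any list of salaries. The sketch below uses made-up salary figures, since the exercises give none; only the pattern of the results matters.

```python
import statistics

# Hypothetical salaries in dollars, invented purely for illustration.
salaries = [28000, 31000, 35000, 42000, 60000]

raised_flat = [x + 1000 for x in salaries]   # everyone receives $1000
raised_pct = [x * 1.05 for x in salaries]    # everyone receives a 5% raise

s = statistics.stdev(salaries)
shift_in_mean = statistics.mean(raised_flat) - statistics.mean(salaries)
shift_in_median = statistics.median(raised_flat) - statistics.median(salaries)
flat_spread_ratio = statistics.stdev(raised_flat) / s
pct_spread_ratio = statistics.stdev(raised_pct) / s

print(shift_in_mean, shift_in_median, flat_spread_ratio, pct_spread_ratio)
```

Adding a constant moves the center and leaves the spread alone; multiplying by a constant scales both the center and the spread by that constant.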
12-31. Here are some issues to consider in choosing examples for each kind of graph: Boxplots are useful for distributions that are unimodal (that is, they have only one peak). Stemplots are nice for small data sets. Histograms are good for small or large data sets, including cases where only relative frequency is known, and can be used in cases where we find it useful to cluster the data in intervals that do not easily translate to stems and leaves. For example, we might wish to group together 0–3, 4–7, 8–11, 12–15, etc. Note: Although this text does not consider such situations, it is possible to construct histograms using unequal interval widths, for which stemplots would not be appropriate. For example, we might consider income ranges $0–$14,999, $15,000–$29,999, $30,000–$49,999, $50,000–$74,999, etc. In such cases, the rectangles are drawn so that their area (not just their height) corresponds to the proportion within that interval.
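The area rule for unequal interval widths can be made concrete: each bar's height is its proportion divided by its width. The sketch below is only an illustration; the class boundaries follow the income ranges mentioned in the note, but the proportions are hypothetical and the function name `bar_heights` is invented here.

```python
# (lower bound, upper bound, proportion of households); proportions are hypothetical.
classes = [
    (0, 15_000, 0.20),
    (15_000, 30_000, 0.25),
    (30_000, 50_000, 0.30),
    (50_000, 75_000, 0.25),
]

def bar_heights(classes):
    """Height of each bar so that area = height * width = proportion."""
    return [p / (hi - lo) for lo, hi, p in classes]

heights = bar_heights(classes)
# Recover the areas to confirm they match the proportions.
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, classes)]
print(heights)
print(areas)
```

With these made-up numbers, the last class holds a larger proportion than the first but gets a shorter bar, because it is spread over a wider interval.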
Chapter 13 Solutions
13-1. Many answers are possible.
13-2. (a) Symmetric with two peaks (bimodal). Point B marks the location of both the mean and the median. (b) This distribution is skewed to the right, so the mean (B) is greater than the median (A). (c) Symmetric and mound-shaped (but not a normal curve). Point A is both the mean and median. (d) This distribution is skewed to the left, so the mean (A) is less than the median (B).
13-3. (a) The curve forms a 1 × 1 square, which has area 1. (b) The mean and median are both 0.5. (c) 40% (the region is a rectangle with height 1 and base width 0.4; hence the area is 0.4).
13-4. The middle 95% of IQs is between 89 and 133; that is, 111 ± 2(11).
13-5. 84% of IQ scores are above 100, since an IQ of 100 is 1 standard deviation below the mean: Half (50%) of all these students have IQs above 111, and half of the middle 68% (34%) have IQs between 100 and 111.
13-6. 0.15%, because 144 is 3 standard deviations above the mean, so only half of the outer 0.3% have IQs above 144. It is not surprising that none of these 74 had such a high IQ, since 0.15% of 74 is only 0.111 students.
13-7. (a) Within 2 standard deviations of the mean: 266 ± 2(16), or 234 to 298 days. (b) Shorter than 234 days (more than 2 standard deviations below the mean).
13-8. The mean is certainly 10, and the standard deviation is about 2. (Expect some variation in students' judgment of the latter.)
13-9. (a) Within 3 standard deviations of the mean: 336 ± 3(3), or 327 to 345 days. (b) About 16%, since 339 days or more corresponds to at least 1 standard deviation above the mean.
13-10. The three stand close together, an astounding four standard deviations above the typical hitter. (Williams has a slight edge, but perhaps not large enough to declare him the best.)

Cobb        \(\frac{.420 - .266}{.0371} \approx 4.15\)
Williams    \(\frac{.406 - .267}{.0326} \approx 4.26\)
Brett       \(\frac{.390 - .261}{.0317} \approx 4.07\)

13-11. (a) Sarah's standard score is \(\frac{135 - 110}{25} = 1\), while her mother's standard score is \(\frac{120 - 90}{25} = 1.2\). (b) Sarah's mother scored higher relative to her age group, since she has a higher standard score. But Sarah had the higher raw score, so she does stand higher in the variable measured.
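The standard scores in the batting-average comparison (Cobb, Williams, Brett) and in the Sarah-and-mother comparison all come from the same one-line formula, (observation − mean) / standard deviation. A minimal sketch follows; the function name `standard_score` and the dictionary layout are choices made here, while the numbers are the ones quoted in the solutions.

```python
def standard_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd

# (batting average, league mean, league standard deviation) from the solution.
hitters = {
    "Cobb": (0.420, 0.266, 0.0371),
    "Williams": (0.406, 0.267, 0.0326),
    "Brett": (0.390, 0.261, 0.0317),
}

z_scores = {name: round(standard_score(*stats), 2) for name, stats in hitters.items()}
print(z_scores)

sarah = standard_score(135, 110, 25)
mother = standard_score(120, 90, 25)
print(sarah, mother)
```

Comparing standard scores rather than raw values is what allows hitters from different eras, or test takers from different age groups, to be placed on a common scale.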
13-12. Students may at first make mistakes like drawing a half-circle instead of the correct bell-shaped curve, or being careless about locating the standard deviation.
[Figure: normal curve with horizontal axis marked at 62.5, 65, 67.5, 70, 72.5, and 75.]
13-13. (a) 2.5%: Taller than 75 inches means more than two standard deviations above the mean. (b) 65 to 75 inches; that is, 70 ± 2(2.5). (c) 16% (half of the outer 32%), since shorter than 67.5 inches means at least 1 standard deviation below the mean.
13-14. 2.5%, since 70 inches corresponds to a standard score of 2 on the women's height scale.
13-15. No: It has two peaks because of the two distinct subgroups (men and women).
13-16. (a) About 50% of p̂ values fall above the mean (0.4). Since 0.43 is 2 standard deviations above the mean, about 2.5% of all p̂ values are above 0.43. (b) 0.37 to 0.43; i.e., 0.4 ± 2(0.015). Note: It is probably best to use decimals for these proportions rather than percentages (0.37 instead of 37%) to lessen the confusion with, e.g., 95%.
13-17. Since 0.5 (or 50%) is 2 standard deviations above the mean (0.45), about 2.5% of samples will show a proportion of 0.5 or more.
13-18. About 18.4%: A score of 820 corresponds to a standard score of about −0.9426; the standard score −0.9 is the 18.41 percentile.
13-19. About 8%: A score of 720 corresponds to a standard score of about −1.4211; the standard score −1.4 is the 8.08 percentile.
13-20. About 1%: A SATM score of 800 corresponds to a standard score of \(\frac{800 - 531}{115} \approx 2.3391\) on the men's scale; the standard score 2.3 is the 98.93 percentile, so 1.07% of men score above that point.
13-21. About 38%: A SATM score of 531 corresponds to a standard score of \(\frac{531 - 495}{109} \approx 0.3303\) on the women's scale; the standard score 0.3 is the 61.79 percentile, so 38.21% of women score above 531.
13-22. (a) An IQ score of 130 is 2 standard deviations above the mean, so about 2.5% of 1932 children had very superior scores (2.28% if using Table B). (b) For a present-day child, a score of 130 corresponds to a standard score of \(\frac{130 - 120}{15} \approx 0.6667\). The standard score 0.7 is the 75.80 percentile, so about 24.2% of present-day children would have very superior scores.
13-23. About the 76th percentile: An IQ score of 111 corresponds to a standard score of \(\frac{111 - 100}{15} \approx 0.7333\), and the standard score 0.7 is the 75.80 percentile.
13-24. (a) 12% ± 2(16.5%) = −21% to 45%. (b) About 0.24: 0% corresponds to a standard score of \(\frac{0 - 12}{16.5} \approx -0.7273\), and the standard score −0.7 is the 24.20 percentile. (c) About 0.21: 25% corresponds to a standard score of \(\frac{25 - 12}{16.5} \approx 0.7879\), and the standard score 0.8 is the 78.81 percentile, so about 21.2% of years have a gain of 25% or more.
13-25. As accurately as can be determined from Table B, the quartiles are about ±0.7.
13-26. The tallest 10% of women are about 68.25 inches or taller: The standard score closest to the 90th percentile is 1.3, which corresponds to a height of 65 + 1.3(2.5) = 68.25 inches.
13-27. About 127 or more: The 75th percentile is (close to) 0.7, which corresponds to an IQ of about 110 + 0.7(25) = 127.5.
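The table lookups in these solutions round each standard score to one decimal place before reading a percentile from Table B. As a rough cross-check, the same proportions can be computed without rounding using `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). The helper name `share_above` is invented here, and the means and standard deviations are the ones quoted in the SATM solutions.

```python
from statistics import NormalDist

def share_above(x, mean, sd):
    """Proportion of a normal distribution with this mean and sd lying above x."""
    return 1 - NormalDist(mean, sd).cdf(x)

men_above_800 = share_above(800, 531, 115)     # men's SATM scale
women_above_531 = share_above(531, 495, 109)   # women's SATM scale
upper_quartile_z = NormalDist().inv_cdf(0.75)  # standard score of the 75th percentile

print(men_above_800, women_above_531, upper_quartile_z)
```

Because no rounding of the standard score takes place, the computed proportions differ a little from the table-based answers (for instance, the women's figure comes out near 37% rather than 38%), which is consistent with the "about" in each solution.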
Chapter 14 Solutions
14-1. (a) A correlation is between −1 and 1. (b) s can be any positive number. (Or 0, but only if the data show no variation whatsoever.)
14-2. (a) Longer cockroaches tend to be heavier, and short ones tend to be lighter. That is, big values of one variable go with big values of the other, and small values go with small. (b) Correlation is unchanged by the units used, so converting to inches does not affect r.
14-3. (a) Student A has IQ near 103 and GPA near 0.5. (b) Students A and B have the two lowest GPAs but have moderate (near-average) IQs. Student C has the lowest IQ but a moderate GPA. Note: The data from which this plot was produced had Student A with IQ 103 and GPA 0.530, Student B with IQ 109 and GPA 1.760, and Student C with IQ 72 and GPA 7.295.
14-4. The association is roughly linear and positive (high calories tend to go with high sodium, and low tends to go with low). Point A is a hot dog brand which is well below average in both calories and sodium.
14-5. The plot shows a moderate positive association, so r should be positive, but not too close to 1. Note: In fact, r = 0.631. Students may be unsure of the effect of points A, B, and C; these points decrease the value of r, but are not sufficient to cancel out the overall positive association of the other points. (Without these three points, the correlation increases to 0.740.) See also Exercise 14-8.
14-6. This shows a fairly strong positive association, so we expect r to be reasonably close to 1. Note: In fact, r = 0.863. In this case, the point A makes the correlation higher, because its presence makes the scatterplot appear more linear. (With point A removed, the correlation drops slightly to 0.834.) See also Exercise 14-8.
14-7. Figure 14-9 clearly shows a stronger association, so its correlation is closer to 1.
14-8. The correlation increases when A, B, and C are removed from Figure 14-8, because their presence makes the plot look less linear. The correlation decreases when A is removed from Figure 14-9, because that plot looks more linear with A. (That is, if we drew a line through that scatterplot, there is less relative scatter about that line with point A than without.)
Describing Relationships: Scatterplots and Correlation
14-9. (a) Scatterplot on the right. Time is explanatory, so it should be on the horizontal axis. (b) The association is negative. This is reasonable since a lower time requires greater exertion, which increases the pulse rate. (c) The relationship is linear and moderately strong.
[Figure: scatterplot of pulse rate (beats/minute, axis 120 to 155) against time (minutes, axis 33.5 to 35.5).]
14.10. (a) Body mass is the explanatory variable, so it should be on the horizontal axis. Women are marked with solid circles, men with open circles. (b) The women's points show a moderately strong, linear, positive association. (c) The men's points also show a positive linear association, but it is much weaker. As a group, males typically have larger values for both variables.
[Scatterplot: metabolic rate (cal/day) versus lean body mass (kg).]
14.11. (Scatterplot not shown.) If the husband's age is y and the wife's is x, the linear relationship y = x + 2 would hold, and hence r = 1.
14.12. (a) At right. The open circles are the original data points, and the solid circles are the new ones. (b) Although changing the scales makes the scatterplot look very different, it has no effect on the correlation.
[Scatterplot: humerus length versus femur length, with both sets of units on the same axes.]
14.13. (a) The correlation is r = -0.746, which seems reasonable since the scatterplot shows a moderately strong negative association. (b) It would not change, since units do not affect r.
14.14. (a) See the solution to Exercise 14-10 for the plot. It appears that the correlation for men will be slightly smaller, since the men's points are more scattered. (b) Women: r ≈ 0.876. Men: r ≈ 0.592.
14.15. The scatterplot (right) shows the association clearly. The correlation is 0 because these variables do not have a straight-line relationship; the association is neither positive nor negative.
[Scatterplot: mileage (mpg) versus speed (mph).]
14.16. The mean would be multiplied by a factor of 2.2 (since every individual observation would be multiplied by 2.2), but the correlation would not change.
14.17. (a) Mean age is measured in years. (b) The reaction time standard deviation is measured in seconds. (c) Correlation has no units. (d) Median age is measured in years.
14.18. The newspaper interpreted zero correlation as implying a negative association between teaching ability and research productivity. The speaker meant that teacher rating and research productivity are not related (strictly speaking, that they are not linearly related).
14.19. (a) Since gender has a nominal scale, we cannot compute the correlation between sex and anything. [There is a strong association between gender and income. Some writers use "correlation" as a synonym for "association." It is much better to retain the more specific meaning.] (b) A correlation r = 1.09 is impossible, since -1 ≤ r ≤ 1 always. (c) Correlation has no units, so r = 0.53 years is incorrect.
14.20. We expect a man's height to be most strongly associated with his own height as a boy (and a similar relationship between a woman's childhood and adult heights), next most strongly associated (by inheritance) with his son's height, and more weakly associated with his wife's height. So: (a) 0.5. (b) 0.2. (c) 0.8.
14.21. (a) Negative (for most models, but not for collectible cars). (b) Negative (i.e., more weight means fewer mpg). (c) Positive. (d) Small.
14.22. (a) Small-cap stocks have a lower correlation with municipal bonds, so the relationship is weaker. (b) She should look for a negative correlation (although this would also mean that this investment tends to decrease when bond prices rise).
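The claim in Exercises 14.12, 14.13, and 14.16 (rescaling a variable changes its mean but not r) can be checked numerically. The sketch below uses made-up weights and heights, not data from the text, and assumes the factor 2.2 stands for a kilogram-to-pound conversion.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (not from the text).
weights_kg = [50.0, 62.0, 71.0, 80.0, 95.0]
heights_cm = [155.0, 170.0, 168.0, 182.0, 185.0]

# Rescale every weight by the same factor, as in Exercise 14.16.
weights_lb = [2.2 * w for w in weights_kg]

r_kg = correlation(weights_kg, heights_cm)
r_lb = correlation(weights_lb, heights_cm)
mean_kg = sum(weights_kg) / len(weights_kg)
mean_lb = sum(weights_lb) / len(weights_lb)

print("mean (kg):", mean_kg, " mean (lb):", mean_lb)
print("r with kg:", r_kg, " r with lb:", r_lb)
```

The two correlations agree up to floating-point error, while the mean in pounds is 2.2 times the mean in kilograms.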
14.23. Since the instructions specify that soda price should be viewed as explanatory, it should be on the horizontal axis. The relationship, if any, is weakly positive. The points for the Mets and the Cardinals might be considered outliers. Note: The scatterplot shown here is drawn with the same scale on both axes, since both variables are measured in the same units (dollars). This is the best practice to follow in such cases.
[Scatterplot: hot dog price (dollars) versus soda price (dollars).]
14.24. (a) The exact values for Alaska are 15.2 inches and 332.29 inches. Students' estimates will vary, but should be close. (b) Without Alaska and Hawaii, the relationship is weakly positive and roughly linear. Knowing maximum 24-hour precipitation would be somewhat useful in predicting maximum annual precipitation for these other states, but because the relationship is not very strong, such predictions would not be very reliable. Note: Two other states (Oregon and Washington) also could be considered outliers; these are the two other points with maximum annual rainfall over 180 inches. If we removed them, the relationship would be somewhat stronger, but still rather weak for use in prediction. In fact, the correlation for all 50 points is 0.408. With Alaska and Hawaii removed, the correlation drops to 0.292. If Oregon and Washington are removed as well, the correlation rises to 0.462.
14.25. (a) Planting rate is explanatory. (b) At right. (c) The pattern is curved: high in the middle and lower on the ends. The association is not linear and is neither positive nor negative. Yield falls off if the plants are too crowded.
[Scatterplot: yield (bushels per acre) versus plants per acre (thousands).]
14.26. The correlation is r ≈ 0.481. The correlation is greatly lowered by the one outlier. Outliers tend to have fairly strong effects on correlation; the effect is much stronger here because there are so few observations.
[Scatterplot: y versus x.]
Describing Relationships: Regression, Prediction, and Causation
Chapter 15 Solutions
15.1. Inactive girls are more likely to be obese, so if hours of activity is small we expect BMI to be high, and vice versa. The percent of variation in BMI explained by the straight-line relationship with hours of activity is r^2 ≈ 3.2%.
15.2. (a) Since b < 0, we know there is a negative association. Furthermore, if the percent taking the exam increases (or decreases) by 1 percentage point, we expect the mean SAT math score to decrease (or increase) by about 1.1 points. (b) We predict 574.6 - (1.102 × 76) ≈ 490.8.
15.3. Of the observed variation among the GPAs of these 78 students, the percent explained by the straight-line relationship between GPA and IQ score is r^2 ≈ 40.2%. The rest of the variation (59.8%) is due to differences in GPA among students with similar IQ scores.
15.4. (a) The negative correlation says that when one variable is above average, the other tends to be below average (and vice versa). For example, in states where many students take the SAT, the mean score is lower. (b) About r^2 ≈ 77.2% of the variation in average SAT scores is explained by the straight-line relationship with the percent taking the SAT. This means that knowing the percent taking the SAT gives a good basis for prediction.
15.5. b = 0.101 means that we expect GPA to increase by about 0.101 points for every one-point increase in IQ (and GPA drops by 0.101 for every one-point decrease in IQ). For an IQ of 115, we predict a GPA of -3.56 + (0.101 × 115) ≈ 8.055.
15.6. For a time of 34.30 minutes, we predict Professor Moore's pulse rate to be about 479.9 - (9.695 × 34.30) ≈ 147.4 beats per minute, about 4.6 bpm lower than the actual value.
15.7. (a) At right. Alcohol from wine should be on the horizontal axis. (The line shown on this scatterplot is for the solution to Exercise 15-9.) (b) The association is negative (direction), linear or slightly curved (form), and fairly strong (strength). (c) That r is negative agrees with our observation of the direction of the association. That r is close to -1 is an indication that the association is strong and (close to) linear.
[Scatterplot: heart disease death rate versus alcohol consumption from wine (liters), with the regression line from Exercise 15-9.]
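The predictions in Exercises 15.2, 15.5, and 15.6 all substitute a value of x into a line of the form y = a + bx. A minimal sketch of that arithmetic follows; the function name `predict` is illustrative, and the signs of the coefficients are taken from the direction of each association described in the solutions (negative slope for SAT and pulse rate, negative intercept for GPA).

```python
def predict(intercept, slope, x):
    """Predicted response from a least-squares line y = intercept + slope * x."""
    return intercept + slope * x

# Coefficients as used in the solutions above.
sat_math = predict(574.6, -1.102, 76)    # Exercise 15.2(b): 76% taking the exam
gpa = predict(-3.56, 0.101, 115)         # Exercise 15.5: IQ of 115
pulse = predict(479.9, -9.695, 34.30)    # Exercise 15.6: time of 34.30 minutes

print("Predicted mean SAT math score:", round(sat_math, 1))
print("Predicted GPA:", round(gpa, 3))
print("Predicted pulse rate (bpm):", round(pulse, 1))
```

The results should match the values quoted in the solutions (about 490.8, 8.055, and 147.4) up to rounding.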
15.8. (a) Stumps, the explanatory variable, should be on the horizontal axis. The plot shows a positive linear association. (b) x = 1 and x = 5 give, respectively, y ≈ 10.6 and y ≈ 58.2. The least-squares regression line is included on the scatterplot. (c) The straight-line relationship explains r^2 ≈ 83.9% of the variation in beetle larvae counts. (d) Yes: There is a strong relationship that explains almost 84% of the variation in larvae cluster counts, and it is certainly easier to count beaver stumps than larvae clusters. Note: The first printing of the text incorrectly gave the regression line equation as Larvae clusters = 11.89 - (1.286 × stumps); that is, slope and intercept were switched. Students might recognize that this wrongly suggests a negative association, or at least notice that this line does not match the scatterplot at all.
[Scatterplot: beetle larvae clusters versus stumps, with the regression line.]
15.9. For 1 liter of alcohol, we predict 260.6 - 22.97(1) ≈ 237.63 heart disease deaths per 100,000 people, and for 8 liters of alcohol, we predict 260.6 - 22.97(8) ≈ 76.84 deaths per 100,000. This line is included in the scatterplot in the solution to Exercise 15-7.
15.10. (a) Shown at right. (b) A regression line is worthless for predicting MPG from speed, because these variables do not have a straight-line relationship. For any speed, the regression line simply predicts 26.8 mpg (the average of the five fuel efficiency numbers in our data). Note: Students might be tempted to say that there is no relationship. In fact, there is a very strong relationship, but it is neither positive nor negative; it is curved, not linear. A parabola fits these points fairly well.
[Scatterplot: mileage (mpg) versus speed (mph).]
15.11. For example, diets and genetic (ethnic) background vary between countries.
15.12. If the regression line has slope 0, the predicted value of y is the same (the intercept) regardless of the value of x. In other words, knowing x does not change the predicted value of y.
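A zero-slope regression line, as in Exercises 15.10 and 15.12, arises whenever the pattern is symmetric about the mean of x. The sketch below fits a least-squares line to five speed and mileage pairs. The values are an assumption chosen to be consistent with the average of 26.8 mpg quoted in the solution to Exercise 15.10; the data table itself is not reproduced here.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Assumed illustration values: a symmetric, curved pattern with mean 26.8 mpg.
speed_mph = [20, 30, 40, 50, 60]
mileage_mpg = [24, 28, 30, 28, 24]

intercept, slope = least_squares(speed_mph, mileage_mpg)
print("slope:", slope)
print("intercept (predicted mpg at every speed):", intercept)
```

With slope 0 (up to floating-point error), the line predicts the mean mileage at every speed, even though the curved relationship is strong.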
15.13. (a) At right. The association is negative; as time goes by, pH decreases (and acidity increases). (b) The initial pH was about 5.425; the final pH was 4.635. (c) The slope is -0.0053; the pH decreased by 0.0053 units per week (on the average).
[Plot: pH versus weeks.]
15.14. (a) Plot below, left. The range of values on the horizontal axis may vary. Following the instructions given in the text, this plot can be drawn by taking x = 0 years and y = $500, and x = 10 years and y = $1500. (b) When x = 20, y = 2500 dollars. (c) y = 500 + 100x. (The slope is his rate of savings, in dollars per year.)
[Plots: savings (dollars) versus years, left; rat weight (grams) versus weeks, right.]
15.15. (a) Weight y = 100 + 40x g; the slope is 40 g/week. (b) Above, right. (c) When x = 104, y = 4260 grams, or about 9.4 pounds, a rather frightening prospect. The regression line is only reliable for young rats; like humans, rats do not grow at a constant rate throughout their entire life.
15.16. No: The correlation and slope always have the same sign, because they both relate to whether the association between the two variables is positive or negative. When we say that two variables have a positive association (and therefore r > 0), for example, we mean that a positively sloped line is the best fit for that scatterplot. Note: See also the formula for the slope given in Exercise 15-30.
15.17. (a) Scatterplots below. We predict y = 5.5 and y = 8 (respectively) when x = 5 and x = 10. (b) For Set A, the use of the regression line seems to be reasonable: the data do seem to have a moderate linear association (albeit with a fair amount of scatter). For Set B, there is an obvious nonlinear relationship; we should fit a parabola or other curve. For Set C, the point (13, 12.74) deviates from the (highly linear) pattern of the other points; if we can exclude it, regression would be very useful for prediction. For Set D, the data point with x = 19 is a very influential point: the other points alone give no indication of slope for the line. Seeing how widely scattered the y-coordinates of the other points are, we cannot place too much faith in the y-coordinate of the influential point; thus we cannot depend on the slope of the line, and so we cannot depend on the estimate when x = 10.
[Scatterplots: Sets A, B, C, and D.]
15.18. r = +√0.16 = 0.40 (high attendance goes with high grades, so the correlation must be positive).
15.19. (a) The scatterplot shows a strong negative association with a straight-line pattern. The actual regression line y = 1166.93 - 0.58679x is drawn on this scatterplot; student approximations may vary somewhat from this. (b) For most approximate regression lines, extending the line to 1990 will result in predicting a negative population in 1990. (Using the regression equation given in (a), substituting x = 1990 gives y ≈ -0.781.) This is not reasonable, since a population must be greater than or equal to 0. The rate of decrease in the farm population dropped in the 1980s; we cannot rely on a prediction outside the boundaries of our data.
[Scatterplot: farm population (millions) versus year, 1930 to 1980, with the regression line.]
15.20. When x = 150, we predict y ≈ -3184.9 deaths per 100,000. This is clearly nonsense; people will not suddenly begin to rise from the dead if alcohol consumption goes up (although someone drinking that much alcohol might see people rising from the dead). The data on which we based this regression line had x between 0 and 10, so we cannot rely on this regression line for predictions outside of that range.
15.21. Number of firefighters and amount of damage both increase with the seriousness of the fire (that is, they are common responses to the fire's seriousness).
15.22. A reasonable explanation is that the cause-and-effect relationship goes in the other direction: Doing well makes students feel good about themselves, rather than vice versa.
15.23. Patients suffering from more serious illnesses are more likely to go to larger hospitals (which may have more or better facilities) for treatment. They are also likely to require more time to recuperate afterwards.
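The extrapolation problems in Exercises 15.19 and 15.20 can be made explicit by checking whether x lies inside the range of the data before trusting a prediction. The sketch below is one possible way to do that; the helper name is illustrative, and the year range 1930 to 1980 is an assumption read off the axis of the farm population plot (the alcohol range 0 to 10 is stated in the solution).

```python
def predict_in_range(intercept, slope, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = intercept + slope * x."""
    prediction = intercept + slope * x
    is_extrapolation = not (x_min <= x <= x_max)
    return prediction, is_extrapolation

# Exercise 15.19: farm population (millions) predicted for 1990.
farm_1990, farm_flag = predict_in_range(1166.93, -0.58679, 1990, 1930, 1980)

# Exercise 15.20: heart disease deaths per 100,000 predicted for x = 150 liters.
deaths_150, deaths_flag = predict_in_range(260.6, -22.97, 150, 0, 10)

for label, value, flag in [("farm population, 1990", farm_1990, farm_flag),
                           ("deaths per 100,000, x = 150", deaths_150, deaths_flag)]:
    note = "extrapolation, do not trust" if flag else "within data range"
    print(f"{label}: {value:.3f} ({note})")
```

Both predictions come out negative, which is impossible for a population or a death rate, and both are flagged as lying outside the range of the data.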
15.24. More income in a nation means more money to spend for everything, including health care, hospitals, medicine, etc., which presumably should lead to better health for the people of that nation. On the other hand, if the people in a nation are reasonably healthy, more resources are available that would otherwise be used for healthcare. Those resources can be used to generate more wealth.
15.25. A student's intelligence may be a lurking variable: Stronger students (who are more likely to succeed once they get to college) are more likely to choose to take these math courses, while weaker students may avoid them. Other possible answers might be variations on this idea; for example, if we believe that success in college depends on a student's self-confidence, we might suppose that confident students are more likely to choose math courses.
15.26. In this case, there may be a causative effect, but in the direction opposite to the one suggested: People who are overweight are more likely to be on diets, and so choose artificial sweeteners over sugar. (Also, heavier people are at a higher risk to develop diabetes; if they do, they are likely to switch to artificial sweeteners.)
15.27. Spending more time watching TV means that less time is spent on other activities; this may suggest lurking variables. For example, perhaps the parents of heavy TV watchers do not spend as much time at home as other parents. Also, heavy TV watchers would typically not get as much exercise.
15.28. r = 0.843 indicates a stronger relationship than r = 0.634. The strength of the relationship is indicated by the distance from 0, not the sign.
15.29. Students with music experience may have other advantages (wealthier parents, better school systems, etc.). That is, experience with music may have been a symptom (common response) of some other factor that also tends to cause high grades. Some of the success of those who have music experience may be due to the discipline learned from practicing an instrument, a benefit that might come from activities other than music.
15.30. With x̄ ≈ 58.2 cm, s_x ≈ 13.20 cm, ȳ ≈ 66.0 cm, s_y ≈ 15.89 cm, and r ≈ 0.994, we find
slope b = (0.994)(15.89 cm / 13.20 cm) ≈ 1.197
intercept a = 66.0 cm - 1.197(58.2 cm) ≈ -3.66 cm
(Answers may vary slightly due to rounding.)
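The computation in Exercise 15.30 uses the formulas b = r(s_y / s_x) and a = ȳ - b x̄. A short sketch with the summary statistics quoted in that solution follows; the function name is illustrative, and because b is not rounded before the intercept is computed, the intercept differs slightly from the printed value.

```python
def line_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least-squares slope and intercept from means, standard deviations, and r."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Summary statistics from Exercise 15.30 (all lengths in cm).
intercept, slope = line_from_summary(mean_x=58.2, sd_x=13.20,
                                     mean_y=66.0, sd_y=15.89, r=0.994)

print("slope b:", round(slope, 3))
print("intercept a (cm):", round(intercept, 2))
```

Carrying the unrounded slope through gives an intercept near -3.64 cm rather than -3.66 cm, which is the kind of rounding variation the solution mentions.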
15.31. (a) Software (or a calculator) gives pulse rate = 479.9341 - (9.694903 × time), which agrees (after rounding) with the equation given in Exercise 15-6. (b) The regression equation for predicting time from pulse rate is time y = 43.10 - 0.0574(pulse rate x), so the predicted time for x = 152 is about 34.38 minutes. (c) The results of a least-squares regression depend on which variable is viewed as explanatory, since the line is chosen based on vertical distances from each data point to the line. Note: When we say these lines are different, we are not just referring to the fact that they have different slopes and intercepts. Rather, we mean that solving the equation in (b) for pulse rate as a function of time, which yields pulse rate = 750.8 - (17.42 × time), gives a formula quite different from the least-squares regression line for predicting pulse rate from time.
15.32. Software (or a calculator) gives y = 260.5634 - 22.96877x, which agrees (after rounding) with the equation given in Exercise 15-9.
15.33. Software (or a calculator) gives the correlations, slopes, and intercepts as
         r            b           a
Set A    0.81642054   0.5000909   3.000091
Set B    0.81623649   0.5000000   3.000909
Set C    0.81628671   0.4997273   3.002455
Set D    0.81652146   0.4999091   3.001727
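The note in Exercise 15.31 can be verified with a little algebra: solving the time-on-pulse line for pulse rate does not recover the pulse-on-time line. The sketch below uses only the rounded coefficients quoted in that solution, so the inverted intercept differs slightly from the 750.8 printed there.

```python
# Regression of time on pulse rate, Exercise 15.31(b): time = 43.10 - 0.0574 * pulse
a_time, b_time = 43.10, -0.0574

# Solving time = a_time + b_time * pulse for pulse gives
# pulse = (-a_time / b_time) + (1 / b_time) * time.
inv_intercept = -a_time / b_time
inv_slope = 1 / b_time

# Regression of pulse rate on time, Exercise 15.31(a).
a_pulse, b_pulse = 479.9341, -9.694903

predicted_time = a_time + b_time * 152   # part (b): pulse rate of 152

print("inverted line:   pulse = %.1f + (%.2f) * time" % (inv_intercept, inv_slope))
print("regression line: pulse = %.1f + (%.2f) * time" % (a_pulse, b_pulse))
print("predicted time for pulse 152: %.2f minutes" % predicted_time)
```

The inverted line has a slope of roughly -17.4, nearly twice as steep as the regression slope of about -9.7, which is why the choice of explanatory variable matters.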
The Consumer Price Index and Government Statistics
Chapter 16 Solutions
16.1. For 1985, (1.208/1.354) × 100 ≈ 89.2. For 1990, (1.354/1.354) × 100 = 100 (of course). For 1995, (1.101/1.354) × 100 ≈ 81.3.
16.2. (a) Tuition rose by a factor of 3.26 from the base period to January 2000. Equivalently, tuition costs increased by 226%. (b) The tuition CPI is almost twice as large as the overall CPI.
16.3. (a) The index number dropped from 89.2 to 81.3, a 7.9 point decrease, or 7.9/89.2 ≈ 8.9%. (b) Dropping from 100 to 81.3 is an 18.7 point decrease, or 18.7/100 = 18.7%.
16.4. For 1988, (3,395,867/3,395,867) × 100 = 100 (of course). For 1995, (1,964,926/3,395,867) × 100 ≈ 57.9. For 1997, (1,941,870/3,395,867) × 100 ≈ 57.2. The 42.8 point decrease from 1988 to 1997 is a 42.8% decrease.
16.5. (a) We know New York prices rose faster because the New York CPI is greater than the L.A. CPI. (b) We do not know how prices compared in the base period. As an illustration, if prices in Los Angeles were twice as high as New York prices in the base period, they would be 1.87 times as high in January 2000.
16.6. The 1994 quantities are not relevant for a fixed market basket index. The 1990 and 2000 costs for the 1990 market basket are in the table below. The Food Faddist Price Index (1990 = 100) for 2000 is therefore (1809/1491) × 100 ≈ 121.3.
             1990     2000
Steak        $1090    $1318
Rice         $ 147    $ 159
Ice Cream    $ 254    $ 332
TOTAL        $1491    $1809
16.7. The 1995 quantities are not relevant for a fixed market basket index. The 1985 and 1995 costs for the 1985 market basket are in the table below. The 1995 Guru Price Index (1985 = 100) is therefore (94.55/66.45) × 100 ≈ 142.3.
                1985      1995
Olive oil       $50.00    $76.00
Loincloth       $ 5.50    $ 5.60
Atharva Veda    $10.95    $12.95
TOTAL           $66.45    $94.55
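The fixed market basket indexes in Exercises 16.6 and 16.7 are the ratio of the basket's total cost in the later year to its total cost in the base year, times 100. A minimal sketch using the costs from the two tables (the function and variable names are illustrative):

```python
def basket_index(base_costs, later_costs):
    """Fixed market basket index with the base period set to 100."""
    return 100 * sum(later_costs.values()) / sum(base_costs.values())

# Exercise 16.6: cost of the 1990 basket in 1990 and in 2000.
food_1990 = {"steak": 1090, "rice": 147, "ice cream": 254}
food_2000 = {"steak": 1318, "rice": 159, "ice cream": 332}

# Exercise 16.7: cost of the 1985 basket in 1985 and in 1995.
guru_1985 = {"olive oil": 50.00, "loincloth": 5.50, "Atharva Veda": 10.95}
guru_1995 = {"olive oil": 76.00, "loincloth": 5.60, "Atharva Veda": 12.95}

food_index = basket_index(food_1990, food_2000)
guru_index = basket_index(guru_1985, guru_1995)

print("Food Faddist Price Index, 2000 (1990 = 100):", round(food_index, 1))
print("Guru Price Index, 1995 (1985 = 100):", round(guru_index, 1))
```

Only the base-year quantities enter the calculation, which is why the later-year quantities are irrelevant for a fixed basket.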
16.8. About $1.04 million: The CPI in 1920 was 20.0, and in 1999 it was 166.6, so in 1999 dollars, $125,000 is worth $125,000 × (166.6/20.0) ≈ $1,041,250.
16.9. About $38,300: The CPI in 1995 was 152.4, and in 1999 it was 166.6, so in 1999 dollars, $35,000 is worth $35,000 × (166.6/152.4) ≈ $38,261.
16.10. About $42,000: The CPI in 1970 was 38.8, and in 1998 it was 163.0, so in 1998 dollars, $10,000 is worth $10,000 × (163.0/38.8) ≈ $42,010.
16.11. In 1999, $100 was equivalent to $100 × (26.8/166.6) ≈ $16.08 in 1955 dollars. Assuming the CPI continues to increase, the 1955 price of a modern microwave based on more recent CPI data will be less than $16, a striking decrease from the $1300 cost of a 1955 microwave.
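Exercises 16.8 through 16.11 all use the same conversion: dollars at time B = dollars at time A × (CPI at time B / CPI at time A). A small sketch of that conversion with the CPI values quoted in these solutions (the function name is illustrative):

```python
def convert_dollars(amount, cpi_from, cpi_to):
    """Convert a dollar amount from one year's dollars to another's using the CPI."""
    return amount * cpi_to / cpi_from

ex_16_8 = convert_dollars(125_000, cpi_from=20.0, cpi_to=166.6)   # 1920 -> 1999
ex_16_9 = convert_dollars(35_000, cpi_from=152.4, cpi_to=166.6)   # 1995 -> 1999
ex_16_10 = convert_dollars(10_000, cpi_from=38.8, cpi_to=163.0)   # 1970 -> 1998
ex_16_11 = convert_dollars(100, cpi_from=166.6, cpi_to=26.8)      # 1999 -> 1955

for label, value in [("16.8", ex_16_8), ("16.9", ex_16_9),
                     ("16.10", ex_16_10), ("16.11", ex_16_11)]:
    print(f"Exercise {label}: ${value:,.2f}")
```

Converting backward in time (Exercise 16.11) is the same formula with the two CPI values swapped.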
16.12. To compare real incomes, we translate all three amounts to the same year. We choose to express all in 1999 dollars. Of course, Tiger Woods' winnings are worth $6,620,970. Sam Snead's 1938 winnings amount to $223,274 × (166.6/14.0) ≈ $2.66 million in 1999. (This will vary slightly depending on how students deal with the fact that the 1938 CPI is not given. If a student uses 13.7, Snead earned 2.72 million 1999 dollars.) Tom Watson's 1980 winnings amount to $1,041,002 × (166.6/82.4) ≈ $2.10 million in 1999. Snead's and Watson's earnings are very similar, while Woods earned 2.5 to 3 times as much (in real terms) as the other two.
16.13. DiMaggio's 1940 salary of $32,000 would have been worth about $55,100 in 1950: 32,000 × (24.1/14.0) ≈ 55,086. DiMaggio's real income increased by (100,000 - 55,100)/55,100 ≈ 0.81, or roughly 80%.
16.14. To compare real costs, we must either express 1976 dollars in terms of 1999 dollars, or vice versa (or translate them both to some other year). A $12 call in 1976 would cost $12 × (166.6/56.9) ≈ $35.14 in 1999. The real cost of a call to London decreased by about 69%: (35.14 - 11)/35.14 ≈ 68.7%.
16.15. $5900 in 1976 is about $17,300 in 1999: $5900 × (166.6/56.9) ≈ $17,275 in 1999. Since the actual cost in 1999 is greater than the 1976 cost adjusted to 1999 dollars, we know that the cost of going to Harvard increased faster than consumer prices in general.
16.16. The table is below; the plot is on the right. The adjustment to constant 1960 dollars is done by taking each minimum wage value times 29.6 (the CPI for 1960) then dividing by the CPI number for that year. The solid line in the graph shows that the minimum wage has risen fairly steadily since 1960, but the dashed line indicates that, after adjusting for inflation, the real minimum wage rose only slightly at first, and has fallen in recent years, finally showing a slight rise in 1999. The minimum wage did not keep up with inflation beginning in the early 1980s, but it has recently begun to move toward providing buying power equal to the 1960 minimum wage.
[Line graph: actual minimum wage (solid) and minimum wage in 1960 dollars (dashed) versus year, 1960 to 1999.]
Year    Min. wage    1960 dollars
1960    $1.00        1.000
1965    $1.25        1.175
1970    $1.60        1.221
1975    $2.10        1.155
1980    $3.10        1.114
1985    $3.35        0.922
1990    $3.80        0.861
1995    $4.25        0.825
1999    $5.15        0.915
16.17. The table is below; the plot is on the right. The adjustment to constant 1981 dollars is done by taking each tuition value times 90.9 (the CPI for 1981) then dividing by the CPI number for that year. The solid line in the graph shows that tuition has risen fairly steadily since 1981; the dashed line indicates that, after adjusting for inflation, real tuition has also risen, though not as quickly as the actual dollar amounts. In other words, tuition is rising faster than average inflation: The real cost of Purdue tuition has increased about 71% from 1981 to 1999, since (1977 - 1158)/1158 ≈ 70.7%. (This problem will be more interesting if you substitute your own institution's tuition.)
[Line graph: Purdue University tuition, actual (solid) and in 1981 dollars (dashed), versus year, 1981 to 1999.]
Year    Tuition    1981 dollars
1981    $1158      1158
1983    $1432      1307
1985    $1629      1376
1987    $1816      1453
1989    $2032      1490
1991    $2324      1551
1993    $2696      1696
1995    $3056      1823
1997    $3336      1889
1999    $3624      1977
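The constant-dollar adjustment described in Exercise 16.17 can be checked for the years whose CPI values appear elsewhere in these solutions (90.9 for 1981, 152.4 for 1995, and 166.6 for 1999). The sketch below covers only those three years; CPI values for the other rows are not given here and are not guessed.

```python
BASE_CPI_1981 = 90.9

# (year, actual tuition in dollars, CPI for that year), values quoted in the solutions.
rows = [
    (1981, 1158, 90.9),
    (1995, 3056, 152.4),
    (1999, 3624, 166.6),
]

# Real tuition in 1981 dollars = actual tuition * (CPI for 1981 / CPI for that year).
real_tuition = {year: tuition * BASE_CPI_1981 / cpi for year, tuition, cpi in rows}

# Percent increase in real tuition from 1981 to 1999.
real_increase_pct = 100 * (real_tuition[1999] - real_tuition[1981]) / real_tuition[1981]

for year, value in real_tuition.items():
    print(year, round(value))
print("Real increase 1981 to 1999: %.1f%%" % real_increase_pct)
```

The rounded results should reproduce the corresponding entries of the 1981-dollars column and the increase of about 71% stated in the solution.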
16.18. A dollar in the year 2000 is worth $1 × (82.4/168.7) ≈ $0.488 in 1980 dollars. (Of course, we needn't carry out the whole computation; we really only need to note that the CPI more than doubled from 1980 to 2000, so average consumer prices more than doubled, so $1 buys less than half as much as it used to.)
16.19. A $3852 increase is 11% of $35,033: 3852/35,033 ≈ 11.0%. A $30,324 increase is 30% of $101,875: (132,199 - 101,875)/101,875 ≈ 29.8%.
16.20. If the increase in cost is seen as paying for higher quality cable service, it does not count toward an increase in the CPI. Note: More information about issues like this can be found on the World Wide Web at http://stats.bls.gov/cpihome.htm.
16.21. The weight reflects the difference in the cost (owning a home may be more expensive than renting), and also the fact that more people own than rent.
16.22. (a) The CPI market basket is intended to represent the purchases of people living in urban areas. A person living on a cattle ranch would have a different set of needs, and would make different purchases than an urban consumer. (b) Using a wood stove and no air conditioner makes Jim's expenditures for utilities quite different from those of a typical urban consumer. (c) Luis and Maria's medical expenses would be considerably higher than for a typical consumer, so that portion of the CPI would not reflect the increase in how much they spent in the last year.
16.23. Those receiving the wages or payments might be affected by seasonal variations; for example, they might have higher heating costs in the winter. Using the unadjusted CPI means that the payments they receive will be escalated to keep up with these seasonal variations. Note: The BLS web site gives a more detailed rationale for using the unadjusted CPI.
16.24. The local CPIs are based only on sample prices from the area in question. That is, they are statistics based on smaller samples than the national CPI, and smaller samples as usual bring greater sampling variability.
16.25. Saying that the poverty level is getting higher in real terms means that the poverty level is going up, even after adjusting for inflation.
16.26. (a) The 33rd percentile of the income distribution is the income such that 33% of all workers make less than that amount. (b) "Real wages" means adjusted for inflation. This report says that the 1990 33rd percentile, 66th percentile, and 99th percentile are (respectively) 14% less than, 6% less than, and 1% greater than those same percentiles in 1980, expressed in 1990 dollars.
16.27. Larger sample sizes give more information; more information means less uncertainty, hence less variation and more accuracy.
16.28. By asking the same questions, the GSS can be used to track how the population is changing over time.
16.29. (a) Numbers of crimes reported, by type. Arrests and convictions, by type of crime. Dollar loss due to crimes against property. (b) Questions such as: Have you been a victim of a crime in the past year? What type of crime? When? Did you report it to the police? Surveys of this kind show higher crime rates than police data, and suggest that many crimes are not reported to the police. (c) Questions such as: Have you changed your habits due to fear of crime? Do you feel safe walking near your home at night? Should more money be spent on law
Part II Review
Part II Review Solutions
II.1. The distribution is somewhat right-skewed, and has no outliers.
 8 | 2468
 9 | 1336
10 | 13
11 | 0224
12 | 277
13 | 1
14 | 335
15 | 79
16 | 457
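The stemplot for Exercise II.1 carries enough information to recompute the summaries reported later in this review (the five-number summary and the means in Exercises II.3 and II.5). The sketch below reads the flattened plot as stems 8 through 16 with tenths as leaves, which is a reconstruction, and finds quartiles as medians of the lower and upper halves of the sorted data. Treating the largest value as Mississippi is an inference from the reported means, not something stated in the solutions.

```python
from statistics import mean, median

# Stems are whole percents, leaves are tenths (reconstructed from the stemplot).
stemplot = {8: "2468", 9: "1336", 10: "13", 11: "0224", 12: "277",
            13: "1", 14: "335", 15: "79", 16: "457"}

values = sorted(stem + int(leaf) / 10
                for stem, leaves in stemplot.items() for leaf in leaves)

n = len(values)
lower_half = values[: n // 2]         # observations below the median position
upper_half = values[(n + 1) // 2:]    # observations above the median position

five_number = (values[0], median(lower_half), median(values),
               median(upper_half), values[-1])

mean_all = mean(values)
mean_without_max = mean(values[:-1])  # assumes the maximum is Mississippi

print("n =", n)
print("five-number summary:", five_number)
print("mean: %.3f, mean without largest value: %.3f" % (mean_all, mean_without_max))
```

If the reconstruction is right, the output should agree with the 26 states, the five-number summary 8.2, 9.3, 11.3, 14.3, 16.7, and the means 11.981 and 11.792 reported in the solutions.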
II.2. Histograms may vary slightly from the one shown here. The distribution is somewhat irregular in shape; it is not particularly symmetric, nor is it skewed. The five quarterbacks with more than 4000 yards (Johnson, Favre, Manning, Warner, and Beuerlein) stand apart from the rest, but are not really outliers. (Arguably, Drew Bledsoe, with 3985 yards, belongs in this group, too.)
[Histogram: number of quarterbacks versus passing yards, 1000 to 4000 and above.]
II.3. The five-number summary is Min = 8.2%, Q1 = 9.3%, M = 11.3%, Q3 = 14.3%, Max = 16.7%.
II.4. The five-number summary (all in yards) is Min = 1276, Q1 = 2117, M = 2670, Q3 = 3389, Max = 4436.
II.5. The mean of these 26 states is x̄ ≈ 11.981%. Removing Mississippi will decrease the mean, since the percent for Mississippi is higher than the mean. Without Mississippi, the mean is x̄ ≈ 11.792%.
II.6. (a) The middle 95% of head circumferences lie within two standard deviations of the mean: 22.8 ± 2.2 = 20.6 to 25.0 inches. (b) 23.9 inches is one standard deviation above the mean, so this is 16% (half of the outer 32%).
II.7. (a) For normal distributions, the median and mean are equal, so the median score is 500. (b) 68% of all scores lie between 400 and 600 (within one standard deviation of the mean).
II.8. It is reasonable to expect that students with a high GPA end up with higher scores on the final exam, and similarly, low-GPA students will typically score lower on the final. Of course, we expect some exceptions to this. The correlation r measures how strong this relationship is; the closer r is to 1, the more the scatterplot of GPA and exam score looks like a straight line.
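The 68-95-99.7 rule used in Exercises II.6 and II.7 gives rounded percentages. The sketch below compares them with exact normal probabilities from Python's `statistics.NormalDist`, taking the standard deviation of head circumference to be 1.1 inches (half of the 2.2 used in the solution) and the score distribution to have mean 500 and standard deviation 100.

```python
from statistics import NormalDist

# Exercise II.6: head circumference, mean 22.8 in., standard deviation 1.1 in.
heads = NormalDist(mu=22.8, sigma=1.1)
within_two_sd = heads.cdf(25.0) - heads.cdf(20.6)   # rule of thumb: 95%
above_one_sd = 1 - heads.cdf(23.9)                  # rule of thumb: 16%

# Exercise II.7: scores with mean 500 and standard deviation 100.
scores = NormalDist(mu=500, sigma=100)
within_one_sd = scores.cdf(600) - scores.cdf(400)   # rule of thumb: 68%

print("within two sd: %.4f" % within_two_sd)
print("above one sd:  %.4f" % above_one_sd)
print("within one sd: %.4f" % within_one_sd)
```

The exact values differ from the rule-of-thumb percentages only in the later decimal places, which is why the rule is adequate for these exercises.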
II.9. (a) Mean tail length is measured in centimeters. (b) The first quartile of tail length is measured in centimeters. (c) The standard deviation of tail length is measured in centimeters. (d) Correlation has no units.
II.10. (a) Large mice typically have longer tails, and small mice have shorter tails. (b) 9.8 cm is about 3.9 inches. (Divide by 2.54.) (c) r would not change; it is unaffected by the units used.
II.11. The dolphin has approximate body weight (mass) 180 kg, and brain weight 1600 g. The hippo has approximate body weight 1400 kg, and brain weight 600 g.
II.12. The fact that the dolphin's point lies so close to the point for humans would make this an attractive conclusion. Also, dolphins, humans, and elephants lie noticeably above the regression line, suggesting that the ratio of brain weight to body weight for these animals is much higher than for most other animals, which seems like it could be an indicator of intelligence. Hippos lie below the line, meaning that they have relatively small brains in their large bodies.
II.13. (a) The correlation would decrease, because relative scatter about the line is greater with the elephant removed. (That is, if we magnify the lower left quarter of the graph, the scatterplot looks considerably less like a line.) (b) Without dolphins, hippos, and humans, the scatterplot looks more linear, so the correlation would increase.
II.14. The straight-line relationship between body weight and brain weight explains about 74% (r^2 = 0.7396) of the variation in brain weight; that is, if we know body weight, we can make a fairly reliable prediction of brain weight using this regression line.
II.15. We predict a brain weight around 800 grams: Starting from a 600 kg body weight, go up to the line, and over to the vertical axis.
II.16. The slope must be 1.3. The line goes from about (0, 0) to about (2800, 3800), which can only arise from a slope like 1.3.
II.17. The association is negative (as day increases, weight decreases), so r should be negative. The plot shows a very strong linear relationship, so r should be close to -1. Note: In fact, r ≈ -0.998. The scatterplot includes the regression line for Exercise II-18.
[Scatterplot: weight (grams) versus day, with the regression line.]
II18. (a) On average, the weight of the soap decreases by about 6.31 grams each day. (b) We estimate the weight of the soap on day 4 as 133.2 - 6.31 × 4 ≈ 108 grams. (c) See the solution to Exercise II-17.
II19. For day 30, we predict 133.2 - 6.31 × 30 = -56.1 grams, which of course makes no sense. Using a regression line for prediction outside the range of the available data is risky.
II20. About $61,400: A 1980 income of $30,000 is equivalent in buying power to 30,000 × 168.7/82.4 ≈ $61,420 in the year 2000.
II21. About $43,990: A 1981 price tag of $24,000 is equivalent to 24,000 × 166.6/90.9 ≈ $43,987 in 1999 dollars.
II22. Expressing the 1976 cost in 1999 dollars gives 13,500 × 166.6/56.9 ≈ $39,527. The actual 1999 cost is more than twice this amount, so the cost of a Steinway has gone up in real terms.
II23. The table of adjusted gold prices and the plot are below. The adjustment to constant 1983 dollars is done by multiplying each gold price by 99.6 (the CPI for 1983) and then dividing by the CPI for that year. The line graph shows that the price of gold initially fluctuated, but has decreased in real terms since 1993 (and in actual price since 1995). An ounce of gold in 1999 was worth only $176 in 1983 dollars, less than half of its 1983 price.
Year   Gold price   1983 dollars
1983   385          385
1985   329          305
1987   486          426
1989   403          324
1991   354          259
1993   391          270
1995   392          256
1997   368          228
1999   295          176
[Line graph: gold price per ounce by year, 1983 to 1999, in actual dollars and in 1983 dollars.]
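The conversions in Exercises II-20 through II-23 all apply the same rule: multiply the amount by the CPI of the target year and divide by the CPI of the original year. A minimal Python sketch of that rule follows. The function name `convert_dollars` is hypothetical, and the CPI figures are the ones quoted in the solutions above.

```python
def convert_dollars(amount: float, cpi_from: float, cpi_to: float) -> float:
    """Express `amount` from the year with CPI `cpi_from` in dollars of the year with CPI `cpi_to`."""
    return amount * cpi_to / cpi_from


if __name__ == "__main__":
    # (label, amount, CPI of original year, CPI of target year), all from the solutions.
    examples = [
        ("II-20: $30,000 in 1980 as 2000 dollars", 30_000, 82.4, 168.7),
        ("II-21: $24,000 in 1981 as 1999 dollars", 24_000, 90.9, 166.6),
        ("II-22: $13,500 in 1976 as 1999 dollars", 13_500, 56.9, 166.6),
        ("II-23: $295 gold in 1999 as 1983 dollars", 295, 166.6, 99.6),
    ]
    for label, amount, cpi_from, cpi_to in examples:
        print(f"{label}: ${convert_dollars(amount, cpi_from, cpi_to):,.0f}")
```

Rounded to the nearest dollar, these reproduce the figures given in the solutions: $61,420, $43,987, $39,527, and the $176 entry in the last row of the gold table.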
II24. (a) Since a person cannot choose the day on which he or she has a heart attack, one would expect that all days are equally likely; no day is favored over any other. While there is some day-to-day variation, this expectation does seem to be supported by the chart. (b) Monday through Thursday are fairly similar, but there is a pronounced peak on Friday, and lows on Saturday and Sunday. Patients do have some choice about when they leave the hospital, and many probably choose to leave on Friday, perhaps so that they can spend the weekend with the family. Additionally, many hospitals cut back on staffing over the weekend, and they may wish to discharge any patients who are ready to leave before then.
II25. (a) The histogram (below, left) shows the distribution to be roughly symmetric. There are no clear outliers, but in looking at the list, 10.17 and 9.75 seem to be unusually high, and 6.75 is extraordinarily low. (b) In the time plot (below, right), the outliers stand out more clearly.
[Histogram: frequency versus drive time (minutes). Time plot: drive time (minutes) versus day.]
II26. With the outliers removed, we find x̄ ≈ 8.36 min and s ≈ 0.4645 min.
II27. The mean is higher, because the distribution of house prices would be right-skewed, which pulls the mean up (to the right of the median).
II28. A histogram is shown, but a stemplot would work as well. The distribution is reasonably symmetric, with no particular outliers, although there is a gap in the high 30s: Six states had percentages less than 37%, and the rest were above 40%. The mean and standard deviation are x̄ ≈ 47.2% and s ≈ 6.775%. Although the mean and standard deviation are preferable for this distribution, some students might instead find the five-number summary: Min = 33.3%, Q1 = 43.8%, M = 47.75%, Q3 = 51.7%, Max = 61.5%.
[Histogram: frequency versus percent voting for Clinton.]
II29. (a) Fidelity Technology Fund is more closely tied to the stock market as a whole, since its correlation is larger. (b) No: Correlations tell nothing about the absolute size of the variables, only the relative sizes (above/below average).