1. Harris recently installed spam-filtering software, but he still saw spam emails in his
inbox. He made a daily record of the number of spam emails that were delivered to
his inbox over the past 20 days. The following is a frequency histogram for his data.
The frequency refers to the number of days.

[Figure: histogram of spam email data; x-axis: # spam emails (20 to 120), y-axis: frequency]

a) Harris also plotted a stemplot for the data. Which of the following is a correct
stemplot for his data? Check only one answer. [2 marks]

✓ Stemplot A    _ Stemplot B    _ Stemplot C

    Stemplot A        Stemplot B        Stemplot C
     2 | 011355        2 | 011355        1 | 05
     3 | 01467         3 | 01467         2 | 011355
     4 | 12479         4 | 12479         3 | 01467
     5 | 56            5 | 56            4 | 12479
     6 |               8 | 0             5 | 56
     7 |              10 | 5             6 |
     8 | 0                               7 |
     9 |                                 8 | 0
    10 | 5

b) Which of the following is a correct statement about the distribution of the spam
email data? Check only one answer. [2 marks]
_ The distribution is roughly symmetric, and the mean is about the same as the median.
✓ The distribution is skewed, and the mean is larger than the median.
_ The distribution is skewed, and the mean is smaller than the median.

c) What is the percentage of days over the past 20 days that Harris received fewer
than 50 spam emails? Check only one answer. [2 marks]
_ 16%    _ 19%    ✓ 80%    _ 50%
16/20 × 100% = 80%

d) What is the third quartile of the number of spam emails? Use the stemplot you
have chosen in part (a) to answer this question. Check only one answer. [3 marks]
_ 23
Upper half of the data: 37, 41, 42, 44, 47, 49, 55, 56, 80, 105
Q3 = median of the above list = (47 + 49)/2 = 48

e) Identify any outliers in the data set using the stemplot you have chosen in part
(a). It is given to you that the IQR is 23. Show your work here. [4 marks]
No outliers in the lower end (the distribution does not have a long left tail).
Q3 + 1.5 × IQR = 48 + 1.5 × 23 = 82.5
105 > 82.5, so 105 is an outlier. It is the only outlier in the data.

f) Which of the following pairs of summary statistics best describe the center and
the spread of the number of spam emails received daily? Check only one answer
and explain briefly. [4 marks]
_ mean and standard deviation
_ mean and IQR
✓ median and IQR
_ median and variance
Explain: Both median and IQR are insensitive to outliers,
so in the presence of outliers (105), we should report
these 2 summary statistics.

2. An M&M's chocolate fan is interested in studying the color distribution of the sugar
coating of the chocolate candies. He opens a bag of M&M's chocolate candies, and classifies the candies according to the color of the coating. Check all statements that
are correct. [3 marks]
✓ The color of the sugar coating is a categorical variable.
✓ A bar chart can be used to display the distribution of the color variable.
_ A side-by-side boxplot can be used to compare the number of yellow coated candies and the number of red coated candies.

3. The hourly rates for high school private tutoring follow the normal distribution with mean μ and standard deviation σ. It is given that the middle 99.7% of all the hourly
rates fall between $13 and $43. Then (check your answers)

a) the mean μ is [1 mark]
✓ equal to $28.
_ greater than $28.
_ less than $28.

b) the standard deviation σ is roughly equal to [2 marks]
✓ $5.
_ $10.
_ $15.

c) an hourly rate of $12 has [2 marks]
_ a z-score of 0.
✓ a negative z-score.
_ a positive z-score.

d) the IQR of the hourly rates is [2 marks]
_ equal to $30.
_ greater than $30.
✓ less than $30.

4. Does how long children remain at the lunch table help predict how much they eat?
Twenty toddlers at a nursery school were observed. For each toddler, the number of
minutes he/she spent at the table when lunch was served and the number of calories
consumed during lunch were measured. The two variables show a reasonably linear trend with a correlation coefficient r = −0.65. The summary statistics are given
as follows:
# minutes spent at the lunch table: mean = 34, SD = 6.0
# calories consumed: mean = 456, SD = 30

a) Give a rough sketch of the scatterplot for the data. The axes have been set up for you. Remember to label the axes. Also indicate the mean-mean point (x̄, ȳ)
on the scatterplot. [5 marks]

[Sketch: scatterplot of # calories consumed (y-axis) against time spent at the lunch table in minutes (x-axis), with the mean-mean point (34, 456) marked]

b) Find the least-squares regression line that predicts the amount of calories consumed from the time stayed at the table during lunch. [6 marks]
x = # minutes spent at the lunch table, y = # calories consumed
ŷ = a + bx
b = r × s_y / s_x = (−0.65)(30)/6.0 = −3.25
a = ȳ − b x̄ = 456 − (−3.25)(34) = 566.5
ŷ = 566.5 − 3.25x

c) Predict the number of calories consumed for a child who spends 25 minutes at
the table during lunch. [2 marks]
x = 25: ŷ = 566.5 − 3.25(25) = 485.25 is the predicted # calories consumed.

d) For the following statements, check all that are correct. [4 marks]
_ 65% of the variation in the number of calories consumed is explained by the
regression line.
✓ The residual plot for this data set plots the residuals from the regression line against the number of minutes spent at the lunch table for the twenty kids.
_ One standard deviation (SD) increase in the number of minutes spent at the lunch table is associated with 0.65×SD increase in the number of calories consumed.
✓ If one changes the unit of the amount of time spent from minutes to hours, the value of the correlation coefficient r will remain unchanged.

5. The length of trout in a lake is normally distributed with mean μ = 0.95 feet and an unknown standard deviation σ. If 60% of all trout are longer than 0.8 feet, what is the
value of σ? [6 marks]
X = length of trout
0.8 is the 40th percentile.
z-score for the 40th percentile = −0.253 (or −0.25)
(0.8 − 0.95)/σ = −0.253, so σ = 0.15/0.253 ≈ 0.59 feet

6. A survey was conducted in 11 countries to determine the percentage of teenagers who had smoked cigarettes and used marijuana. The scatterplot for the two variables is
shown below:

[Figure: scatterplot of marijuana use (%) against cigarette smoking (%) for the 11 countries, with the point (68, 15) marked by hand]
a) The scatterplot shows a very strong positive correlation between the two variables. Does this imply cigarette smoking leads to marijuana use? Justify your answer.
[5 marks]
_ Yes    ✓ No
Explain: Association does not imply causation. Teenagers who smoke cigarettes are more likely to hang out
with friends who smoke both cigarettes and marijuana.
They may be influenced by their friends to smoke marijuana. Peer influence is a confounding variable that explains the
association between cigarette and marijuana smoking.
b) One more country participated in the survey, and the percentages of teenagers
who have smoked cigarettes and used marijuana were found to be 68% and 15%,
respectively. The correlation coefficient r is then recalculated. How do the values of r before and after the inclusion of the new observation compare? Check only one answer and explain briefly. [5 marks]
_ r(before) < r(after) < 0
✓ 0 < r(after) < r(before)
_ r(before) < r(after) < 1
Explain: (68, 15) is an influential observation. Without it, the
correlation is very strong. After including it, the
pattern of the points becomes more scattered. The
correlation becomes weaker but is still positive.

7. You need to drive past two traffic lights on the way from your house to the nearest
grocery store. The probability that you hit a red light is 0.5 at the first intersection
and 0.4 at the second intersection. The probability that you run into a red light at
both intersections is 0.25. On a random day you drive from home to that grocery store. Define the following events:
E1 = you run into a red light at the first intersection
E2 = you run into a red light at the second intersection
E3 = you run into a green light at both intersections
E4 = you run into a red light at both intersections
Which of the following statements is (are) true about the above events? Check all that
are correct. [4 marks]
_ E1 and E2 are independent events.
_ E1 and E2 are disjoint events.
✓ E3 and E4 are disjoint events.
_ E3 is the complement of E4.

8. In a university parking database with 5600 registered vehicles, records show that 43% of
the registered vehicles are Asian makes, 23% are European makes and the remaining
are American makes. Among all the 5600 cars, 20% ever received a parking ticket.
You randomly pick three vehicles with replacement from the database. What is the
probability that at most two of the three are American makes? [6 marks]
Let X = # American cars out of the 3 chosen cars. X ~ Bin(n = 3, p = 1 − 0.43 − 0.23 = 0.34)
P(at most 2 cars are American makes)
= P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= C(3,0)(0.34)^0(1 − 0.34)^3 + C(3,1)(0.34)^1(1 − 0.34)^2 + C(3,2)(0.34)^2(1 − 0.34)^1
= 0.287 + 0.444 + 0.229 ≈ 0.961
OR
P(at most 2 cars are American makes)
= 1 − P(all 3 are American makes) = 1 − (0.34)^3 = 1 − 0.039 = 0.961
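As a quick numerical check of the two routes above (summing the binomial terms, or taking the complement of "all three are American"), here is a minimal Python sketch. It uses only the standard library; the helper name `binom_pmf` and the variable names are illustrative assumptions, not part of the original solution.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Share of American makes: whatever is not Asian (43%) or European (23%).
p_american = 1 - 0.43 - 0.23
n_draws = 3

# Route 1: add up P(X = 0), P(X = 1) and P(X = 2).
p_at_most_two = sum(binom_pmf(k, n_draws, p_american) for k in range(3))

# Route 2: complement of "all three draws are American makes".
p_complement = 1 - binom_pmf(3, n_draws, p_american)

print(round(p_at_most_two, 3), round(p_complement, 3))
```

Both routes give about 0.961, which matches the hand calculation up to rounding.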
The draws are independent because of drawing with replacement.

9. Two stores sell watermelons. At the first store the melons weigh an average of 20
pounds with a standard deviation of 2.2 pounds. The melons are sold for 36 cents a
pound. At the second store the melons are smaller, with a mean of 17 pounds and
a standard deviation of 2 pounds. The store is having a sale on watermelons, only
25 cents a pound. Assume that the weights are normally distributed. Jenny selects a
melon at random at each store. Find the mean and the variance of the difference in
the prices Jenny pays for the two melons. [6 marks]
Let X = weight of a melon at the first store, X ~ N(20, 2.2)
Y = weight of a melon at the second store, Y ~ N(17, 2)
Difference in prices: D = 36X − 25Y (measured in cents)
E(D) = E(36X − 25Y) = 36E(X) − 25E(Y) = 36(20) − 25(17) = 295 cents
Var(D) = Var(36X − 25Y) = 36² Var(X) + 25² Var(Y) = 1296(2.2²) + 625(2²) = 8772.64 cents², assuming X and Y are independent.
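The mean and variance of the price difference can be double-checked with a few lines of Python. This is only a sketch under the same assumption as the solution (the two weights are independent); all variable names are illustrative and do not come from the original.

```python
# Prices in cents per pound at the two stores.
price_first, price_second = 36, 25

# Mean and standard deviation of melon weight (pounds) at each store.
mean_x, sd_x = 20, 2.2
mean_y, sd_y = 17, 2.0

# D = 36X - 25Y: the expectation is linear in X and Y.
mean_d = price_first * mean_x - price_second * mean_y

# For independent X and Y the variances add, each scaled by the
# square of its coefficient (the minus sign disappears when squared).
var_d = price_first**2 * sd_x**2 + price_second**2 * sd_y**2

print(mean_d, round(var_d, 2))
```

Note that the variance is in squared cents; taking its square root gives a standard deviation of roughly 93.7 cents.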
 Spring '08
 KARIM

