For questions 1 to 26, there is only one correct answer from those given. Mark your answer to each question with a pencil on the sheet provided. Ambiguous responses will be considered incorrect.

1. If two lists of numbers have exactly the same mean of 30 and the same standard deviation of 5, then the percentage of numbers between 25 and 35 must be exactly the same for both lists.
(a) True
(b) False

2. If the slope of a regression line is zero, then the correlation coefficient is zero. Likewise, if the slope of the line is one, then the correlation coefficient is one.
(a) True
(b) False

3. I calculate the residuals from a regression line predicting weight from height. I find that all the residuals are negative. This is impossible; I must have made a mistake in my calculations.
(a) True
(b) False

4. In drawing a sample from a population, the sampling variability increases with sample size.
(a) True
(b) False

5. A 90% confidence interval for a population proportion p is constructed from a random sample of size 500 and is found to be 0.48 ± 0.04. If another random sample of the same size is drawn, the 90% confidence interval for p constructed based on this new sample will have a 90% chance of including the value 0.48.
(a) True
(b) False

6. Farmers concerned with the effect of snowfall levels on their crops found that there wasn't sufficient
evidence of a decrease in crop growth. They based this conclusion on a hypothesis test using α = 0.05. They would have made the same decision at α = 0.01.
(a) True
(b) False

7. Consider the following histograms of two data sets (both contain 80 observations): Which of the following statements is (are) true about the two data sets?
(1) Data 1 has a smaller median than data 2.
(2) Data 1 has a larger standard deviation than data 2.
(3) Data 1 has a smaller IQR than data 2.
(4) The range of the two data sets is approximately the same.
(a) (3) only
(b) (1) and (2) only
(c) (1) and (3) only
(d) (1) and (4) only

8. For which of the 4 plots below will you expect the corresponding residual plot to be patternless?
(a) Y1 * X1 only
(b) Y4 * X4 only
(c) Y1 * X1 and Y2 * X2
(d) Y3 * X3 only
(e) Y1 * X1 and Y4 * X4

9. Data were collected within a group of males in an athletic association in BC. Based on this dataset, a regression model was computed to predict weight Y (in kg) from height X (in cm). The model fitted was Y = 0.28X + 27.00.
If intending to predict height from weight using the same dataset, which of the following statements
most precisely describes what you can say about the appropriate regression line?
(a) The slope of the regression line would be 0.28.
(b) The slope of the regression line would be 27.00.
(c) The slope of the regression line would be positive.
(d) The slope of the regression line would be negative.
(e) The slope of the regression line would be 3.57.

10. A study of the survival of food-and-drink businesses obtained a sample from the telephone directory's Yellow Pages listings of food-and-drink businesses in Greater Vancouver. The investigator of this study first drew a simple random sample of 4 cities in Greater Vancouver. Then within each selected city, he
randomly sampled 50 businesses. For various reasons, the study got no response from 39.5% of the 200
businesses chosen. Interviews were completed with 121 businesses that responded.
(1) The population of interest to the investigator is
(a) all food-and-drink businesses in Greater Vancouver that are listed under the telephone directory's Yellow Pages.
(b) all food-and-drink businesses in Greater Vancouver.
(c) the 200 businesses that were chosen by the investigator.
(d) the 121 businesses that responded.
(2) What is the statistic?
(a) The proportion of all food-and-drink businesses in Greater Vancouver listed under the telephone directory's Yellow Pages that failed within 3 years.
(b) The proportion of all food-and-drink businesses in Greater Vancouver that failed within 3
years.
(c) The proportion of the 200 businesses sampled by the investigator that failed within 3 years.
(d) The proportion of the 121 businesses that responded and failed within 3 years.
(3) The sampling method that the investigator employed in choosing the 200 businesses is
(a) simple random sampling
(b) stratified random sampling
(c) multistage sampling
(d) systematic sampling

11. An education researcher was interested in examining the effect of teaching method and the effect of the particular teacher on students' scores on a reading test. In a study, there were two different teachers (Juliana and John) and three different teaching methods (method A, method B, and method C). Two hundred and fifty students were randomly assigned to a teaching method and teacher.
(1) For this experiment, identify the response variable.
(a) Teaching method
(b) The education researcher
(c) Teacher
(d) Score on reading test
(2) Identify the factors in the study.
(a) Teaching method and teacher
(b) Juliana and John
(c) Juliana and method A, Juliana and method B, Juliana and method C, John and method A,
John and method B, John and method C
(d) Method A, method B and method C
(3) Which of the following is a correct statement about the study?
(a) This is a completely randomized design.
(b) This study uses a placebo.
(c) This study cannot justify a cause-and-effect relationship.
(d) All of the above.
(4) The researcher wants to compare the scores on the reading test between different treatment groups. Which of the following four display(s) is (are) appropriate for the comparison?
(1) contingency table
(2) side-by-side boxplots
(3) scatterplot
(4) bar graph
(a) (2) only
(b) (2) and (4) only
(c) (1) and (3) only
(d) (2), (3) and (4) only

12. Consider sampling with replacement from a large population of people. Within the population, the variance of IQ is denoted σ². The sample size is n, and the mean of the sample IQs is found. The variance of this sample mean is
(a) σ² provided n is large.
(b) σ² for any value of n.
(c) σ²/n provided n is large.
(d) σ²/n for any value of n.
(e) σ²/√n for any value of n.

The next four questions (Q13 - Q16) refer to the following situation.
A politician must decide whether or not to run in the next local election. He would be inclined to do so if more than 30% of the voters would favour his candidacy. The results of a poll of 225 local citizens showed that 81 favour the politician. Should the politician decide to run in the election based on the results of this survey? Carry out an appropriate hypothesis test at α = 0.01 to answer this question.

13. What is the parameter of interest?
(a) 30%
(b) The proportion of citizens who favor the politician among the poll of 225 local citizens.
(c) 36%
(d) The proportion of all citizens who favor the politician.

14. Denote p as the parameter of interest. What are the correct hypotheses?
(a) H0 : p = 0.36 vs. HA : p > 0.36
(b) H0 : p = 0.30 vs. HA : p ≠ 0.30
(c) H0 : p = 0.30 vs. HA : p > 0.30 (Correct)
(d) H0 : p = 0.36 vs. HA : p < 0.36

15. The test statistic is
(a) 1.88
(b) -1.88
(c) 1.96
(d) -1.96

16. Which of the following is correct?
(a) We reject the null hypothesis and advise the politician to run in the election.
(b) We do not reject the null hypothesis and advise the politician not to run in the election.
(c) We accept the alternative hypothesis and advise the politician to run in the election.
(d) We do not accept the alternative hypothesis and advise the politician not to run in the election.

Use the following information for questions 17 and 18:

Researchers are interested in determining if, during an exam period, SFU undergraduates tend to sleep more than UBC undergraduates. Ten SFU undergraduates were chosen at random and, independently, ten UBC undergraduates were chosen at random. A data file was constructed consisting of a line for each student containing:
ID: student ID,
SCH: School attended (SFU/UBC),
APR16: number of hours slept during the period on April 16 from 12:01 am to 11:59 pm,
APR17: number of hours slept during the period on April 17 from 12:01 am to 11:59 pm.

17. True or false? A good way to study the primary research question is to make a scatterplot of UBC students' numbers of hours slept Apr 16 on the x axis and SFU students' numbers of hours slept Apr 16 on the y axis.
(a) True
(b) False

18. To test the null hypothesis that SFU undergraduates and UBC undergraduates tend to sleep the same,
on average, during exam period, we would need which one of the following?
(a) the t model with 8 degrees of freedom.
(b) the t model with 19 degrees of freedom.
(c) the t model with 9 degrees of freedom.
(d) the binomial model.

Use the following information for questions 19, 20 and 21:

The owner of a small clothing store is concerned that her average sales each day are only $149, not
enough to cover rent and salary. She decides to try out some new window displays, to see if these will
increase her average sales. She buys the new window displays on trial. To decide if she should keep the
new displays, she collects sales data for 20 days to test the null hypothesis that the daily expected sales
are unchanged (equal to $149) versus the alternative hypothesis that expected daily sales are greater
than $149.

19. Suppose that the displays really do work. If the store owner extends her trial period from 20 days to
30 days, which statement most precisely describes what can be said about the chance of committing a
type II error?
(a) The chance of committing a type II error would increase.
(b) The chance of committing a type II error would stay the same.
(c) The chance of committing a type II error would decrease.
(d) The chance of committing a type II error would remain zero.
(e) The chance of committing a type II error could be chosen to be 5%.

20. Suppose that, based on the data collected in the trial, the owner calculates a P-value of 0.04. This means
(a) there is a 4% chance that sales increased during the trial period.
(b) there is a 4% chance that sales decreased during the trial period.
(c) during the trial period, sales increased by 4%.
(d) during the trial period, sales decreased by 4%.
(e) during the trial period, her sales figures were pretty high, if indeed the new displays typically would have no effect.

21. Suppose that, based on the data collected in the trial, the owner of the store decides to keep the new displays. Then
(a) she is in danger of making a Type I error.
(b) she is in danger of making a Type II error.
(c) she is in danger of making a Type III error.
(d) she will get a bigger α.
(e) she will get a smaller α.

22. Three different labs tested two types of cream, A and B, recording the percentage of solubility in some liquid. Each lab repeated each experiment, and the data are given below:
Lab    Cream type A    Cream type B
1      6.8, 6.6        5.3, 6.1
2      7.5, 7.4        7.2, 6.5
3      7.8, 9.1        8.8, 9.1
Differences in the measurements may be due to differences in solubility in the cream types, differences between the labs, or both of these possible sources of variation. To investigate this, you could use
(a) Binomial model.
(b) matched pairs t test.
(c) two-proportion z-test
(d) linear regression model.
(e) No method that has been encountered in STAT 200.

For questions 23 to 25, consider studying if gender and the highest academic qualification obtained (none, high school diploma, bachelor's degree, post-graduate degree) are independent.

23. True or false? To study independence of gender and the highest academic qualification obtained, it would be useful to construct side-by-side boxplots.
(a) True
(b) False (correct answer)

24. True or false? To study independence of gender and the highest academic qualification obtained, it would be useful to calculate a correlation coefficient.
(a) True
(b) False (correct answer)

25. True or false? To study independence of gender and the highest qualification obtained, it would be useful to compare the four conditional distributions:
• the conditional distribution of gender given no qualification was obtained,
• the conditional distribution of gender given the highest qualification is high school diploma,
• the conditional distribution of gender given the highest qualification is a bachelor's degree,
• the conditional distribution of gender given the highest qualification is a post-graduate degree.
(a) True (correct answer)
(b) False

26. Every day Lucky Louie plays a die roll game. He rolls a die five times and counts the number of ones.
If he rolls exactly two ones, then he treats himself and buys a Barstucks Macchiato. That is the only
way he treats himself. Let X be the number of Macchiatos Lucky Louie buys in the month of June, a month with thirty days. Then X has a Binomial model defined by two parameters, denoted as usual n and p.
(a) What is the value of n here? (circle the correct one)
2    5    10    30 (correct answer)
(b) What is the value of p? (circle the correct one)
1/6    (1/6)²    5C2 (1/6)² (5/6)³ (correct answer)    5C2 (1/6)²

27. In this class last year, there were 211 students who wrote both the midterm exam and final exam.
Below you can find summary statistics for the 211 students who wrote both exams.
• The mean grade for the midterm was 80%
• The standard deviation for the midterm grade was 16%
• The mean grade for the final exam was 73%
• The standard deviation for the final exam grade was 12%
• The correlation between the two exam grades was 0.63
(a) What final exam grade would you predict for a student who scored 65% on their midterm?
Let X = midterm grade, Y = final exam grade.
Given x̄ = 80%, sx = 16%, ȳ = 73%, sy = 12%, r = 0.63,
b1 = r × (sy / sx) = 0.63 × (12 / 16) = 0.4725, and b0 = ȳ − b1 x̄ = 73 − 0.4725 × 80 = 35.2%.
So ŷ = b0 + b1 x = 35.2 + 0.4725x.
When x = 65%, ŷ = 35.2 + 0.4725 × 65 = 65.9125%.
That is, we predict a final grade of 65.9125% for a student who scored 65% on the midterm.
(b) Write an interpretation of the slope in the context of this example.
For every 1 percentage point increase in the midterm grade, we expect, on average, a 0.4725 percentage point increase in the student's final exam grade.
(c) Suppose that another student's grades were added to the dataset. This student scored 10% on the midterm and 92% on the final exam. Would this new observation affect the correlation, and if so, how?
This new observation would decrease the correlation. This is because it deviates largely from the regression line and is likely to increase scatter.
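The arithmetic in part (a) can be checked with a short script. The following is a minimal Python sketch, not part of the original solution; the function name `regression_from_summary` and the variable names are illustrative assumptions, while the numbers are the summary statistics given in question 27.

```python
# Least-squares line of y on x, built from summary statistics only.
# The function and variable names are hypothetical, chosen for this sketch.
def regression_from_summary(x_mean, x_sd, y_mean, y_sd, r):
    """Return (intercept, slope) using b1 = r * sy / sx and b0 = y_mean - b1 * x_mean."""
    slope = r * y_sd / x_sd
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Summary statistics from question 27 (grades in percent).
b0, b1 = regression_from_summary(x_mean=80, x_sd=16, y_mean=73, y_sd=12, r=0.63)

# Predicted final exam grade for a student who scored 65% on the midterm.
predicted_final = b0 + b1 * 65

print(f"slope = {b1:.4f}, intercept = {b0:.2f}, prediction = {predicted_final:.4f}")
```

Working from summary statistics rather than raw data mirrors the hand calculation, so each intermediate value can be compared with the solution line by line.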
28. Suppose that Math Proficiency scores of 12th graders are Normally distributed with mean 80 and standard deviation 12. What is the probability that the average Math Proficiency score of a random sample of 400 students is between 80 and 81.2?
Let Y be the Math Proficiency score of 12th graders. We have Y ∼ N(80, 12).
ȳ is the average Math Proficiency score of a random sample of 400 students. So ȳ ∼ N(80, 12/√400), that is, ȳ ∼ N(80, 0.6).
P(80 < ȳ < 81.2) = P((80 − 80)/0.6 < (ȳ − 80)/0.6 < (81.2 − 80)/0.6) = P(0 < Z < 2) = 0.475 (by the 68-95-99.7 rule)

29. Some people believe in dowsing, the ability to detect unseen water with the aid of a forked
stick. Dowsers claim that they experience a movement in the stick when it is passed over water, even
when that water cannot be seen or otherwise detected.
In a study to investigate the possibility of dowsing, researchers obtained eight subjects who claimed to
have the ability to dowse. The researchers took twelve identical containers, and placed half a litre of
water in six of them. The other six were empty, and the containers were such that it was impossible
to determine their contents by visual inspection alone. The twelve containers were placed in a random
order in a room. The dowsers entered the room one by one, and attempted to determine which of the twelve containers held water using only their supposed dowsing powers. The dowsers were not told how many of the twelve containers actually contained water, nor were they told whether their choices had been correct. The researchers recorded the number of times each dowser was correct.
(a) Why did the researchers only allow the subjects to attempt the task one at a time?
The dowsers would probably influence each other in their decisions, so the results would no longer be independent.
(b) Briefly explain why this experiment was not double-blind.
It is not possible for the experimenters to be blind to the knowledge of which containers held water. (Alternatively, note that there is no possible ambiguity in the scoring of the dowsers, so there is no argument for the investigators being blinded.)
(c) One of the eight dowsers successfully determined the presence or absence of water in all twelve
containers. This proves false the hypothesis that no-one has the genuine ability to dowse for water.
True or false? (Circle one)
True    False (correct answer)

30. A manufacturing company has 2 different instruments they use to measure the Rockwell hardness of an object. They believe that one of the instruments may not be working properly and may be giving readings that are not completely accurate. To test this, they do the following. They take a large sheet of metal, cut it into 60 different pieces, and randomly divide the pile in two. They believe it is safe to assume that the hardness of the metal is the same for any particular piece cut from the same sheet of metal. They measure the Rockwell hardness of 30 randomly selected pieces of metal using instrument 1 and find a sample mean hardness of 45.8 with a sample standard deviation of 1.2. They measure the hardness for the other 30 pieces of metal using instrument 2 and find a sample mean hardness of 46.2 with a sample standard deviation of 1.1.
(a) Make and interpret a 95% confidence interval for the difference in the mean hardness readings
using instrument 1 and instrument 2.
Given n1 = 30, ȳ1 = 45.8, s1 = 1.2, and n2 = 30, ȳ2 = 46.2, s2 = 1.1,
SE(ȳ1 − ȳ2) = √(s1²/n1 + s2²/n2) = √(1.2²/30 + 1.1²/30) = 0.2972,
df = min(n1 − 1, n2 − 1) = 29, and t*29 = 2.045.
(ȳ1 − ȳ2) ± t*29 × SE(ȳ1 − ȳ2) = (45.8 − 46.2) ± 2.045 × 0.2972, which gives (−1.0078, 0.2078).
Therefore, we are 95% confident that the true mean hardness reading determined by instrument 1 is between 1.0078 lower and 0.2078 higher than the true mean hardness reading determined by instrument 2.
(b) Based on this interval alone, would you reject or fail to reject the null hypothesis that the mean hardness readings are equal for the two measurement instruments? Make sure to justify your answer with a reason.
Based on this interval, we would fail to reject the null hypothesis because zero is within the interval.
(c) Complete a hypothesis test to decide if you think that the two measurement instruments result in
significantly different readings in the hardness of an object. Use a significance level of 5%.
Let µ1 be the true mean hardness reading using instrument 1, and µ2 be the true mean hardness reading using instrument 2. Here, we test H0 : µ1 = µ2 against HA : µ1 ≠ µ2.
From part (a), SE(ȳ1 − ȳ2) = 0.2972.
Test statistic is t0 = (ȳ1 − ȳ2) / SE(ȳ1 − ȳ2) = (45.8 − 46.2) / 0.2972 = −1.346, with df = min(n1 − 1, n2 − 1) = 29.
P-value = 2P(t29 > |t0|), and 0.05 < P(t29 > 1.346) < 0.1, so 0.1 < P-value < 0.2.
Since P-value > α = 0.05, we fail to reject H0 and conclude that there is not enough evidence to say the true mean hardness readings from the two instruments are significantly different at the 5% significance level.

31. What affects how a person chooses at random? Each of 92 randomly sampled university students was
given a slip of paper that said
Randomly choose one of the letters S or Q.
Of these 92 students, 61 chose S. The remaining 31 students chose Q. Another 98 randomly sampled
university students were given a slip of paper that said
Randomly choose one of the letters Q or S.
Of these 98 students, 45 chose S. The remaining 53 students chose Q.
Is there an association between how the students responded and the ordering of the letters in the
question? Carry out the appropriate test at level 0.05. Clearly show the calculation of your test statistic and your rejection rule (in particular, clarify which of the tabl...
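The remainder of question 31 is cut off, so the intended method is not shown here. As a hedged sketch, assuming a chi-square test of independence on the 2 × 2 table of counts is what the question has in mind (a two-proportion z-test would be an equivalent alternative), the test statistic could be computed as follows. The function name and the table layout are illustrative choices, not taken from the exam; the critical value 3.841 is the standard chi-square table entry for 1 degree of freedom at level 0.05.

```python
# Counts from question 31. Rows: wording on the slip; columns: letter chosen (S, Q).
observed = [
    [61, 31],  # slip read "S or Q" (92 students)
    [45, 53],  # slip read "Q or S" (98 students)
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of wording and choice.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = chi_square_statistic(observed)

# Assumed rejection rule: reject H0 (no association) if the statistic exceeds
# the chi-square critical value with (2 - 1) * (2 - 1) = 1 degree of freedom.
CRITICAL_VALUE = 3.841
reject_null = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.2f}, reject H0 at level 0.05: {reject_null}")
```

Under these assumptions, the rejection rule is to reject the null hypothesis of no association whenever the statistic exceeds the tabulated critical value.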
