For questions 1 to 30, circle one answer only. If making an error, write your final answer clearly in the margin.
Ambiguous responses will be considered incorrect.
1. The mean age of five people in a room is 30 years. One of the people, whose age is 50, leaves the room.
The mean age of the remaining people is 25 years.
T(correct)
F
2. For a Normal distribution, the first and third quartiles are respectively one standard deviation below
and above the mean.
T
F(correct)
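The marked answers to questions 1 and 2 can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the exam.

```python
from statistics import NormalDist

# Question 1: five people with mean age 30, then the 50-year-old leaves.
total_age = 5 * 30
remaining_mean = (total_age - 50) / 4

# Question 2: position of the third quartile of a standard Normal,
# measured in standard deviations above the mean.
q3_in_sd = NormalDist().inv_cdf(0.75)

print(remaining_mean)       # 25.0
print(round(q3_in_sd, 3))   # 0.674
```

The quartiles sit about 0.67 standard deviations from the mean, not one full standard deviation, which is consistent with statement 2 being marked false.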
3. A correlation coefficient based on a scatter plot measures the proportion of data lying on the regression
line.
T
F(correct)
4. A phone-in poll at a radio station concluded that 70% of Canadians approve of Stephen Harper, based
on the responses of 8,000 callers. The conclusion is valid since the sample size is large.
T
F(correct)
5. A random sample of 100 students is asked if they are vegetarians. Five percent respond yes, and a 99%
confidence interval for the true proportion of students who are vegetarians is found to be (0.03, 0.07).
This implies 99% of all random samples of 100 students will have a sample proportion that falls between
0.03 and 0.07.
T
F(correct)
6. Confidence intervals are constructed for both parameters and statistics.
T
F(correct)
7. You carry out a hypothesis test that compares the means of two populations. Suppose a p-value of 0.12
is obtained. This implies that there is a 12% chance that the two population means are equal.
T
F(correct)
8. If the observations in a data set are all equal, then
(a) the variance of the data set equals 0.
(b) the mean of the data set equals 0.
(c) the IQR of the data set equals 0.
(d) both (a) and (c).
(e) both (b) and (c).
9. At the start of this course, I hypothesized that the students in the pharmacy program might have a
higher mean grade than the students not in the pharmacy program. After this final exam is completed
and marked, I will have final grades for every student. To test my hypothesis, I should use... (choose
the best answer)
(a) A one-sample t-test for the mean
(b) A two-sample t-test for the mean
(c) A confidence interval for the proportion
(d) The correlation coefficient
(e) Linear regression
UBC is interested in finding out if students wish to continue the U-pass program. The researchers decide
that undergraduate and graduate students may have different opinions on such an issue. Undergraduate
students comprise 84% of the student population and graduate students make up the other 16%. The
researcher will survey 1000 students, and so she obtains a list of all UBC students. She divides them
into two lists, one of undergraduate students and one of graduate students. She then randomly samples
840 undergraduate and 160 graduate students to survey. She finds that 83% of those surveyed wish to
continue with the U-pass program. Use this information to answer questions 10 and 11.
10. What type of sampling technique was used?
(a) Simple random sampling
(b) Stratified sampling
(c) Cluster sampling
(d) Systematic sampling
(e) Convenience sampling
11. What is the population parameter?
(a) The distribution of students in an undergraduate or graduate program
(b) 83%
(c) The 1000 students surveyed
(d) The proportion of UBC students who wish to continue the U-pass program
(e) UBC students
12. Researchers at a university are interested in examining the relationship between one's smoking habits
and whether or not they have lung cancer. They randomly sample 312 people with lung cancer and
427 people without lung cancer. After interviewing the patients and analyzing the data, they conclude
that smoking is associated with having lung cancer. Which of the following is correct?
(a) This is a prospective study
(b) This is a retrospective study
(c) The sample sizes must be equal to have a valid study
(d) There is no control group
(e) Both (b) and (c)
13. A simple random sample of 150 cars was taken to estimate the mean speed of cars driving on the
Sea-to-Sky Highway, and resulted in the following 95% confidence interval: (107, 116). Circle the only
correct statement.
(a) About 95% of cars in this sample were driving between 107 and 116 km/hr
(b) A car found driving 120km/hr would be considered unusual
(c) We have violated our model assumptions by taking a simple random sample
(d) There is a 5% chance of making a Type II error
(e) None of the above
14. The national farming association has data on the amount of pesticide used per acre and the percentage
of fruit that has been contaminated by insects per acre, for 117 farms. These two variables have
a correlation coefficient r = −0.8. For a farm that uses an amount of pesticide that is 2 standard
deviations above the mean amount of pesticide used, we would predict the percentage of contaminated
fruit will be
(a) 1.6 standard deviations above the mean
(b) 1.6 standard deviations below the mean
(c) 2 standard deviations above the mean
(d) 2 standard deviations below the mean
(e) 1.28 standard deviations below the mean
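Question 14 relies on the fact that, in standardized units, the least-squares prediction is the correlation times the standardized predictor. A minimal sketch of that arithmetic follows; the function name `predicted_z` is hypothetical and used only for illustration.

```python
def predicted_z(r: float, z_x: float) -> float:
    """Predicted standardized response for a standardized predictor value,
    using the regression relation z_y_hat = r * z_x."""
    return r * z_x

# r = -0.8 and a pesticide amount 2 SDs above its mean, as in the question.
z_y = predicted_z(-0.8, 2.0)
print(z_y)  # -1.6
```

A negative result means the predicted percentage of contaminated fruit lies below its mean, by 1.6 standard deviations here.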
15. The Vancouver police are interested in estimating the proportion of cars that are uninsured
within the city. They assume that uninsured cars are spread randomly and uniformly throughout the
city. They create a list of all major intersections in the city, and randomly select 10 to inspect. They
set up roadblocks at each of the selected intersections and stop all cars passing to check for insurance.
What type of sampling technique was used?
(a) Simple random sampling
(b) Stratified sampling
(c) Cluster sampling
(d) Convenience sampling
(e) Systematic sampling
16. The slope of a regression line and the correlation are similar in the sense that
(a) they both have the same sign.
(b) they do not depend on the units of measurement of the data.
(c) they both fall between −1 and 1 inclusive.
(d) neither of them can be affected by outliers.
(e) both can be used for prediction.
17. Which of the following is an incorrect statement about the correlation between two quantitative variables
X and Y?
(a) A correlation of 0.8 indicates a stronger linear association between X and Y than a correlation of
0.5.
(b) A correlation of 0 implies X and Y are not related at all.
(c) A correlation of 1 indicates that Y = −X.
(d) Both (b) and (c).
(e) Both (a) and (b).
18. A certified fitness coach wanted to test the effectiveness of a new fitness program in reducing weight
among obese patients. Fifty female patients and fifty male patients participated in the experiment.
Within each gender group, the patients were randomly assigned to one of the two fitness programs:
the new and the existing fitness programs. Upon completion of the program, reduction in weight was
measured for each patient. Which of the following statements is incorrect about this experiment?
(a) There are four treatments in the study.
(b) Gender is a blocking variable.
(c) The patients were not guaranteed to lose weight due to the experiment.
(d) Reduction in weight is the response variable.
(e) Type of fitness program is the factor.
19. In a two-sided hypothesis test for a mean, the test statistic was 2.12, which is expected to be a
value from the standard Normal distribution under the null hypothesis. The P-value of the hypothesis
test is (choose the most appropriate answer)
(a) Between 0 and 0.025
(b) Between 0 and 0.05
(c) Between 0 and 0.003
(d) Between 0.95 and 0.997
(e) Between 0.05 and 1
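For question 19, the two-sided P-value is twice the upper-tail area beyond |z| under the standard Normal curve. The sketch below computes it exactly rather than bracketing it with the 68-95-99.7 rule; it assumes nothing beyond the test statistic given in the question.

```python
from statistics import NormalDist

z = 2.12  # test statistic from the question

# Two-sided P-value: probability of a value at least as extreme in either tail.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_two_sided, 3))  # 0.034
```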
20. When examining the plot of the residuals against the explanatory variable x for a linear model
that fits well, one would expect
(a) the residuals to lie on a line of positive slope.
(b) the residuals to lie on a line of negative slope.
(c) the residuals to scatter about a line of positive slope.
(d) there to be no variation in the residuals.
(e) there to be no obvious pattern in the residuals.
21. In a large city, 37% of all restaurants accept both Master and Visa credit cards, 50% accept Master
cards and 60% accept Visa cards. A tourist visiting the city picks at random a restaurant at which to
have lunch. Define the following events:
M = {the randomly chosen restaurant accepts Master credit cards},
V = {the randomly chosen restaurant accepts Visa credit cards}.
Are M and V independent?
(a) Yes.
(b) No.
(c) Insufficient information to tell.
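Two events are independent exactly when P(M and V) = P(M) × P(V). A small sketch of that check for question 21, using the three percentages given in the question (variable names are illustrative):

```python
import math

p_m = 0.50      # P(M): accepts Master cards
p_v = 0.60      # P(V): accepts Visa cards
p_both = 0.37   # P(M and V): accepts both

product = p_m * p_v
independent = math.isclose(p_both, product)

print(product, independent)
```

Since 0.37 differs from 0.50 × 0.60 = 0.30, the check reports that M and V are not independent.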
22. A type of thread is being studied for its tensile strength. Fifty-one pieces were tested under similar
conditions, the mean tensile strength being 78.30 kg and the standard deviation being 5.60 kg.
(1) Give an approximate 95% confidence interval for the mean tensile strength of the thread.
(a) 78.30 ± 2 × 5.60
(b) 78.30 ± 2 × 5.60/√51
(c) 78.30 ± 2.009 × 5.60
(d) 78.30 ± 2.009 × 5.60/√51 (correct)
(2) Assuming the strength of the thread follows the normal model with mean and SD having the values
given above, estimate the tensile strength that would be exceeded by 97.5% of such threads.
(a) 67.10 kg
(b) 89.50 kg
(c) 76.73 kg
(d) 79.87 kg
(e) 61.50 kg
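The two parts of question 22 use different spreads: part (1) uses the standard error 5.60/√51 for the mean, while part (2) uses the standard deviation 5.60 for individual threads. The sketch below works through both; the critical value 2.009 is taken from the marked answer, and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, mean, sd = 51, 78.30, 5.60
t_star = 2.009  # critical value quoted in the marked answer (d)

# Part (1): confidence interval for the MEAN uses the standard error sd / sqrt(n).
margin = t_star * sd / sqrt(n)
ci = (mean - margin, mean + margin)

# Part (2): a value exceeded by 97.5% of INDIVIDUAL threads uses sd itself.
# The 68-95-99.7 rule places it 2 SDs below the mean; the exact Normal
# quantile uses 1.96 SDs and is shown for comparison.
exceeded_rule = mean - 2 * sd
exceeded_exact = NormalDist(mean, sd).inv_cdf(0.025)

print(tuple(round(x, 2) for x in ci))
print(round(exceeded_rule, 2), round(exceeded_exact, 2))
```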
23. Eight marksmen, labeled A, B, ..., H, shot at targets with two types of rifle. Their scores were as in the
table below:

Marksman                     A    B    C    D    E    F    G    H    sample mean   sample SD
Rifle Type 1                 93   99   90   87   85   94   88   91   90.875        4.45
Rifle Type 2                 89   93   86   92   78   90   91   87   88.25         4.77
Difference (Type1-Type2)     4    6    4    -5   7    4    -3   4    2.625         4.27

(1) To perform the hypothesis testing, is there any assumption needed?
(a) No assumption is needed.
(b) Differences are assumed to follow the t distribution.
(c) Differences are assumed to follow the Normal distribution.
(d) Differences are assumed to follow the Binomial distribution.
(2) When testing the hypothesis that the rifles are of equal quality, what is the test statistic?
(a) 1.1381
(b) 1.7388
(c) 2.4452
(d) 2.4590
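Because each marksman fires both rifles, question 23 calls for a paired analysis on the differences. The following sketch recomputes the summary row and the paired t statistic from the scores in the table; it is an illustration, not the exam's official working.

```python
from math import sqrt
from statistics import mean, stdev

type1 = [93, 99, 90, 87, 85, 94, 88, 91]   # marksmen A..H, rifle type 1
type2 = [89, 93, 86, 92, 78, 90, 91, 87]   # marksmen A..H, rifle type 2

# Paired design: analyse the within-marksman differences.
diffs = [a - b for a, b in zip(type1, type2)]
d_bar = mean(diffs)    # sample mean of the differences
s_d = stdev(diffs)     # sample SD of the differences (n - 1 denominator)

# Paired t statistic for H0: mean difference = 0, with n - 1 = 7 degrees of freedom.
t_stat = d_bar / (s_d / sqrt(len(diffs)))

print(d_bar, round(s_d, 2), round(t_stat, 3))
```

With the unrounded SD the statistic is about 1.737; using the tabulated SD of 4.27 gives 2.625/(4.27/√8) ≈ 1.7388, which matches option (b).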
(3) What is the P-value for this hypothesis test?
(a) Greater than 0.2
(b) Between 0.1 and 0.2
(c) Between 0.05 and 0.1
(d) Between 0.02 and 0.05
(4) Is there a significant difference between the two types of rifles at the 10% significance level?
(a) Yes.
(b) No.
(c) There is insufficient information to tell.
24. How well do the size and age of a house determine the annual tax house owners are paying? Nineteen
houses are randomly selected from a city. The age (# years since the house was built), house size
(measured in square feet of living space) and the amount of annual tax (in dollars) are recorded for
each of the 19 houses. Here are the summary statistics for two of the three variables:
house size : mean = 1456 sqft, standard deviation = 374 sqft
annual tax : mean = $1707, standard deviation = $323
(1) The linear regression line that predicts the amount of annual tax from the house size has a slope
of $0.81 per square foot. Find the value of the correlation between the size of a house and the
amount of annual tax charged.
(a) 0.94
(b) 0.69
(c) 0.70
(d) 0.95
(e) 0.91
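The least-squares slope and the correlation are linked by slope = r × (SD of y)/(SD of x), so r can be recovered from the summary statistics and the slope quoted above. A minimal sketch, with illustrative variable names:

```python
sd_size = 374.0   # SD of house size, in square feet
sd_tax = 323.0    # SD of annual tax, in dollars
slope = 0.81      # dollars of tax per square foot

# slope = r * sd_tax / sd_size, rearranged for r.
r = slope * sd_size / sd_tax

print(round(r, 2))  # 0.94
```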
(2) Another linear regression line is fitted to predict the amount of annual tax from the age of a house.
This regression line has a slope of -$92.4 per year. Based on the information that is available to
you in this question, which of the following is a correct statement?
(a) The correlation between house size and amount of annual tax is stronger than that between
age of a house and amount of annual tax.
(b) The correlation between house size and amount of annual tax is weaker than that between
age of a house and amount of annual tax.
(c) The correlation between house size and amount of annual tax is the same as that between age
of a house and amount of annual tax.
(d) There is insufficient information to tell.
(3) Predict the annual tax paid by owners of a 1500-square-foot house.
(a) $1215.
(b) $2915.
(c) $1466.
(d) $1743.
25. A study investigated whether month of birth impacts on the time a baby learns to crawl. Parents with
children born in January, May or October were asked the age, in weeks, at which their child could crawl
one metre within a minute. The data are summarized below:

Birth month    Mean crawling age    SD      Sample size
January        29.84                7.08    34
May            28.58                8.06    29
October        33.83                6.93    40

Which of the following statements do you consider to be correct? CHECK ALL THAT APPLY.
The table shown above is a contingency table.
It would be inappropriate to calculate correlations for these data. (correct)
This is a randomized block design experiment.
None of the above.
26. A multiple choice exam consists of 10 questions, each question having 4 possible answers to choose
from. Suppose a student has not studied for the exam, and will make completely random guesses at
the answer for each of the questions.
(a) What is the probability that this student gets at least 3 answers in the exam correct?
Let X = # of correct answers, n = 10, p = 1/4, so X ∼ Bin(10, 1/4).
$$
\begin{aligned}
P(X \ge 3) &= 1 - P(X = 0) - P(X = 1) - P(X = 2) \\
&= 1 - \binom{10}{0}\left(\tfrac{1}{4}\right)^{0}\left(\tfrac{3}{4}\right)^{10} - \binom{10}{1}\left(\tfrac{1}{4}\right)^{1}\left(\tfrac{3}{4}\right)^{9} - \binom{10}{2}\left(\tfrac{1}{4}\right)^{2}\left(\tfrac{3}{4}\right)^{8} \\
&= 1 - 0.0563 - 0.1877 - 0.2816 \\
&= 0.4744
\end{aligned}
$$
(b) Now consider an exam of 100 multiple choice questions with each question having 4 possible answers
to choose from. If the student will make completely random guesses at all of the answers, is it
usual that he gets at least 30 answers correct on the exam? Justify your answer probabilistically.
Let $\hat{p}$ = proportion of correct answers among the 100 questions, n = 100, and p = 1/4. Since
np = 100 × 1/4 = 25 > 10 and n(1 − p) = 100 × (1 − 1/4) = 75 > 10, we can use the normal approximation to
the sample proportion. With p = 1/4,
$$\sigma(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{\tfrac{1}{4}\left(1 - \tfrac{1}{4}\right)}{100}} = 0.0433, \qquad \hat{p} \overset{approx}{\sim} N\left(\tfrac{1}{4},\ 0.0433\right)$$
$$P\left(\hat{p} \ge \tfrac{30}{100}\right) = P\left(Z \ge \frac{0.3 - p}{\sigma(\hat{p})}\right) = P\left(Z \ge \frac{0.3 - 0.25}{0.0433}\right) = P(Z > 1.15)$$
which is between 0.025 and 0.16 by the
68-95-99.7 rule.
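Both parts of question 26 can be checked directly. The sketch below evaluates the binomial sum for part (a), the normal approximation for part (b), and, as an extra comparison that is not part of the exam solution, the exact binomial tail for 100 questions. The helper name `binom_pmf` is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Part (a): P(X >= 3) for X ~ Bin(10, 1/4), via the complement.
p_at_least_3 = 1 - sum(binom_pmf(k, 10, 0.25) for k in range(3))

# Part (b): normal approximation to the sample proportion with n = 100.
se = sqrt(0.25 * 0.75 / 100)
z = (0.30 - 0.25) / se
p_normal = 1 - NormalDist().cdf(z)

# Extra comparison (not in the exam solution): exact tail P(X >= 30), X ~ Bin(100, 1/4).
p_exact = sum(binom_pmf(k, 100, 0.25) for k in range(30, 101))

print(round(p_at_least_3, 4))   # 0.4744
print(round(se, 4), round(z, 2))
print(p_normal, p_exact)
```

Both tail probabilities fall inside the interval (0.025, 0.16) obtained from the 68-95-99.7 rule.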
The probability isn't too low, so I think it is not too unusual.
27. iPods have been criticized for having a battery that doesn't last very long. I am interested in studying
a few things about the mean lifetime of a fully charged battery. I randomly sample and test 32 iPods
of the same model, and find a sample mean lifetime of 6.15 hours with a sample standard deviation of
45 minutes.
(a) Suppose we want to re-estimate the true mean lifetime of the battery using a 95% confidence interval
with a margin of error no larger than 10 minutes. How large a sample should we take?
Note that the t-score depends on the sample size, but the sample size is what we need to solve
for here. We will assume that the sample size is large enough that the z-score and the t-score have
similar values. We will hence use the z-score, which is constant (a value of 2 for 95% confidence),
for our calculation.
Margin of error ME = 10 min, s = 45 min, z* = 2
$$n = \left(\frac{z^{*} s}{ME}\right)^{2} = \left(\frac{2 \times 45}{10}\right)^{2} = 81 \text{ iPods}$$
So the sample size should be n = 81 iPods.
(b) The company claims that the true mean lifetime of a fully charged battery is significantly greater
than 6 hours. Test this claim using a significance level of 5%.
H0 : µ = 6 hours, vs. HA : µ > 6 hours. The test statistic is
$$t_0 = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{6.15 - 6}{0.75/\sqrt{32}} = 1.131$$
df = n − 1 = 32 − 1 = 31, P-value = P(t31 ≥ 1.131) ≈ P(t30 ≥ 1.131), and 0.10 < P-value.
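The arithmetic in parts (a) and (b) of question 27 can be reproduced as follows. This is a sketch under the same assumptions as the solution (z* = 2 for 95% confidence, and the sample SD converted from 45 minutes to 0.75 hours); variable names are illustrative.

```python
from math import ceil, sqrt

# Part (a): sample size for a margin of error of 10 minutes.
z_star = 2      # 95% confidence via the 68-95-99.7 rule, as in the solution
s_min = 45      # sample SD in minutes
me_min = 10     # target margin of error in minutes
n_required = ceil((z_star * s_min / me_min) ** 2)

# Part (b): one-sample t statistic for H0: mu = 6 hours (units: hours).
n, y_bar, mu0, s_hours = 32, 6.15, 6.0, 0.75
t0 = (y_bar - mu0) / (s_hours / sqrt(n))
df = n - 1

print(n_required, round(t0, 3), df)
```

Rounding the required sample size up with `ceil` is a design choice: a fractional result must be rounded up to keep the margin of error within the target.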
Since P-value > α = 0.05, we fail to reject H0 and conclude that there is not enough evidence to
say the true mean lifetime of a fully charged battery is greater than 6 hours at a significance level
of 5%.
28. A study was conducted to determine whether an expectant mother's cigarette smoking has any effect
on the bone mineral content of her otherwise healthy child. A sample of 30 newborns whose mothers
smoked during pregnancy has a mean bone mineral content of 0.092 g/cm and a standard deviation
of 0.026 g/cm; a sample of 72 infants whose mothers did not smoke has a mean of 0.105 g/cm and a
standard deviation of 0.025 g/cm.
(a) Do the data suggest that the population mean bone mineral content of newborns differs between
mothers who smoked and those who did not smoke during pregnancy? Use a significance level of
α = 0.05. Define clearly the parameter(s) and variable(s) that relate to your test.
Let y1 be the bone mineral content of a newborn whose mother smoked, y2 the corresponding
variable for a baby whose mother did not smoke. Let µ1 and µ2 be the respective means, σ1 and
σ2 their respective standard deviations. σ1 and σ2 are unknown. We test H0 : µ1 = µ2 against
HA : µ1 ≠ µ2. The test statistic is
$$t = \frac{\bar{y}_1 - \bar{y}_2}{SE(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{0.092 - 0.105}{\sqrt{\dfrac{0.026^2}{30} + \dfrac{0.025^2}{72}}} = -2.33.$$
Under H0 this should be from the t29 distribution, since min(30 − 1, 72 − 1) = 29.
P-value = 2 × P(t29 > |−2.33|), and 0.01 < P(t29 > |−2.33|) < 0.025, so 0.02 < P-value < 0.05.
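The two-sample statistic above does not pool the variances, so it can be recomputed from the six summary numbers alone. A minimal sketch, using the conservative degrees of freedom from the solution (variable names are assumptions for illustration):

```python
from math import sqrt

# Summary statistics: group 1 = mothers who smoked, group 2 = mothers who did not.
n1, mean1, sd1 = 30, 0.092, 0.026
n2, mean2, sd2 = 72, 0.105, 0.025

# Unpooled standard error of the difference in sample means.
se = sqrt(sd1**2 / n1 + sd2**2 / n2)
t_stat = (mean1 - mean2) / se

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
df_conservative = min(n1 - 1, n2 - 1)

print(round(t_stat, 2), df_conservative)
```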
Since P-value < α = 0.05, we reject H0 and conclude there is a difference between the underlying
means.
(b) Based on the results obtained from part (a), you can confidently say that (circle all that apply):
(i) smoking causes a decrease in the bone mineral content in the newborns.
(ii) smoking is associated with the bone mineral content in the newborns.
(iii) smoking has no effect on the bone mineral content in the newborns.
(iv) smoking is independent of the bone mineral content in the newborns.
(c) Would you expect a 95% confidence interval for the true difference in the population means to
contain the value 0? (Circle one)
Yes
No(correct)
Briefly justify your answer.
The two-sided test above is at the 5% significance level and rejects the hypothesis that µ1 − µ2 = 0,
so the corresponding 95% confidence interval does not contain 0.
29. In a certain city, 25% of residents are European. Suppose 120 people are called for jury duty, and only
24 of them are European. Does this indicate that Europeans are under-represented in the jury selection
system? Carry out an appropriate hypothesis test at the 1% significance level. Remember to define the
parameter(s) that relates to your test.
Let p be the true proportion called for jury service that are Europeans. We test
H0 : p = 0.25 against
HA : p < 0.25.
With n = 120, and
np = 120 × 0.25 = 30 > 10
n(1 − p) = 120 × (1 − 0.25) = 90 > 10,
then approximately
$$\hat{p} \sim N\left(0.25,\ \sqrt{\frac{0.25 \times 0.75}{120}}\right)$$
under H0. The test statistic is
$$z = \frac{\dfrac{24}{120} - 0.25}{\sqrt{\dfrac{0.25 \times 0.75}{120}}} = -1.265.$$
The p-value is P(Z < −1.26). By the 68-95-99.7 rule, 0.025 < P-value < 0.16, so P-value > α = 0.01.
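The one-proportion z-test above can be verified numerically. The sketch below recomputes the test statistic and, instead of bracketing with the 68-95-99.7 rule, evaluates the lower-tail P-value from the standard Normal distribution; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 120, 24, 0.25
p_hat = successes / n

# Under H0 the standard error uses the hypothesized proportion p0.
se0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Lower-tail P-value for HA: p < 0.25.
p_value = NormalDist().cdf(z)

print(round(z, 3), p_value)
```

The exact P-value lies inside the interval (0.025, 0.16) used in the solution and is well above α = 0.01.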
Hence, we do not reject H0 and conclude there is no evidence to suggest the underrepresentation of
Europeans in the jury selection system.
30. A medical researcher is interested in examining the relationship between the duration of catheterization
and whether or not an infection occurred. The thought is that whether or not an infection occurs may
be related to the duration of catheterization. She collects data on 266 patients and the data are presented
in the contingency table below.

                 Duration (days)
                 1      2      3      ≥4     Total
Infection        5      10     8      18     41
No Infection     46     64     39     76     225
Total            51     74     47     94     266

(a) For this set of data, what is the probability of a person getting an infection given that their duration
was between 1 and 2 days?
Since 51 + 74 = 125 patients had duration between 1 and 2 days, and among them 5 + 10 = 15
patients got infections,
P(getting an infection given that duration is between 1 and 2 days) = 15/125 = 0.12
(b) What is the distribution of the duration, conditioned on them having an infection?
The distribution of duration conditioned on having an infection is:

Duration (days)    1               2                3               ≥4               Total
# Infection        5               10               8               18               41
Proportion         5/41 = 0.122    10/41 = 0.244    8/41 = 0.195    18/41 = 0.439    1

(c) What is the marginal distribution of the duration of catheterization?
The marginal distribution of duration of catheterization is:

Duration (days)    1                 2                 3                 ≥4                Total
Total #            51                74                47                94                266
Proportion         51/266 = 0.192    74/266 = 0.278    47/266 = 0.177    94/266 = 0.353    1

(d) Complete a hypothesis test to decide if the proportion of infection is higher for patients whose
duration ≥ 4 days compared to those whose duration < 4 days. Use α = 0.01.

                 Duration (days) < 4    Duration (days) ≥ 4    Total
Infection        23                     18                     41
No Infection     149                    76                     225
Total            172                    94                     266

Let p1 be the true proportion of infection for patients whose duration ≥ 4 days, and p2 be the true
proportion of infection for patients whose duration < 4 days. Here, we test
H0 : p1 = p2 against HA : p1 > p2.
The test statistic is
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{pooled}\left(1 - \hat{p}_{pooled}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$\hat{p}_1 = \frac{18}{94},\quad n_1 = 94,\quad \hat{p}_2 = \frac{23}{172},\quad n_2 = 172,\quad \hat{p}_{pooled} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{18 + 23}{94 + 172} = \frac{41}{266}.$$
Hence, ....
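The table-based quantities in parts (a) to (c) of question 30, and the pooled proportion defined in part (d), can be reproduced from the contingency table. This sketch stops at the pooled proportion because the remainder of the solution is not shown in the source; the dictionary layout and names are assumptions made for illustration.

```python
# Counts from the contingency table, keyed by duration of catheterization (days).
infection = {"1": 5, "2": 10, "3": 8, ">=4": 18}
no_infection = {"1": 46, "2": 64, "3": 39, ">=4": 76}

totals = {d: infection[d] + no_infection[d] for d in infection}
n_infected = sum(infection.values())
n_total = sum(totals.values())

# (a) P(infection | duration is 1 or 2 days)
p_inf_given_1_or_2 = (infection["1"] + infection["2"]) / (totals["1"] + totals["2"])

# (b) distribution of duration conditioned on infection
cond_on_infection = {d: infection[d] / n_infected for d in infection}

# (c) marginal distribution of duration
marginal = {d: totals[d] / n_total for d in totals}

# (d) pooled proportion of infection across both duration groups
p_pooled = n_infected / n_total

print(p_inf_given_1_or_2)
print({d: round(v, 3) for d, v in cond_on_infection.items()})
print({d: round(v, 3) for d, v in marginal.items()})
print(round(p_pooled, 4))
```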
