200FinalExamA_Soln

For questions 1 to 30, circle one answer only. If making an error, write your final answer clearly in the margin. Ambiguous responses will be considered incorrect.

1. The mean age of five people in a room is 30 years. One of the people, whose age is 50, leaves the room. The mean age of the remaining people is 25 years.
   T (correct)   F

2. For a Normal distribution, the first and third quartiles are respectively one standard deviation below and above the mean.
   T   F (correct)

3. A correlation coefficient based on a scatter plot measures the proportion of data lying on the regression line.
   T   F (correct)

4. A phone-in poll at a radio station concluded that 70% of Canadians approve of Stephen Harper, based on the responses of 8,000 callers. The conclusion is valid since the sample size is large.
   T   F (correct)

5. A random sample of 100 students are asked if they are vegetarians. Five percent respond yes, and a 99% confidence interval for the true proportion of students who are vegetarians is found to be (0.03, 0.07). This implies 99% of all random samples of 100 students will have a sample proportion that falls between 0.03 and 0.07.
   T   F (correct)

6. Confidence intervals are constructed for both parameters and statistics.
   T   F (correct)

7. You carry out a hypothesis test that compares the means of two populations. Suppose a p-value of 0.12 is obtained. This implies that there is a 12% chance that the two population means are equal.
   T   F (correct)

8. If the observations in a data set are all equal, then
   (a) the variance of the data set equals 0.
   (b) the mean of the data set equals 0.
   (c) the IQR of the data set equals 0.
   (d) both (a) and (c).
   (e) both (b) and (c).

9. At the start of this course, I hypothesized that the students in the pharmacy program might have a higher mean grade than the students not in the pharmacy program. After this final exam is completed and marked, I will have final grades for every student. To test my hypothesis, I should use...
   (choose the best answer)
   (a) A one-sample t-test for the mean
   (b) A two-sample t-test for the mean
   (c) A confidence interval for the proportion
   (d) The correlation coefficient
   (e) Linear regression

UBC is interested in finding out if students wish to continue the U-pass program. The researchers decide that undergraduate and graduate students may have different opinions on such an issue. Undergraduate students comprise 84% of the student population and graduate students make up the other 16%. The researcher will survey 1000 students, and so she obtains a list of all UBC students. She divides them into two lists, one of undergraduate students and one of graduate students. She then randomly samples 840 undergraduate and 160 graduate students to survey. She finds that 83% of those surveyed wish to continue with the U-pass program. Use this information to answer questions 10 and 11.

10. What type of sampling technique was used?
   (a) Simple random sampling
   (b) Stratified sampling
   (c) Cluster sampling
   (d) Systematic sampling
   (e) Convenience sampling

11. What is the population parameter?
   (a) The distribution of students in an undergraduate or graduate program
   (b) 83%
   (c) The 1000 students surveyed
   (d) The proportion of UBC students who wish to continue the U-pass program
   (e) UBC students

12. Researchers at a university are interested in examining the relationship between one's smoking habits and whether or not they have lung cancer. They randomly sample 312 people with lung cancer and 427 people without lung cancer. After interviewing the patients and analyzing the data, they conclude that smoking is associated with having lung cancer. Which of the following is correct?
   (a) This is a prospective study
   (b) This is a retrospective study
   (c) The sample sizes must be equal to have a valid study
   (d) There is no control group
   (e) Both (b) and (c)

13.
A simple random sample of 150 cars was taken to estimate the mean speed of cars driving on the sea-to-sky highway, and resulted in the following 95% confidence interval: (107, 116). Circle the only correct statement.
   (a) About 95% of cars in this sample were driving between 107 and 116 km/hr
   (b) A car found driving 120 km/hr would be considered unusual
   (c) We have violated our model assumptions by taking a simple random sample
   (d) There is a 5% chance of making a Type II error
   (e) None of the above

14. The national farming association has data on the amount of pesticide used per acre and the percentage of fruit that has been contaminated by insects per acre, for 117 farms. These two variables have a correlation coefficient r = −0.8. For a farm that uses an amount of pesticide that is 2 standard deviations above the mean amount of pesticide used, we would predict the percentage of contaminated fruit will be
   (a) 1.6 standard deviations above the mean
   (b) 1.6 standard deviations below the mean
   (c) 2 standard deviations above the mean
   (d) 2 standard deviations below the mean
   (e) 1.28 standard deviations below the mean

15. The Vancouver police are interested in estimating the proportion of cars that are uninsured within the city. They assume that uninsured cars are spread randomly and uniformly throughout the city. They create a list of all major intersections in the city, and randomly select 10 to inspect. They set up roadblocks at each of the selected intersections and stop all cars passing to check for insurance. What type of sampling technique was used?
   (a) Simple random sampling
   (b) Stratified sampling
   (c) Cluster sampling
   (d) Convenience sampling
   (e) Systematic sampling

16. The slope of a regression line and the correlation are similar in the sense that
   (a) they both have the same sign.
   (b) they do not depend on the units of measurement of the data.
   (c) they both fall between −1 and 1 inclusive.
   (d) neither of them can be affected by outliers.
   (e) both can be used for prediction.

17. Which of the following is an incorrect statement about the correlation between two quantitative variables X and Y?
   (a) A correlation of 0.8 indicates a stronger linear association between X and Y than a correlation of 0.5.
   (b) A correlation of 0 implies X and Y are not related at all.
   (c) A correlation of 1 indicates that Y = −X.
   (d) Both (b) and (c).
   (e) Both (a) and (b).

18. A certified fitness coach wanted to test the effectiveness of a new fitness program in reducing weight among obese patients. Fifty female patients and fifty male patients participated in the experiment. Within each gender group, the patients were randomly assigned to one of the two fitness programs: the new and the existing fitness programs. Upon completion of the program, reduction in weight was measured for each patient. Which of the following statements is incorrect about this experiment?
   (a) There are four treatments in the study.
   (b) Gender is a blocking variable.
   (c) The patients were not guaranteed to lose weight due to the experiment.
   (d) Reduction in weight is the response variable.
   (e) Type of fitness program is the factor.

19. In a two-sided hypothesis test for a mean, the test statistic was 2.12, which is expected to be a value from the standard Normal distribution under the null hypothesis. The P-value of the hypothesis test is (choose the most appropriate answer)
   (a) Between 0 and 0.025
   (b) Between 0 and 0.05
   (c) Between 0 and 0.003
   (d) Between 0.95 and 0.997
   (e) Between 0.05 and 1

20. If examining the plot of the residuals against the explanatory variable x for a linear model, when the model fits well one would expect
   (a) the residuals to lie on a line of positive slope.
   (b) the residuals to lie on a line of negative slope.
   (c) the residuals to scatter about a line of positive slope.
   (d) there to be no variation in the residuals.
   (e) there to be no obvious pattern in the residuals.

21.
In a large city, 37% of all restaurants accept both master and visa credit cards, and 50% accept master cards and 60% accept visa cards. A tourist visiting the city picks at random a restaurant at which to have lunch. Define the following events:
   M = {the randomly chosen restaurant accepts master credit cards},
   V = {the randomly chosen restaurant accepts visa credit cards}.
Are M and V independent?
   (a) Yes.
   (b) No.
   (c) Insufficient information to tell.

22. A type of thread is being studied for its tensile strength. Fifty-one pieces were tested under similar conditions, the mean tensile strength being 78.30 kg and the standard deviation being 5.60 kg.

(1) Give an approximate 95% confidence interval for the mean tensile strength of the thread.
   (a) 78.30 ± 2 × 5.60
   (b) 78.30 ± 2 × 5.60/√51
   (c) 78.30 ± 2.009 × 5.60
   (d) 78.30 ± 2.009 × 5.60/√51 (correct)

(2) Assuming the strength of the thread follows the normal model with mean and SD having the values given above, estimate the tensile strength that would be exceeded by 97.5% of such threads.
   (a) 67.10 kg
   (b) 89.50 kg
   (c) 76.73 kg
   (d) 79.87 kg
   (e) 61.50 kg

23. Eight marksmen, labeled A, B, ..., H, shot at targets with two types of rifle. Their scores were as in the table below:

   Marksman      Rifle Type 1   Rifle Type 2   Difference (Type 1 − Type 2)
   A             93             89              4
   B             99             93              6
   C             90             86              4
   D             87             92             −5
   E             85             78              7
   F             94             90              4
   G             88             91             −3
   H             91             87              4
   sample mean   90.875         88.25           2.625
   sample SD     4.45           4.77            4.27

(1) To perform the hypothesis testing, is there any assumption needed?
   (a) No assumption is needed.
   (b) Differences are assumed to follow the t distribution.
   (c) Differences are assumed to follow the Normal distribution.
   (d) Differences are assumed to follow the Binomial distribution.

(2) When testing the hypothesis that the rifles are of equal quality, what is the test statistic?
   (a) 1.1381
   (b) 1.7388
   (c) 2.4452
   (d) 2.4590

(3) What is the P-value for this hypothesis test?

(4) Is there a significant difference between the two types of rifles at the 10% significance level?
   Options for (3):
   (a) Greater than 0.2
   (b) Between 0.1 and 0.2
   (c) Between 0.05 and 0.1
   (d) Between 0.02 and 0.05

   Options for (4):
   (a) Yes.
   (b) No.
   (c) There is insufficient information to tell.

24. How well do the size and age of a house determine the annual tax house owners are paying? Nineteen houses are randomly selected from a city. The age (number of years since the house was built), house size (measured in square feet of living space) and the amount of annual tax (in dollars) are recorded for each of the 19 houses. Here are the summary statistics for two of the three variables:
   house size: mean = 1456 sqft, standard deviation = 374 sqft
   annual tax: mean = $1707, standard deviation = $323

(1) The linear regression line that predicts the amount of annual tax from the house size has a slope of $0.81 per square foot. Find the value of the correlation between the size of a house and the amount of annual tax charged.
   (a) 0.94
   (b) 0.69
   (c) 0.70
   (d) 0.95
   (e) 0.91

(2) Another linear regression line is fitted to predict the amount of annual tax from the age of a house. This regression line has a slope of -$92.4 per year. Based on the information that is available to you in this question, which of the following is a correct statement?
   (a) The correlation between house size and amount of annual tax is stronger than that between age of a house and amount of annual tax.
   (b) The correlation between house size and amount of annual tax is weaker than that between age of a house and amount of annual tax.
   (c) The correlation between house size and amount of annual tax is the same as that between age of a house and amount of annual tax.
   (d) There is insufficient information to tell.

(3) Predict the annual tax paid by owners of a 1500-square-foot house.
   (a) $1215.
   (b) $2915.
   (c) $1466.
   (d) $1743.

25. A study investigated whether month of birth impacts on the time a baby learns to crawl.
Parents with children born in January, May or October were asked the age, in weeks, at which their child could crawl one metre within a minute. The data are summarized below:

   Birth month   Crawling age: Mean   Crawling age: SD   Sample size
   January       29.84                7.08               34
   May           28.58                8.06               29
   October       33.83                6.93               40

Which of the following statements do you consider to be correct? CHECK ALL THAT APPLY.
   The table shown above is a contingency table.
   It would be inappropriate to calculate correlations for these data. (correct)
   This is a randomized block design experiment.
   None of the above.

26. A multiple choice exam consists of 10 questions, each question having 4 possible answers to choose from. Suppose a student has not studied for the exam, and will make completely random guesses at the answer for each of the questions.

(a) What is the probability that this student gets at least 3 answers in the exam correct?

Let \(X\) = number of correct answers, \(n = 10\), \(p = \tfrac{1}{4}\), so \(X \sim \mathrm{Bin}(10, \tfrac{1}{4})\).
\[ P(X \ge 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2) \]
\[ = 1 - {}_{10}C_{0}\left(\tfrac{1}{4}\right)^{0}\left(\tfrac{3}{4}\right)^{10} - {}_{10}C_{1}\left(\tfrac{1}{4}\right)^{1}\left(\tfrac{3}{4}\right)^{9} - {}_{10}C_{2}\left(\tfrac{1}{4}\right)^{2}\left(\tfrac{3}{4}\right)^{8} \]
\[ = 1 - 0.0563 - 0.1877 - 0.2816 = 0.4744 \]

(b) Now consider an exam of 100 multiple choice questions with each question having 4 possible answers to choose from. If the student will make completely random guesses at all of the answers, is it usual that he gets at least 30 answers correct on the exam? Justify your answer probabilistically.

Let \(\hat{p}\) = proportion of correct answers among the 100 questions, \(n = 100\), and \(p = \tfrac{1}{4}\). Since \(np = 100 \times \tfrac{1}{4} = 25 > 10\) and \(n(1 - p) = 100 \times (1 - \tfrac{1}{4}) = 75 > 10\), we can use the normal approximation to the sample proportion. Here \(p = \tfrac{1}{4}\),
\[ \sigma(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{\tfrac{1}{4}\left(1 - \tfrac{1}{4}\right)}{100}} = 0.0433, \]
and \(\hat{p} \sim_{\text{approx}} N(\tfrac{1}{4}, 0.0433)\). Then
\[ P\left(\hat{p} \ge \tfrac{30}{100}\right) = P\left(Z \ge \frac{0.3 - p}{\sigma(\hat{p})}\right) = P\left(Z \ge \frac{0.3 - 0.25}{0.0433}\right) = P(Z > 1.15), \]
which is between 0.025 and 0.16 by the 68-95-99.7 rule. The probability isn't too low, so I think it is not too unusual.

27. iPods have been criticized for having a battery that doesn't last very long.
I am interested in studying a few things about the mean lifetime of a fully charged battery. I randomly sample and test 32 iPods of the same model, and find a sample mean lifetime of 6.15 hours with a sample standard deviation of 45 minutes.

(a) Suppose we want to re-estimate the true mean lifetime of the battery using a 95% confidence interval with a margin of error no larger than 10 minutes. How large a sample should we take?

Note that the t-score depends on the sample size, but the sample size is what we need to solve for here. We will assume that the sample size is large enough that the z-score and the t-score have similar values. We will hence use the z-score, which is constant (a value of 2 for 95% confidence), for our calculation. With margin of error \(ME = 10\) min, \(s = 45\) min and \(z^* = 2\),
\[ n = \left(\frac{z^* s}{ME}\right)^2 = \left(\frac{2 \times 45}{10}\right)^2 = 81. \]
So the sample size should be n = 81 iPods.

(b) The company claims that the true mean lifetime of a fully charged battery is significantly greater than 6 hours. Test this claim using a significance level of 5%.

\(H_0: \mu = 6\) hours vs. \(H_A: \mu > 6\) hours. The test statistic is
\[ t_0 = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{6.15 - 6}{0.75/\sqrt{32}} = 1.131, \]
with df \(= n - 1 = 32 - 1 = 31\). P-value \(= P(t_{31} \ge 1.131) \approx P(t_{30} \ge 1.131)\), and 0.10 < P-value. Since P-value > \(\alpha = 0.05\), we fail to reject \(H_0\) and conclude that there is not enough evidence to say the true mean lifetime of a fully charged battery is greater than 6 hours at a significance level of 5%.

28. A study was conducted to determine whether an expectant mother's cigarette smoking has any effect on the bone mineral content of her otherwise healthy child. A sample of 30 newborns whose mothers smoked during pregnancy has a mean bone mineral content of 0.092 g/cm and a standard deviation of 0.026 g/cm; a sample of 72 infants whose mothers did not smoke has a mean of 0.105 g/cm and a standard deviation of 0.025 g/cm.
(a) Do the data suggest that the population mean bone mineral content of newborns differs between mothers who smoked and those who did not smoke during pregnancy? Use a significance level of α = 0.05. Define clearly the parameter(s) and variable(s) that relate to your test.

Let \(y_1\) be the bone mineral content of a newborn whose mother smoked, and \(y_2\) the corresponding variable for a baby whose mother did not smoke. Let \(\mu_1\) and \(\mu_2\) be the respective means, and \(\sigma_1\) and \(\sigma_2\) their respective standard deviations; \(\sigma_1\) and \(\sigma_2\) are unknown. We test \(H_0: \mu_1 = \mu_2\) against \(H_A: \mu_1 \ne \mu_2\). The test statistic is
\[ t = \frac{\bar{y}_1 - \bar{y}_2}{SE(\bar{y}_1 - \bar{y}_2)} = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{0.092 - 0.105}{\sqrt{\dfrac{0.026^2}{30} + \dfrac{0.025^2}{72}}} = -2.33. \]
Under \(H_0\) this should be from the \(t_{29}\) distribution, since \(\min(30 - 1, 72 - 1) = 29\). P-value \(= 2 \times P(t_{29} > |-2.33|)\), and \(0.01 < P(t_{29} > |-2.33|) < 0.025\), so 0.02 < P-value < 0.05. Since P-value < \(\alpha = 0.05\), we reject \(H_0\) and conclude there is a difference between the underlying means.

(b) Based on the results obtained from part (a), you can confidently say that (circle all that apply):
   (i) smoking causes a decrease in the bone mineral content in the newborns.
   (ii) smoking is associated with the bone mineral content in the newborns.
   (iii) smoking has no effect on the bone mineral content in the newborns.
   (iv) smoking is independent of the bone mineral content in the newborns.

(c) Would you expect a 95% confidence interval for the true difference in the population means to contain the value 0? (Circle one)
   Yes   No (correct)
Briefly justify your answer.
The two-sided test above is at the 5% significance level and rejects the hypothesis that \(\mu_1 - \mu_2 = 0\).

29. In a certain city, 25% of residents are European. Suppose 120 people are called for jury duty, and only 24 of them are European. Does this indicate that Europeans are under-represented in the jury selection system? Carry out an appropriate hypothesis test at the 1% significance level. Remember to define the parameter(s) that relate to your test.
Let \(p\) be the true proportion of those called for jury service who are European. We test \(H_0: p = 0.25\) against \(H_A: p < 0.25\). With \(n = 120\), \(np = 120 \times 0.25 = 30 > 10\) and \(n(1 - p) = 120 \times (1 - 0.25) = 90 > 10\), so approximately
\[ \hat{p} \sim N\left(0.25, \sqrt{\frac{0.25 \times 0.75}{120}}\right) \]
under \(H_0\). The test statistic is
\[ z = \frac{\dfrac{24}{120} - 0.25}{\sqrt{\dfrac{0.25 \times 0.75}{120}}} = -1.265. \]
The P-value is \(P(Z < -1.26)\). By the 68-95-99.7 rule, 0.025 < P-value < 0.16, so P-value > \(\alpha = 0.01\). Hence, we do not reject \(H_0\) and conclude there is no evidence to suggest the under-representation of Europeans in the jury selection system.

30. A medical researcher is interested in examining the relationship between the duration of catheterization and whether or not an infection occurred. The thought is that whether or not an infection occurs may be related to the duration of catheterization. She collects data on 266 patients and the data are presented in the contingency table below.

   Duration (days)   1     2     3     ≥4    Total
   Infection         5     10    8     18    41
   No Infection      46    64    39    76    225
   Total             51    74    47    94    266

(a) For this set of data, what is the probability of a person getting an infection given that their duration was between 1 and 2 days?

Since 51 + 74 = 125 patients had a duration between 1 and 2 days, and among them 5 + 10 = 15 patients got infections,
P(getting an infection given that duration is between 1 and 2 days) = 15/125 = 0.12.

(b) What is the distribution of the duration, conditioned on them having an infection?

The distribution of duration conditioned on having an infection is:

   Duration (days)   1              2               3              ≥4              Total
   # Infection       5              10              8              18              41
   Proportion        5/41 = 0.122   10/41 = 0.244   8/41 = 0.195   18/41 = 0.439   1

(c) What is the marginal distribution of the duration of catheterization?
The marginal distribution of duration of catheterization is:

   Duration (days)   1                2                3                ≥4               Total
   Total #           51               74               47               94               266
   Proportion        51/266 = 0.192   74/266 = 0.278   47/266 = 0.177   94/266 = 0.353   1

(d) Complete a hypothesis test to decide if the proportion of infection is higher for patients whose duration ≥ 4 days compared to those whose duration < 4 days. Use α = 0.01.

                     Duration < 4 days   Duration ≥ 4 days   Total
   Infection         23                  18                  41
   No Infection      149                 76                  225
   Total             172                 94                  266

Let \(p_1\) be the true proportion of infection for patients whose duration ≥ 4 days, and \(p_2\) be the true proportion of infection for patients whose duration < 4 days. Here, we test \(H_0: p_1 = p_2\) against \(H_A: p_1 > p_2\). The test statistic is
\[ z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{pooled}\,(1 - \hat{p}_{pooled})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \]
where \(\hat{p}_1 = \tfrac{18}{94}\), \(n_1 = 94\), \(\hat{p}_2 = \tfrac{23}{172}\), \(n_2 = 172\), and
\[ \hat{p}_{pooled} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{18 + 23}{94 + 172} = \frac{41}{266}. \]
Hence, ....
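The conditional and marginal proportions quoted in question 30 can be rechecked directly from the contingency table. The following is a minimal Python sketch, not part of the original solution; the variable names (`infection`, `no_infection`, `conditional`, `marginal`, `pooled`) are illustrative assumptions. It deliberately stops at the pooled proportion, which is the last quantity the text above states.

```python
from fractions import Fraction

# Counts from the question 30 contingency table, by duration (days): 1, 2, 3, >=4
durations = ["1", "2", "3", ">=4"]
infection = [5, 10, 8, 18]
no_infection = [46, 64, 39, 76]

# Column totals and overall totals
totals = [i + n for i, n in zip(infection, no_infection)]
grand_total = sum(totals)
infected_total = sum(infection)

# (a) P(infection | duration is 1 or 2 days)
p_a = Fraction(infection[0] + infection[1], totals[0] + totals[1])

# (b) distribution of duration conditioned on infection
conditional = [x / infected_total for x in infection]

# (c) marginal distribution of duration
marginal = [t / grand_total for t in totals]

# Pooled infection proportion used in part (d)
pooled = Fraction(infected_total, grand_total)

print("P(infection | 1-2 days) =", float(p_a))
for d, c, m in zip(durations, conditional, marginal):
    print(f"duration {d}: conditional {c:.3f}, marginal {m:.3f}")
print("pooled proportion =", pooled)
```

Using `Fraction` for the single-ratio quantities keeps them exact (15/125 and 41/266), while the two distributions are left as floats so they can be compared with the three-decimal values in the tables.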