S21FinalSG - Hank Ibser Statistics 21 Fall 2010 The final...

Hank Ibser
Statistics 21, Fall 2010

The final will be Monday, Dec 13, 11:30-2:30pm, location TBA. Please get there a little early as I may move people around a bit. The final will cover the following chapters as specified below:

Ch 1: All.
Ch 2: Omit section 4; in Ch 1-2 you need to know the ideas and jargon but not the specific examples.
Ch 3: Omit sections 5-7.
Ch 4: Omit stuff on longitudinal/cross-sectional data in section 2. Omit section 7 unless you are using a calculator to find SDs.
Ch 5: Omit interquartile range (p89).
Ch 6: Omit. Special review exercises 1, 2, 3, 4, 6, 7, 9, 11 are good practice.
Ch 7: Omit.
Ch 8: Omit Review Ex #12.
Ch 9: Omit section 4, technical note on p146-147.
Ch 10: Omit technical note on p169.
Ch 11: Omit section 3 and technical note on p197.
Ch 12: Omit, in section 2, everything from "Now, an example." to the end of the section (p208-211). Omit section 3. Omit Review Ex #12.
Ch 13-20: All. At the end of Ch 15, good practice problems are special review exercises 8-9, 11-15, 17-20.
Ch 21: Omit section 5.
Ch 22: Omit, although you can do Rev Ex 5-12 for practice.
Ch 23: All.
Ch 24, 25: Omit.
Ch 26: Omit Rev Ex 10-12.
Ch 27-29: All. At the end of Ch 29, special review exercises 1, 4-28, 33, 35, and 36 are good practice problems.

The exam will also cover the three handouts: 1) Summation, Average, and SD, 2) Summation and Correlation, and 3) Probability.

I'll have 10 questions on the final, roughly 2-3 from the material on the midterm, and 7-8 on the material since the midterm. You need to bring a calculator and something to write with. No blue book necessary. You'll get a normal table, t table, and Chi-square table on the exam.

1. A box contains the following numbered tickets: 1, 1, 5, 9, 9
a) If I draw two tickets with replacement, what is the chance that the sum of the two tickets is greater than or equal to 10?
b) Drawing three tickets without replacement, what is the chance the first two tickets are not 5's, and the last ticket is a 5?
c) Calculate b) if the draws are made with replacement.
d) If I repeat the procedure in a) 8 times (i.e., draw 2 tickets and find their sum, and do this 8 times), what is the chance that I get a sum greater than or equal to 10 exactly 6 of the 8 times?

2. Weights and heights of turkeys tend to be correlated. For a population of turkeys at a farm, this correlation is found to be 0.64. The average weight is 17 pounds, SD is 5 pounds. The average height is 28 inches and the SD is 8 inches. Weight and height both roughly follow the normal curve. For each part below, answer the question or if not possible, indicate why not.
a) A turkey at the farm which weighs more than 90% of all the turkeys is predicted to be taller than ____% of them.
b) The average height for turkeys at the 90th percentile for weight is ____.
c) Of the turkeys at the 90th percentile for weight, roughly what percent would you estimate to be taller than 28 inches?

3. A box has 3 red balls and 5 blue balls.
a) I draw 3 balls from this box with replacement. Find the expected value and SE for the number of red balls.
b) Repeat a) if the draws are made without replacement.

4. Choose one of the following values of r for each example below: (-1, -.7, 0, .7, 1). You may use some values more than once and you may not use them all. For full credit choose a value and explain your choice briefly.
a) A group of employees all get a 10% cut in their salaries. What is the correlation between their salaries before the cut and after the cut?
b) The correlation between weight and belt size for a group of men all the same height.

Everyone in a large lecture class flips a coin 100 times and records the results. Then everyone in the class does this again. For parts c)-e) keep these two sets of coin tosses in mind: each student provides one data point.
c) The correlation between the number of heads in the first set of 100 tosses and the number of heads in the second set of 100 tosses.
d) The correlation between the number of heads in the first set of 100 tosses and the number of tails in the first set of 100 tosses.
e) The correlation between the number of heads in the first set of 100 tosses and the number of tails in all 200 tosses.

5. I have 5 coins in my pocket: 2 quarters, 2 dimes, and a nickel.
a) I draw three coins without replacement. What is the chance that the third coin is the nickel?
b) I draw three coins without replacement. Find the expected value and the SE for the total value of these three coins.

6. I roll a six-sided die 4 times. Find chances for each of the following events:
a) the first die roll is bigger than the second or the first roll is a 3.
b) all the numbers are different.
c) not all the numbers rolled are even.

7. A group of married couples takes an IQ test. The average husband's IQ is 105 with an SD of 15 and the average wife's IQ is 110 with an SD of 10. The correlation between husband's and wife's IQ is 0.5.
a) A man has an IQ of 75, what would you predict his wife's IQ is?
b) Of all men with an IQ of 75, about what percent are smarter than their wives?

8. The national percent of adults who have a laptop computer is 62%. In San Francisco, a simple random sample of 100 adults is taken and 69% of these adults have a laptop. Is this strong evidence that more adults in SF have laptops than the national average?
a) State the null hypothesis and the alternative in terms of a box model.
b) Find z and p values.
c) What do you conclude?

9. I take a simple random sample of 400 UC Berkeley students and another simple random sample of 100 Stanford students. The average weight of the Berkeley students is 144 pounds with an SD of 26 pounds. The lightest person in the sample weighs 98 pounds. In the Stanford sample, the average weight is 147 pounds with an SD of 28 pounds.
An exercise physiologist claims that this is evidence that Berkeley students weigh less than Stanford students. In order to support this claim, she does a hypothesis test. For parts a)-c), do the test even if you think it's not appropriate.
a) State the implicit null hypothesis and the alternative hypothesis.
b) Find z and p in the usual way.
c) Based on the z and p values, what do you conclude?
d) True or false, and explain briefly: The hypothesis test done here doesn't make sense because the Berkeley data don't follow the normal curve.

10. Before a substance can be deemed safe for landfilling, its chemical properties must be characterized. Sixty samples of sludge from a wastewater treatment plant have an average pH of 6.6 and an SD of 3.5. You may assume that these samples are like a simple random sample of all the sludge that comes from the plant. If possible, construct a 95% confidence interval for the average pH of all the sludge that comes from this plant. If this is not possible, explain why not.

11. The national average for cumulative SAT score was 1026 in 2003 and the SD was 105. SAT scores generally follow the normal curve pretty closely. I take an SRS of 5 students from our Stat 21 class and find that the average and SD in my sample are 1240 and 90. Do you believe that the class average is more than the 2003 national average? If appropriate, do a hypothesis test to answer this question, including null and alternative hypotheses, a box model, test statistic, p-value, and interpret the p-value. If it is not appropriate to do a hypothesis test, explain why not.

12. Fans vote online for their favorite character: Bubbles, Blossom, Buttercup, or Mojo Jojo. Out of the first 100 votes, 23 vote for Bubbles, 29 for Blossom, 37 for Buttercup, and 11 for Mojo Jojo. Mojo Jojo says, "These results could have happened just by chance, people are just pressing buttons at random and in fact all four of us have an equal number of fans."
Blossom says he's wrong, this couldn't have happened by chance. Construct the hypothesis test implicit in Mojo Jojo's statement. Write out the null and alternative hypotheses, find the appropriate test statistic, an approximate P-value, and write out your conclusion.

Statistics 21
Problems from past final exams

1. (10 points) A psychologist administered a multiple-choice test to 122 college students. Each question had five possible answers and only one was correct. The questions were taken from the reading comprehension section of the SAT verbal test, with one important change. The SAT presents short passages to read and follows these with multiple-choice questions about the meaning of the passages. For his test, the psychologist took only the questions, not the reading passages. So someone taking the psychologist's test has to answer the SAT questions without having read the passages. Why make up such a crazy test? The psychologist was trying to find out whether this part of the SAT was really testing reading ability or some other test-taking skill. None of the 122 students had taken the SAT. Their average score on the psychologist's test turned out to be 38 points out of 100. (There were 100 questions; a correct answer was worth one point and an incorrect answer zero points.) The psychologist took the average of 38 as evidence that points can be earned on this part of the SAT by using mental skills other than the ability to understand the passages. Of course, the psychologist realized that the students would get some answers right just by luck.
(a) Formulate the null hypothesis implicit in the above paragraph as a statement about a box model.
(b) Calculate the appropriate test statistic.
(c) Would you reject the null hypothesis?

2. (10 points) For the data set shown below: [data table illegible in the source: 0 l 1 ‘ 2 l O 2 2 i]
(a) Plot the scatter diagram.
(b) Find the correlation between x and y.
(c) Write down the equation of the regression line for predicting y from x.
(d) Plot the regression line on the scatter diagram.
(e) Complete the following table (there are 8 missing numbers):
(f) Find the r.m.s. error of the regression line for estimating y from x.
(g) Is there some other line which will lead to a smaller r.m.s. error than the one in (c)? Answer yes or no and explain briefly.

3. (5 points) A person will be picked at random from the class in problem 1i and you will be told his or her midterm score. Using this information, you have to guess his or her final score. Someone says: "Look, the final average was exactly twice the midterm average. So, just double the midterm score you are told and use that as your guess."
(a) Find the r.m.s. error for the method suggested in the quote. Give your answer to at least one decimal place.
(b) Is there another method with a smaller r.m.s. error? Answer yes or no, and explain briefly.

4. (10 points) A test of reaction time is given twice to a group of 1,200 men: the first time at age twenty, the second time at age forty. The summary statistics for the results are as follows:
Age 20: Average = 780, SD = 105
Age 40: Average = 810, SD = 100
Correlation = 0.80
The units for the reaction times are milliseconds. Of those whose reaction time was average at age twenty, what percent had times at forty which were in the bottom quarter for the 1,200 men at that age? (You may assume the scatter diagram is football shaped.)

5. (10 points) A computer program simulates drawing fifty times at random, with replacement, from the box: [box not legible in the source]
The program prints out the results from left to right, in two rows of twenty-five digits each:
1st row: x x x x
2nd row: x x x x
Find the expected value and the SE for the number of times a digit in the second row is the same as the one directly above it.

6. (10 points) A transportation study involves two suburbs of a large city. One suburb is a little farther from the city center than the other.
The investigators believe that living in the more distant suburb adds, on average, not more than 2 minutes to the commute time. The investigators take a simple random sample of 900 from the residents of the more distant suburb, and a simple random sample of 400 from the residents of the more central suburb. In the sample of 900 residents, the average commute time on a particular day was 17.4 minutes; the SD was 18.0 minutes. In the sample of 400 residents, the average commute time on that same day was 14.5; the SD was 15.0 minutes. Do these results provide evidence against the investigators' belief?
(a) Calculate the appropriate test statistics and find the observed significance level.
(b) Answer the question.

7. (10 points) In a study of 600 identical twins, the twins were given, among other things, a vocabulary test. The summary statistics for the scores are as follows:
younger twin: average = 25 pts, SD of score = 10 pts
older twin: average = 25 pts, SD of score = 10 pts
Correlation = 0.80
(a) Someone proposes to use the regression method for estimating the score of the older twin from the score of the younger twin. Find the r.m.s. error for this method.
(b) Someone else, who has not heard of the regression method, proposes to take the score of the younger twin, and use it, unchanged, as the estimate for the score of the older twin. Find the r.m.s. error for this method.

8. (10 points) Two draws are made at random with replacement from the box: [box not legible in the source]
(a) What is the chance different letters are drawn?
(b) What is the chance of getting a vowel at least once in the two draws?

9. (10 points) In the preceding five years, entering students at a certain university had an average SAT verbal score of 612 points. A simple random sample of one hundred students is taken from this year's entering class. The average SAT verbal score for these students is 594 points with an SD of 80 points. Does this show a decline in entering students' verbal ability?
(a) Formulate the null hypothesis implicit in this question as a statement about a box model.
(b) Calculate the appropriate test statistics and find the observed significance level.
(c) Answer the question.

10. (10 points) Researchers at the University of Rochester School of Medicine and Dentistry wanted to see if the drug clonidine was effective in helping people quit smoking. They took a group of 185 smokers and chose 92 at random to receive the drug. The 93 remaining were given a placebo. After four weeks of this treatment the researchers found that 17 of the 92, and 13 of the 93 were no longer smoking. What do you conclude? (Calculate the appropriate test statistics and find the observed significance level.)

11. (10 points) On a vocabulary test given to a sample of undergraduates at a large university, 15% of the men and 18% of the women knew the meaning of the word "arcane." The group tested consisted of a simple random sample of 400 from the male undergraduates, and independently, a simple random sample of 225 from the female undergraduates. Does the difference of 3 percentage points mean that a larger percentage of women undergraduates know the word than the men, or is it a chance variation?
(a) Formulate the null hypothesis implicit in this question as a statement about a box model.
(b) Calculate the appropriate test statistics and find the observed significance level.
(c) Answer the question.

12. (10 points) A couple enters a gambling house. The man goes to a roulette wheel and bets $1 on red 15 times in a row. The woman goes to a different roulette wheel and bets $1 on a column 10 times in a row. A column bet pays 2 to 1 and there are 12 chances in 38 to win. A bet on red pays even money and there are 18 chances in 38 to win.
(a) The woman's net gain will be around $____, give or take $____ or so.
(b) The couple's net gain will be around $____, give or take $____ or so.

13. (10 points) The summary statistics for height in the U.S.
are as follows:
Men: Average Height = 69.0 inches, SD = 3.0 inches
Women: Average Height = 63.5 inches, SD = 2.5 inches
Someone claims to be taller than 90% of women in the U.S. but shorter than 90% of the men. Is this claim reasonable? Choose one of the options below. Mark your choice with a check mark (✓).
Yes, the claim is reasonable ____
No, the claim is not reasonable ____
If your answer is yes, find the height of the person making the claim. If your answer is no, explain why the claim is not reasonable.

14. (15 points) Ten draws are made at random with replacement from the box:
(a) Find, approximately, the chance the sum of the ten numbers drawn is less than 18.
(b) Find, approximately, the chance the sum of the squares of the ten numbers is less than 18.
(c) In part (b), find the exact chance the sum of the squares of the ten numbers is less than 18.

15. (10 points) The summary statistics below are taken from a representative sample of 637 California men, age 25-29 in 1988.
average education = 12.5 years, SD = 4 years
average income = $19,700, SD = $16,000
correlation = 0.35
In parts (a) to (d) below, let n = 637, x be years of education, and y be income.
(a) The average value of $x^2$ is:
(b) Find the equation of the regression line.
(c) $\frac{1}{637} \sum_i [y_i - (1400 \times x_i + 2200)]^2$

16. (10 points) One hundred draws will be made at random from the box:
The chance the sum of the draws will fall in the range from ____ (a) - ____ (b) to ____ (a) + ____ (b) is approximately 90%. Find two numbers to fill in the blanks so as to make the above statement true. Put one of the numbers in the two (a) slots and the other in the two (b) slots.

17. (10 points) A city has 50,000 small businesses. The planning department takes a simple random sample of 625 such businesses, and sends out a team of interviewers to these businesses. The interviewers proceed to administer a questionnaire to all the employees of each business in the sample.
It turns out that the average number of employees in the sampled businesses is 8.3 and the SD is 2.7. In 502 of the businesses in the sample, all the employees had 12 years or more of education. If possible, find a 95% confidence interval for the percentage of small businesses in the city where all the employees have 12 years or more of education. If it is not possible, explain why not.

18. (10 points) A survey organization takes a simple random sample of 400 households from a city of 80,000 households. On the average, there are 2.60 persons per sample household, and the SD is 1.85. Out of the 400 households in the sample, 98 were one person households, that is, consisted of a single person living alone.
(a) Estimate the percentage of households in the city that are one person households.
(b) Attach a standard error to the estimate in (a).
(c) Estimate the total population of the city.
(d) Attach a standard error to the estimate in (c). Explain carefully all the steps in your calculation.

19. (10 points) A health maintenance organization is interested in the number of households in a city which are not covered by health insurance. The organization takes a simple random sample of 900 households from the city. The sample is divided into two parts according to the age of the head of the household. The first part consists of all households in the sample where the head of the household is age 30 or under. There were 520 such households. The second part consists of the households where the head is over 30. There were 380 of these households. The table gives some of the findings from the sample.

Age of Head of Household    % Uninsured
30 years or under           32.5%
Over 30 years               20.0%

The first line in the table, for example, says that 32.5% of the households where the head is under 30 are not covered by health insurance. If possible, find a 95% confidence interval for the percent of households in the city not covered by health insurance.
If this is not possible, explain why not.

20. (10 points) As part of his first class in exercise physiology, a physical education instructor, with the help of his teaching assistants, measures the height and weight of everyone in the class. There are 200 students in the class. Their average weight turns out to be 161 pounds and the SD is 28.3 pounds. (The class is all male.) His next step is to take a simple random sample of 50 students from the class and i...