Hank Ibser
Statistics 131A
Fall 2009

The final exam will be Tuesday Dec 15th, 5-8pm, in Bechtel Auditorium. Please get there a little early as I move people around a bit.

The final will cover the following chapters with certain parts omitted:

Ch 1: All, but see below for Ch 2...
Ch 2: Omit section 4; in these two chapters you need to know the ideas and jargon but not the specific examples, mostly covered in the first section of each chapter.
Ch 3: Omit sections 5-7.
Ch 4: Omit stuff on longitudinal/cross-sectional data in section 2. Omit section 7 unless you are using a calculator to find SDs.
Ch 5: Omit interquartile range (p89).
Ch 6: Omit. Special review exercises 1,2,3,4,6,7,9,11 are good practice.
Ch 7: Omit.
Ch 8: Omit Review Ex #12.
Ch 9: Omit section 4, technical note on p146-147.
Ch 10: Omit technical note on p169.
Ch 11: Omit section 3 and technical note on p197.
Ch 12: Omit, in section 2, everything from "Now, an example." to the end of the section (p208-211). Omit section 3. Omit Review Ex #12.
(midterm 1)
Everything in Ch 13-20 except anything having to do with Fig 2, pg 313.
Ch 15: Good practice problems are special review exercises 8-9, 11-15, 17-20.
Ch 21: Omit section 5.
(midterm 2)
Ch 22: Omit, although you can do Rev Ex 5-12 for practice.
Ch 23: All. (including special review at end of Ch 23)
Ch 24: Omit.
Ch 25: Omit.
Ch 26: Omit Rev Ex 10-12.
Ch 27: All.
Ch 28: Omit sections 3-5.
Ch 29: Omit. Special review exercises 1, 4, 6-28, 33, and 36
are good practice problems.

The exam will also cover the three handouts: 1) Summation, Average, and SD,
2) Summation and Correlation, and 3) Probability.

I'll have 8-9 questions on the final: roughly 2 from the material on the first
midterm, 2 from the material on the second midterm, and 4-5 on the material
since the second midterm. You need to bring a calculator and something to
write with. No blue book necessary. You'll get a normal table, t-table, and
chi-square table on the exam. Grades should be posted on bearfacts by Friday
night. You can email me or your GSI to get your score on the final exam.

Purves' practice problems don't have anything using the binomial (Ch 15)
or the Chi-Square test (Ch 28). Extra problems for Ch 15 are recommended
above. In general I think it's good exam preparation to make up problems and
trade, and Ch 28 problems would be particularly good for this. The book problems are at a good level of difficulty (pretty hard) in this chapter.

Good luck on the final!

The next 10 problems represent about a final in length, though a
bit heavy on the middle part of the class.

1. A box contains the following numbered tickets: 1,1,5,9,9
a) If I draw two tickets with replacement, what is the chance that
the sum of the two tickets is greater than or equal to 10?
b) Drawing three tickets without replacement, what is the chance
the first two tickets are not 5's, and the last ticket is a 5?
c) Calculate b) if the draws are made with replacement.
d) If I repeat the procedure in a) 8 times (ie draw 2 tickets and
find their sum, and do this 8 times), what is the chance that I get
a sum greater than or equal to 10 exactly 6 of the 8 times?

2. Weights and heights of turkeys tend to be correlated. For a
population of turkeys at a farm, this correlation is found to be
0.64. The average weight is 17 pounds, SD is 5 pounds. The average
height is 28 inches and the SD is 8 inches. Weight and height both roughly follow the normal curve. For each part below, answer
the question or if not possible, indicate why not.
a) A turkey at the farm which weighs more than 90% of all the turkeys
is predicted to be taller than _____% of them.
b) The average height for turkeys at the 90th percentile for weight
is _____.
c) Of the turkeys at the 90th percentile for weight, roughly what
percent would you estimate to be taller than 28 inches?

3. A box has 3 red balls and 5 blue balls. I draw 3 balls from this box without replacement.
a) Find the expected value and SE for the number of red balls.
b) Find the chance of getting 1 or more red balls.

4. Choose one of the following values of r for each example below. (-1, -.7, 0, .7, 1). You may use some values more than once and you may
not use them all. For full credit choose a value and explain your choice
briefly.
a) A group of employees all get a 10% cut in their salaries. What is the correlation between their salaries before the cut and after
the cut?
b) The correlation between weight and belt size for a group of men all the same height.

Everyone in a large lecture class flips a coin 100 times and records
the results. Then everyone in the class does this again. For parts
c)-e) keep these two sets of coin tosses in mind; each student provides
one data point.
c) The correlation between the number of heads in the first set of 100
tosses and the number of heads in the second set of 100 tosses.
d) The correlation between the number of heads in the first set of 100
tosses and the number of tails in the first set of 100 tosses.
e) The correlation between the number of heads in the first set of 100
tosses and the number of tails in all 200 tosses.

5. I have 5 coins in my pocket: 2 quarters, 2 dimes, and a nickel. I draw three coins without replacement.
a) What is the chance that the third coin is the nickel?
b) Find the expected value and the SE for the total value of these three coins.

6. I roll a six-sided die 4 times. Find chances for each of the following
events:
a) the first die roll is bigger than the second or the first roll is a 3.
b) all the numbers are different.
c) not all the numbers rolled are even.

7. A group of married couples takes an IQ test. The average
husband's IQ is 105 with an SD of 15 and the average wife's IQ is 110 with an SD of 10. The correlation between husband's and wife's IQ is 0.5.
a) A man has an IQ of 75, what would you predict his wife's IQ is?
b) Of all men with an IQ of 75, about what percent are smarter than
their wives?

8. The national percent of adults who have a laptop computer is 62%. In San
Francisco, a simple random sample of 100 adults is taken and 69% of these adults have a laptop. Is this strong evidence that more adults in SF have laptops than the national average?
a) State the null hypothesis and the alternative in terms of a box model.
b) Find z and p values.
c) What do you conclude?

9. I take a simple random sample of 400 UC Berkeley students and another
simple random sample of 100 Stanford students. The average weight of the Berkeley students is 144 pounds with an SD of 26 pounds. The lightest
person in the sample weighs 98 pounds. In the Stanford sample, the
average weight is 147 pounds with an SD of 28 pounds. An exercise
physiologist claims that this is evidence that Berkeley students weigh
less than Stanford students. In order to support this claim, she does a hypothesis test. For parts a)-c), do the test even if you think it's
not appropriate.
a) State the implicit null hypothesis and the alternative hypothesis.
b) Find z and p in the usual way.
c) Based on the z and p values, what do you conclude?
d) True or false, and explain briefly: The hypothesis test done here doesn't make sense because the Berkeley data
don't follow the normal curve.

10. Fans vote online for their favorite character: Bubbles, Blossom,
Buttercup, or Mojo Jojo. Out of the first 100 votes, 23 vote for Bubbles,
29 for Blossom, 37 for Buttercup, and 11 for Mojo Jojo. Mojo Jojo says,
"These results could have happened just by chance, people are just pressing
buttons at random and in fact all four of us have an equal number of fans."
Blossom says he's wrong, this couldn't have happened by chance. Construct
the hypothesis test implicit in Mojo Jojo's statement. Write out the null
and alternative hypotheses, find the appropriate test statistic, an
approximate P-value, and write out your conclusion.

Solutions for my Sample Final

1. a) 17/25 b) 4/5 * 3/4 * 1/3 = 1/5 c) 4/5 * 4/5 * 1/5 = 0.128
d) (8 choose 6) (17/25)^6 (8/25)^2 = .283

2. a) Weight is at 90th percentile, which is about 23.5 pounds. Height is predicted to be 1.3 * 0.64 = 0.83 SDs above average, which means
that the turkey is predicted to be taller than about 80% of all turkeys.
b) 34.656 inches
c) new SD is 6.15, z=(28-34.656)/6.15=-1.08, about 86% above this.

3. a) E(X)=9/8 SE(X)=sqrt(45/64 * 5/7) = 0.709 (using correction factor)
b) 1-P(no red)=1-(5/8)(4/7)(3/6)

4. 1, .7, 0, -1, -.7

5. a) 1/5
b) EV=45 cents, SE=10.25 cents, or EV=.45, SE=0.1025 in dollars, similar
to number 3 but you don't get many of these in the book so I left it in.
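
As a rough check on solutions 3 a) and 5 b), the sketch below lists every equally likely set of three draws made without replacement and compares the exact EV and SE of the sum with the shortcut n * (box average) and sqrt(n) * (box SD) * (correction factor). The function and variable names are made up for this illustration and are not part of the handout.

```python
from itertools import combinations
from math import sqrt

def exact_ev_se(box, n):
    """EV and SE of the sum of n draws without replacement, by brute force."""
    # combinations() works by position, so repeated tickets count separately
    sums = [sum(draw) for draw in combinations(box, n)]
    ev = sum(sums) / len(sums)
    se = sqrt(sum((s - ev) ** 2 for s in sums) / len(sums))
    return ev, se

def formula_ev_se(box, n):
    """Same quantities from the box average, box SD, and correction factor."""
    size = len(box)
    avg = sum(box) / size
    sd = sqrt(sum((x - avg) ** 2 for x in box) / size)
    correction = sqrt((size - n) / (size - 1))
    return n * avg, sqrt(n) * sd * correction

red_box = [1, 1, 1, 0, 0, 0, 0, 0]   # problem 3: 3 red balls (1) and 5 blue (0)
coin_box = [25, 25, 10, 10, 5]       # problem 5: coin values in cents

for box in (red_box, coin_box):
    print(exact_ev_se(box, 3), formula_ev_se(box, 3))
```

The two methods should agree: 9/8 and about 0.709 for the red balls, 45 cents and about 10.25 cents for the coins.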

6. a) 19/36 b) 5/18 c) 0.9375

7. a) 100, b) 0.185%

8. a) null: the percent of adults who have laptops in SF is 62%, and the observed difference in the sample can be explained by chance.
alt: the difference can't be explained by chance, the % who have laptops
in SF is more than 62%.
b) z=1.44, p is 0.07355 or about 7.4%
c) can't reject null. It could be just chance, the data is consistent with
the hypothesis that SF % is the same as the national %.
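
The z and p values in solution 8 b) can be reproduced with a few lines of Python. This is only a sketch: the helper names are invented here, and the normal curve area is computed from `math.erf` rather than read off the table handed out at the exam. The handout's 0.07355 appears to match the table entry for z rounded to 1.45, so the computed p-value comes out slightly different (about 0.075).

```python
from math import erf, sqrt

def normal_upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * (1 - erf(z / sqrt(2)))

def one_sample_percent_z(null_pct, sample_pct, n):
    """z statistic for a sample percentage, using a 0-1 box with null_pct 1's."""
    p0 = null_pct / 100
    se_pct = sqrt(p0 * (1 - p0) / n) * 100   # SE for the sample percentage
    return (sample_pct - null_pct) / se_pct

# Problem 8: national figure 62%, SF sample of 100 with 69%
z = one_sample_percent_z(62, 69, 100)
p = normal_upper_tail(z)
print(round(z, 2), round(p, 3))
```

Either way the p-value is above 5%, which is why the solution does not reject the null.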

9. a) null: the difference can be explained by chance variation, the Berkeley pop average is the same as the Stanford pop average
alt: the difference can't be explained by chance, Berkeley students
weigh less
b) z=0.97, p=0.165
c) can't reject null, this could just be chance. The data are consistent
with the two population averages being the same.
d) False, the data don't follow the normal curve, but the test is OK since the sample average follows the normal curve. (check + 2553 is OK).

10. null: the difference in the votes for the 4 characters can be
explained by chance, each one should actually be expected to get an equal number of votes.
alternative: the difference can't be explained by chance, the voters aren't just picking at random. The chi-square value is 14.4 and
the p-value (3 degrees of freedom) is less than 1%. We can reject the
null at the 1% level, it doesn't seem to be due to chance, and the voters don't seem to be picking at random.

Stat 21 Old Final Answers
1. a) The total number of right answers is like the sum of 122x100 draws from a box with 80% 0's and 20% 1's. b) z ≈ 50, c) yes.
2. a) Scatter diagram:

   y
   2 |     x  x
   1 |  x
   0 |     x
     +----------
        0  1  2   x

b) cor(x,y) = 0.43
c) y = 1/2 x + 3/4
d) plot the line above...
e) Below...
[table "Regression estimate of y from x"; only the values 0.25, 0.75, 1.25, 0.25 are legible in the source]
f) 0.75, g) no
3. a) √272 ≈ 16.5, b) yes, the regression method
4. Your answer depends a bit on what z value you choose for the 25th (or 75th) percentile. The answer should be somewhere between 12.5% and 13.6%.
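
Answer 4 can be checked numerically from the summary statistics given in the reaction-time problem (age 40 average 810 ms, SD 100 ms, correlation 0.80). The sketch below assumes "bottom quarter" means the lowest 25% of times and takes z = -0.675 for the 25th percentile; both are assumptions, and a table-based answer will shift a little with rounding, as the answer key notes.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

avg_40, sd_40, r = 810, 100, 0.80

# Men who were average at age 20 are predicted to be average at age 40;
# their age-40 times spread around 810 with the r.m.s. error of the regression line.
rms_error = sqrt(1 - r ** 2) * sd_40

z_25th = -0.675                        # assumed z value for the 25th percentile
cutoff = avg_40 + z_25th * sd_40       # 25th percentile of all 1,200 men at age 40
z_in_strip = (cutoff - avg_40) / rms_error
percent = 100 * normal_cdf(z_in_strip)

print(round(rms_error, 1), round(cutoff, 1), round(percent, 1))
```

With these choices the result is about 13%, inside the range quoted in the answer.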
5. EV=9, SE ≈ 2.4
6. a) z ≈ 0.95, using null that the difference should be equal to 2. I won't
ask a question with an expected difference other than 0, though you should
be able to do this. P ≈ 17.1%, b) not much evidence against their belief, accept null.
7. a) 6, b) √40 ≈ 6.3 (do as Summation problem)
8. a) 20/25 b) 16/25
9. a) null: The sample average is like the average of 100 draws from
a box with an average of 612. b) z ≈ -2.25, P ≈ 1.2%, c) yes, there is a decline.
10. z ≈ 0.85, P ≈ 19.8%, little evidence of effectiveness.
11. a) The sample percentages were like the percentages of 1's in samples
of 400 draws and 225 draws from 2 boxes containing the same percentage of
1's. b) z ≈ 0.95, P ≈ 17.1%. c) chance variation
12. a) -$0.53, $4.40; b) -$1.32, $5.86
13. Not reasonable: the 90th percentile for women is higher than the 10th
percentile for men.
14. a) 97.1%; b) 0.9%, c) [illegible in the source]
15. a) 172.25; b) y = 1400x + 2200; c) √(1 - 0.35^2) × $16,000 = $14,988.
16. a) 200; b) 16.5
17. (77%, 83%)
18. a) 24.5%; b) 2.15%; c) 208,000; d) 7400
19. (24.3%, 30.1%)
20. a) equal to 161 pounds; b) smaller than 4.00 pounds.
21. a) final = 1.5 × midterm + 15.5; b) 68; c) 20
22. ≈ 6.78
23. a) yes, normal approximation; b) no, the sample average is known; c) no, it will be about 1/√2 times as wide
24. $15,500,000; $544,000
25. False, we don't have a simple random sample.
26. a) $0, $5; b) 12.5, 2.7
27. $650,000
28. a) 40; b) -1
29. No, house prices don't follow the normal curve.
30. 1 - (999/1000)^100 ≈ 9.5%
31. a) The percentage of households in the sample with incomes above
$39,000 is like the percentage of 1's in 625 draws from a box with 50% 1's.
b) z ≈ 3, P ≈ 0.14%; c) yes.
32. ≈ 3.5%
33. a) 10/25; b) 1 - (2/5)^5
34. False, the chance that your net gain will turn out to be $4 or more is
1 - (9/10)^5 ≈ 41%.
35. False. The 90th percentile can be any amount higher than twice the
median. Example: list of 10 numbers: 1,2,3,4,5,6,7,98,99,100 has a median
of 5.5 and the 90th percentile is between 99 and 100.
36. a) 75/90; b) 75/90 × 74/89 × ... × 61/76
37. False, the SD of the first group is actually about 3.8 points higher than the SD of the second group.
38. (b) is somewhere between 2 and 4 times as likely as (a). The chance of (a) is 1 - (999/1000)^100 ≈ 0.095. The chance of (b) is 1 - (999/1000)^400 ≈ 0.33.
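
Answers 30, 34, and 38 all use the same complement rule: the chance of at least one occurrence in n independent trials is 1 minus the chance of none. A minimal sketch, assuming independent trials with the same chance each time (which is what the formulas in the answers imply); the function name is made up for illustration:

```python
def chance_at_least_once(p_single, n):
    """Chance that an event with chance p_single per trial happens at least once in n independent trials."""
    return 1 - (1 - p_single) ** n

a = chance_at_least_once(1 / 1000, 100)   # answer 38 (a), also answer 30
b = chance_at_least_once(1 / 1000, 400)   # answer 38 (b)
print(round(a, 3), round(b, 3), round(b / a, 2))
```

The ratio of the two chances lands between 2 and 4, as answer 38 states; it is less than 4 because the chances of repeated occurrences overlap.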
39. $0.50, ≈ $6

Statistics 21
Problems from past final exams

1. (10 points) A psychologist administered a multiple-choice test to 122 college students.
Each question had five possible answers and only one was correct. The questions were
taken from the reading comprehension section of the SAT verbal test, with one important
change. The SAT presents short passages to read and follows these with multiple-choice
questions about the meaning of the passages. For his test, the psychologist took only the
questions, not the reading passages. So someone taking the psychologist's test has to
answer the SAT questions without having read the passages. Why make up such a crazy
test? The psychologist was trying to find out whether this part of the SAT was really testing reading ability or some other test-taking skill. None of the 122 students had taken
the SAT. Their average score on the psychologist's test turned out to be 38 points out of
100. (There were 100 questions; a correct answer was worth one point and an incorrect
answer zero points.) The psychologist took the average of 38 as evidence that points can
be earned on this part of the SAT by using mental skills other than the ability to
understand the passages. Of course, the psychologist realized that the students would get
some answers right just by luck.

(a) Formulate the null hypothesis implicit in the above paragraph as a statement about a box model.
(b) Calculate the appropriate test statistic.
(c) Would you reject the null hypothesis?

2. (10 points) For the data set shown below:

x  y
0  1
1  2
1  0
2  2

(a) Plot the scatter diagram.
(b) Find the correlation between x and y.
(c) Write down the equation of the regression line for predicting y from x.
(d) Plot the regression line on the scatter diagram.
(e) Complete the following table (there are 8 missing numbers):
[table not recoverable from the source; legible column heading: Regression estimate of y from x]
(f) Find the r.m.s. error of the regression line for estimating y from x.
(g) Is there some other line which will lead to a smaller r.m.s. error than the one in (c)? Answer yes or no and explain briefly.

3. (5 points) A person will be picked at random from the class in problem 21 and you will be
told his or her midterm score. Using this information, you have to guess his or her final score. Someone says: "Look, the final average was exactly twice the midterm average. So, just double the
midterm score you are told and use that as your guess."

(a) Find the r.m.s. error for the method suggested in the quote. Give your answer to at least one decimal place.
(b) Is there another method with a smaller r.m.s. error? Answer yes or no, and explain briefly.

4. (10 points) A test of reaction time is given twice to a group of 1,200 men: the first time at age
twenty, the second time at age forty. The summary statistics for the results are as follows:

Age 20: Average = 780 SD = 105
Age 40: Average = 810 SD = 100
Correlation = 0.80

The units for the reaction times are milliseconds. Of those whose reaction time was average at age twenty, what percent had times at forty which were in the bottom quarter for the 1,200 men at that age? (You may assume the scatter diagram is football shaped.)

5. (10 points) A computer program simulates drawing fifty times at random, with replacement, from the box:

[box contents missing in the source]

The program prints out the results from left to right, in two rows of twenty-five digits each:

1st row: x x ... x
2nd row: x x ... x

Find the expected value and the SE for the number of times a digit in the second row is the
same as the one directly above it.

6. (10 points) A transportation study involves two suburbs of a large city. One suburb is a little
farther from the city center than the other. The investigators believe that living in the more
distant suburb adds, on average, not more than 2 minutes to the commute time. The investigators take a simple random sample of 900 from the residents of the more distant suburb, and a simple random sample of 400 from the residents of the more central suburb. In the sample of 900 residents, the average commute time on a particular day was 17.4 minutes;
the SD was 18.0 minutes. In the sample of 400 residents, the average commute time on that same day was 14.5; the SD was 15.0 minutes.
Do these results provide evidence against the investigators' belief?

(a) Calculate the appropriate test statistics and find the observed significance level.
(b) Answer the question.

7. (10 points) In a study of 600 identical twins, the twins were given, among other things, a vocabulary test. The summary statistics for the scores are as follows:

younger twin: average = 25 pts SD of score = 10 pts
older twin: average = 25 pts SD of score = 10 pts
Correlation = 0.80

(a) Someone proposes to use the regression method for estimating the score of the older twin
from the score of the younger twin. Find the r.m.s. error for this method.
(b) Someone else, who has not heard of the regression method, proposes to take the score of
the younger twin, and use it, unchanged, as the estimate for the score of the older twin. Find the r.m.s. error for this method.

8. (10 points) Two draws are made at random with replacement from the box:

[box contents missing in the source]

(a) What is the chance different letters are drawn?
(b) What is the chance of getting a vowel at least once in the two draws?

9. (10 points) In the preceding five years, entering students at a certain university had an average SAT verbal score of 612 points. A simple random sample of one hundred students is
taken from this year's entering class. The average SAT verbal score for these students is 594
points with an SD of 80 points. Does this show a decline in entering students' verbal ability?

(a) Formulate the null hypothesis implicit in this question as a statement about a box model.
(b) Calculate the appropriate test statistics and find the observed significance level.
(c) Answer the question.

10. (10 points) Researchers at the University of Rochester School of Medicine and Dentistry
wanted to see if the drug clonidine was effective in helping people quit smoking. They took a
group of 185 smokers and chose 92 at random to receive the drug. The 93 remaining were
given a placebo. After four weeks of this treatment the researchers found that 17 of the 92, and 13 of the 93 were no longer smoking. What do you conclude? (Calculate the appropriate test statistics and find the observed significance level.)

11. (10 points) On a vocabulary test given to a sample of undergraduates at a large university,
15% of the men and 18% of the women knew the meaning of the word "arcane." The group
tested consisted of a simple random sample of 400 from the male undergraduates, and
independently, a simple random sample of 225 from the female undergraduates. Does the difference of 3 percentage points mean that a larger percentage of women
undergraduates know the word than the men, or is it a chance variation?

(a) Formulate the null hypothesis implicit in this question as a statement about a box model.
(b) Calculate the appropriate test statistics and find the observed significance level.
(c) Answer the question.

12. (10 points) A couple enters a gambling house. The man goes to a roulette wheel and bets $1
on red 15 times in a row. The woman goes to a different roulette wheel and bets $1 on a column 10 times in a row. A column bet pays 2 to 1 and there are 12 chances in 38 to win. A
bet on red pays even money and there are 18 chances in 38 to win.

(a) The woman's net gain will be around $_____ give or take $_____ or so.
(b) The couple's net gain will be around $_____ give or take $_____ or so.

13. (10 points) The summary statistics for height in the U.S. are as follows:

Men: Average Height = 69.0 inches SD = 3.0 inches
Women: Average Height = 63.5 inches SD = 2.5 inches

Someone claims to be taller than 90% of women in the U.S. but shorter than 90% of the men.
Is this claim reasonable? Choose one of the options below. Mark your choice with a check mark (√).

_____ Yes, the claim is reasonable.
_____ No, the claim is not reasonable.

If your answer is yes, find the height of the person making the claim. If your answer is no,
explain why the claim is not reasonable.

14. (15 points) Ten draws are made at random with replacement from the box:

[box contents missing in the source]

(a) Find, approximately, the chance the sum of the ten numbers drawn is less than 18.
(b) Find, approximately, the chance the sum of the squares of the ten numbers is less than 18.
(c) In part (b), find the exact chance the sum of the squares of the ten numbers is less than 18.

15. (10 points) The summary statistics below are taken from a representative sample of 637
California men, age 25-29 in 1988.

average education = 12.5 years SD = 4 years
average income = $19,700 SD = $16,000
correlation = 0.35

In parts (a) to (d) below, let n = 637, x be years of education, and y be income.

(a) The average value of x^2 is:
(b) Find the equation of the regression line.
(c) sqrt( sum of [y_i - (1400 x_i + 2200)]^2 / 637 ) =

16. (10 points) One hundred draws will be made at random from the box:

[box contents missing in the source]

The chance the sum of the draws will fall in the range from ___(a)___ - ___(b)___ to ___(a)___ + ___(b)___ is approximately 90%. Find two numbers to fill in the blanks so as to make the above statement true. Put one of the
numbers in the two (a) slots and the other in the two (b) slots.

17. (10 points) A city has 50,000 small businesses. The planning department takes a simple
random sample of 625 such businesses, and sends out a team of interviewers to these
businesses. The interviewers proceed to administer a questionnaire to all the employees of
each business in the sample. It turns out that the average number of employees in the sampled
businesses is 8.3 and the SD is 2.7. In 502 of the businesses in the sample, all the employees had 12 years or more of education. If possible, find a 95% confidence interval for the percentage of small businesses in the city where all the employees have 12 years or more of education. If it is not possible, explain why
not.

18. (10 points) A survey organization takes a simple random sample of 400 households from a
city of 80,000 households. On the average, there are 2.60 persons per sample household,
and the SD is 1.85. Out of the 400 households in the sample, 98 were one person
households, that is, consisted of a single person living alone.

(a) Estimate the percentage of households in the city that are one person households.
(b) Attach a standard error to the estimate in (a).
(c) Estimate the total population of the city.
(d) Attach a standard error to the estimate in (c). Explain carefully all the steps in your
calculation.

19. (10 points) A health maintenance organization is interested in the number of households in a
city which are not covered by health insurance. The organization takes a simple random
sample of 900 households from the city. The sample is divided into two parts according to
the age of the head of the household. The first part consisted of all households in the sample where the head of the household is age 30 or under. The
second part consists of the households where the head is over 30. There were 380 of these households. The table gives some of the findings from the sample.

Age of Head of Household    % Uninsured
30 years or under           32.5%
[second row illegible in the source]

The first line in the table, for example, says that 32.5% of the households where the head is
under 30 are not covered by health insurance. If possible, find a 95% confidence interval for the percent of households in the city not
covered by health insurance. If this is not possible, explain why not.

20. (10 points) As part of his first class in exercise physiology, a physical education instructor,
with the help of his teaching assistants, measures the height and weight of everyone in the
class. There are 200 students in the class. Their average weight turns out to be 161 pounds
and the SD is 28.3 pounds. (The class is all male.) His next step is to take a simple random
sample of 50 students from the class and interview the sample students about their exercise habits.

(a) The expected value for the average weight of the 50 men in the sample is _____. Fill in
the blank with one of the three options below.
smaller than 161 pounds / equal to 161 pounds / bigger than 161 pounds
(b) SE for the average weight of the 50 men in the sample is _____. Fill in the blank with one of the three options below.
smaller than 4.00 pounds / equal to 4.00 pounds / bigger than 4.00 pounds
Show your work and/or give reasoning.

21. (10 points) There are 300 students in an economics course. At the end of the course, the
following summary statistics were calculated:

midterm: Average = 31 points SD = 8 points
final: Average = 62 points SD = 20 points
correlation = 0.60

The midterm had a total of 50 points and the final had a total of 100 points.

(a) Find the regression equation for predicting the final score from the midterm score.
(b) Use the equation in (a) to predict the final score of someone who had a midterm score of
35 points.
(c) Here is another equation for predicting the final score from the midterm score:
predicted final score = 62 points
Find the r.m.s. error if this method is used to predict the final score.

22. (10 points) In the data set below:

x    y
x1   y1
x2   y2
...
xn   yn

The averages, SDs and correlation coefficient are given by:

average of x = 0 SD of x = 3.0
average of y = 0 SD of y = 5.0
correlation = 0.40

Find the SD of the following list:
x1+y1, x2+y2, ..., xn+yn
(For example, if the x-list were 1, 2, 3 and the y-list were 10, 15, 20 the list x1+y1, x2+y2,
x3+y3 would be 11, 17, 23.)

23. (10 points) As part of a survey on physical fitness, a large university takes a simple random
sample of 400 male students. The average height of the men in the sample turns out to be 70
inches, and 95% of the men (in the sample) are between 64 and 76 inches tall. A histogram
for their heights is plotted and follows the normal curve closely. Say whether each of the following statements is true or false, and explain why.

(a) 26% is a reasonable estimate for the percent of men in the university who are over 6' tall.
(b) The range from 64" to 76" is an approximate 95% confidence interval for the average height of the men in the sample.
(c) An approximate 95% confidence interval based on a sample of 800 men will be about half
as wide as one based on a sample of 400. (Assume these are simple random samples and
the confidence interval is for the average height of men in the university.)

24. (10 points) A computer file contains marketing data on 50,000 families. For 40,000 of the
families, the file has an entry for family income. For these families, the average income is $31,000 and the SD is $20,000. For the other 10,000 families, income information is
missing. An economist plans to take a simple random sample of 625 from the 50,000 families. Many
of the families in the sample will have an income entry in the file. The total of the incomes of these families will be around _____ give or take _____ or so.

25. (5 points) A simple random sample of 1000 households is taken from a large city. For each
sample household, one person is selected at random from those over 21 in the household.
Out of the 1000 persons chosen in this way, 650 are newspaper readers. True or False, and explain: "The range from 62% to 68% is an approximate 95% confidence interval for the percentage of city residents over 21 who are newspaper readers."

26. (10 points) Ray and Bob play the following game: A pair of dice is rolled. One of the dice is red and the other is blue. If the outcome on the
red die is bigger than the outcome on the blue die, Ray wins and Bob pays him $1. If the
outcome on the blue die is bigger, Bob wins and Ray pays him $1. If the outcomes are the same, no money changes hands. Suppose they play thirty times.

(a) Ray's net gain will be around _____ give or take _____ or so.
(b) The number of times Ray wins will be around _____ give or take _____ or so.

27. (5 points) A lending institution makes a study of 400 of its loans. For all these loans, interest
was paid annually. The interest rate varied, depending on the loan. The following descriptive
statistics were obtained:

average size of loan = $23,000 SD = $5,000
average interest rate = 7% SD = 0.5%
correlation = 0.6

Find the total amount of money the lending institution earns each year from the interest on
these 400 loans.

28. (5 points) In the data set below, the two numbers in each row add to 100. That is, x1+y1=100, ..., xn+yn=100.
The average of the x-column is 60 and the SD is 10.

x    y
x1   y1
x2   y2
...
xn   yn

(a) Find the average of the y-column.
(b) Find the correlation between x and y.

29. (5 points) The average sale price of a house in a certain region of the country in 1991 is
$185,000; the SD is $135,000. Is it reasonable to conclude that the 40th percentile of sale prices is about $151,000?

30. (10 points) The letter below appeared in the New York Times of July 20, 1987. What
number belongs in the‘blank? Make sure you write down all your reasoning and/or calculations. ' Assessing the Odds of AIDS Infection To the editor: .
We agree with Joseph L. Gastwirth (“Statistically, AIDS Poses a Significant Risk,” letter, July 1) that a risk of transmission of human immunodeficiency virus of 1 in 1,000 for a single act of heterosexual intercourse is quite significant. The infectivity rate of 1 in 1,000 was derived from a study of the female partners of men infected with the acquired immune deficiency syndrome virus. In this study, the number of contacts (unprotected intercourse) was highly associated with transmission.

However, we would also like to point out the differences between assessing risk for an individual compared with determining risk for a population. Infectivity rates are necessary tools that epidemiologists use to assess and predict the spread of disease in populations. A relatively low rate may imply a slow or negligible spread in large groups.

However, a low infectivity rate may imply something different for an individual. The probability of transmission is the same for the first contact as for the thousandth, hence given that AIDS is a lethal disease, one must err on the side of caution and act as if each contact with an infected partner could result in transmission. Also, though the risk of transmission may be low in a single exposure, multiple exposures increase risk according to well-known probability laws.

For example, the probability that an uninfected woman will acquire infection from her infected male partner after 100 contacts would be ________.

Furthermore, although this rate may seem low, and may be lower than rates for other sexually transmitted diseases, it is unclear whether it is high enough to sustain an epidemic among heterosexuals. Other factors such as variations in susceptibility, infectiousness and concomitant sources of exposure also affect disease spread, and these were not assessed in the study.
Nancy S. Padian
James A. Wiley
Berkeley, Calif., July 2, 1987

The writers are, respectively, a research epidemiologist at the School of Public Health, University of California, and a sociologist at the Survey Research Center.

31. (10 points) In 1990, the median household income in Jefferson County, Colorado
(population 438,430) was $39,000. In June 1993, a market research organization took a
simple random sample of 625 households from the county. In the sample, 56% of the
households had an income over $39,000. Is this evidence that median income in Jefferson County increased from 1990 to 1993?
(a) Formulate the null hypothesis implicit in this question as a statement about a box model.
(b) Calculate the appropriate test statistic and find the observed significance level.
(c) Answer the question.

32. (10 points) In a certain city, the household income distribution is as follows:

Income Range               Number of Households
$0 - $30,000               40,000
$30,000 - $100,000         36,000
$100,000 - $1,000,000      4,000

A market research firm takes a simple random sample of 400 households from the 80,000 households in the city.
Find the chance there are 175 households in the sample (no more, no less) with incomes in the $30,000 to $100,000 range.

33. (10 points) Five draws are made at random with replacement from the box: [box contents not legible in this copy]
(a) Find the chance that the first number drawn is bigger than the second.
(c) Find the chance that not all the numbers drawn are even.

34. (5 points) True or false, and explain: “In problem #21 the chance your net gain will turn out to be $4 or more is between 20% and 30%.”

35. (5 points) True or false, and explain: For any list of positive numbers, the 90th percentile is always less than twice the median.

36. (10 points) A state planning organization periodically takes a survey of the companies located
within the state. In one part of the survey, a simple random sample of 15 manufacturers is
drawn from the 90 manufacturers in the state.
(a) One of the manufacturers in the state is very large. Find the chance this manufacturer does not get into the sample.
(b) The survey is done twice a year: in the spring and in the fall. Find the chance there is no overlap between the spring sample and the fall sample. In other words, find the chance no manufacturer appears in both of the samples. You do not need to work out the arithmetic.

37. (5 points) Two large groups of people take the same test. In each group, the scores followed
the normal curve closely. Both groups had the same average score, but the 90th percentile in the first group was 5 points larger than the 90th percentile in the second group. True or false, and explain: The SD (of scores) in the first group is 5 points larger than the SD in the second group.

38. (10 points) Two envelopes contain lottery tickets, all from different lotteries. All the lotteries
offer the same chance of winning: 1 in 1,000. One envelope (call it A) contains 100 tickets, the other (B) contains 400 tickets. Here are two possibilities:
(a) Envelope A contains a winning ticket.
(b) Envelope B contains a winning ticket.

Check one option below:
_____ (b) is more than four times as likely as (a).
_____ (b) is four times as likely as (a).
_____ (b) is somewhere between twice and four times as likely as (a).
_____ (b) is twice as likely as (a).

Give the reasoning and/or calculation for your choice.

39. (10 points) A gambling house offers the following game: “First, you stake [amount not legible in this copy]. Then two tickets are drawn at random, without replacement, from the
box shown below. If the first ticket is a YOU and the second is a WIN, you get your stake back, and in addition, a prize of $3. If the tickets come up in any other way, you forfeit your stake. After the money has changed hands, the two tickets are replaced in the box.”

Suppose you decide to play this game 5 times. Your net gain will be around $____, give or take $____ or so. It ...
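
A side note on the arithmetic behind two of the questions above: the letter on AIDS infection odds appeals to "well-known probability laws" for repeated exposures, and the two-envelope lottery question turns on the same idea. What follows is a minimal Python sketch, assuming the trials (contacts, or lottery tickets) are independent and each has the same chance of success. Neither the letter nor the question states independence in so many words, and the function name chance_at_least_one is made up for illustration.

```python
def chance_at_least_one(p: float, n: int) -> float:
    """Chance of at least one success in n independent trials,
    each with success chance p (complement rule)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # P(no success in n trials) = (1 - p)^n, so subtract that from 1.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 1 / 1000  # the 1-in-1,000 rate quoted in both questions

    after_100 = chance_at_least_one(p, 100)  # 100 contacts, or envelope A
    after_400 = chance_at_least_one(p, 400)  # envelope B

    print(f"n = 100: {after_100:.4f}")
    print(f"n = 400: {after_400:.4f}")
    print(f"ratio:   {after_400 / after_100:.2f}")
```

Under these assumptions the chance grows more slowly than n times p, because several successes in the same run are counted only once. That is why quadrupling the number of trials does not quite quadruple the chance.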