week_ten_session_one_notes_chapter_22_

Week ten session one lecture notes (Chapter 22)
Professor Esfandiari

The objective of this lecture is to:

- Remind you of the variance of the difference between two independent random variables.
- Discuss the sampling distribution of the difference between two proportions ($\hat{p}_1 - \hat{p}_2$) resulting from independent groups.
- Discuss the assumptions underlying the sampling distribution of ($\hat{p}_1 - \hat{p}_2$).
- Revisit the CLT and the normal model underlying the sampling distribution of ($\hat{p}_1 - \hat{p}_2$).
- Discuss the confidence interval for the difference in two population proportions resulting from two independent populations.
- Test the null hypothesis about the difference between the proportions in two independent populations.

The standard deviation of the difference of two independent random variables

In prior lectures it was pointed out that, if two random variables are independent, then

$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

It follows that

$$\mathrm{SD}(X - Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)}$$

Standard deviation of the difference between two proportions

Proportions that are based on independent random samples are themselves independent. For instance, if we observe the proportion of male physicians and the proportion of female physicians who follow a career in pediatrics, these two proportions will be independent. Thus:

$$\mathrm{SD}(\hat{p}_1 - \hat{p}_2) = \sqrt{\mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)}$$

In prior lectures we showed that $\mathrm{SD}(\hat{p}) = \sqrt{pq/n}$, where $p$ is the proportion of successes in the population and $q = 1 - p$. It follows that $\mathrm{Var}(\hat{p}_1) = p_1 q_1 / n_1$ and $\mathrm{Var}(\hat{p}_2) = p_2 q_2 / n_2$, so

$$\mathrm{SD}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Sampling distribution of $\hat{p}_1 - \hat{p}_2$

Based on the CLT, if the samples are independent (this would be evident from the way the samples are selected), the samples are large enough (at least 10 successes and 10 failures in each), and neither sample exceeds 10% of its population, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ follows the normal model with mean $\mu = P_1 - P_2$ and standard deviation

$$\mathrm{SD}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
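The standard deviation formula above can be sketched in a few lines of Python. This is only an illustration: the function name `sd_diff_proportions` and the population values used in the call are assumptions made for the example, not values from the lecture.

```python
import math


def sd_diff_proportions(p1, n1, p2, n2):
    """SD of (p1_hat - p2_hat) for two independent samples.

    Uses Var(X - Y) = Var(X) + Var(Y) with Var(p_hat) = p*q/n.
    """
    var1 = p1 * (1 - p1) / n1  # variance of the first sample proportion
    var2 = p2 * (1 - p2) / n2  # variance of the second sample proportion
    return math.sqrt(var1 + var2)


# Hypothetical population proportions and sample sizes (illustration only)
sd = sd_diff_proportions(0.60, 200, 0.50, 300)
print(f"SD of the difference: {sd:.4f}")
```

Note that the variances add even though the proportions are subtracted, so the SD of the difference is always larger than the SD of either sample proportion alone.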
Testing the null hypothesis about the difference in the proportions in two independent populations

A major health corporation wonders (they are not sure of the direction, so this is a two-tailed test) whether the proportions of males and females who seek surgery for intestinal issues are similar in the 70-80 year age range. They randomly pick the files of 100 males and 100 females who have been referred to their clinics for the same intestinal issue. 70 out of 100 females and 50 out of 100 males chose to undergo surgery for that issue. Assuming that these two samples are representative of the populations of 70-80 year old men and women who have the same intestinal issue, test the null at alpha = 0.05.

Number of women seeking surgery = 70, $\hat{p}_W = 70/100 = 0.70$
Number of men seeking surgery = 50, $\hat{p}_M = 50/100 = 0.50$

Null hypothesis: $H_0: P_{women} - P_{men} = 0$
There is no relationship between gender and seeking surgery for the same intestinal issue. There is no difference between the proportions of men and women who seek surgery for the same intestinal issue.

Alternative hypothesis: $H_a: P_{women} - P_{men} \neq 0$
There is a relationship between gender and seeking surgery for the same intestinal issue. There is a difference between the proportions of men and women who seek surgery for the same intestinal issue.

Based on the normal model, Z(X) was calculated as follows:

$$Z(X) = \frac{X - \mathrm{Mean}(X)}{\mathrm{SD}(X)}$$

Based on the CLT, the distribution of $\hat{p}_1 - \hat{p}_2$ follows the normal model with mean $(P_1 - P_2)$ and $\mathrm{SD}(\hat{p}_1 - \hat{p}_2) = \sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}$. Thus,

$$Z(\hat{p}_1 - \hat{p}_2) = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$

$(P_1 - P_2)$ is the value under the null, which we have set equal to zero. Substituting the sample proportions into the standard deviation:

$$Z(\hat{p}_1 - \hat{p}_2) = \frac{(0.70 - 0.50) - 0}{\sqrt{\frac{0.70 \times 0.30}{100} + \frac{0.50 \times 0.50}{100}}} = \frac{0.20}{0.0678} = 2.95$$

$$\text{P-value} = P(Z \geq +2.95) + P(Z \leq -2.95) \approx 0.003$$

TESTING THE NULL HYPOTHESIS THROUGH CALCULATION OF THE P-VALUE

Based on the calculations reported above, the P-value is about 0.003. Thus, the actual risk that we would take in rejecting a true null is about 0.3%. In other words, our sample data provide enough evidence against the null hypothesis to allow us to reject it. Thus, we reject the null and conclude that, for the same intestinal issue, the proportion of women who choose to have surgery is higher than the proportion of men. The P-value, or the actual type I error (the actual risk of rejecting a true null), is less than 0.05 (the nominal risk of rejecting a true null), and thus we reject the null hypothesis.

TESTING THE NULL HYPOTHESIS WITH EXAMINATION OF CRITICAL VALUES

We could also test the null by examining the critical values. With alpha = 0.05, the critical values for a two-tailed test are +/-1.96. If the calculated Z value is within these limits (that is, between -1.96 and +1.96), we fail to reject the null; if it is larger than the critical value of +1.96 or smaller than the critical value of -1.96, we reject the null. Since the calculated value of +2.95 is larger than the critical value of Z = +1.96 and falls in the critical region, we reject the null.

IMPORTANCE OF INTERPRETING THE RESULTS OF HYPOTHESIS TESTING WITHIN CONTEXT

Simply saying that we reject the null is not good enough. You need to go back to the sample data, examine $\hat{p}_1$ and $\hat{p}_2$, see which one is higher, and interpret the results within context. Thus, you should not just say that the proportions of men and women who seek surgery for the same intestinal issue are different. You should say which population has the higher proportion of surgery for the same intestinal issue.
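The test above can be reproduced numerically from the counts stated in the example (70 of 100 women and 50 of 100 men). The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for the normal model; the function name `two_proportion_z` is hypothetical, and the standard error uses the unpooled sample proportions as in the calculation in these notes.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, null_diff=0.0):
    """Two-tailed z-test for P1 - P2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - null_diff) / se
    # Two-tailed P-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts from the surgery example: women first, men second
z, p_value = two_proportion_z(70, 100, 50, 100)

# Critical value for a two-tailed test at alpha = 0.05 (about 1.96)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
reject = abs(z) > critical

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Both decision rules are shown side by side: comparing the P-value with alpha and comparing |z| with the critical value always lead to the same decision.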
"h "7} 5’ CALCULATION OF CONFIDENCE INTERVAL FOR P1 — P2 Remember CI for any parameter= statistic + Margin of error 95% CI = statistics +/-1.96 * SE (statistic) Statistic is what you calculate based on sample in this case (pAl — p"2) SE is the standard error of the parameter we want to estimate in this case would be SE (pAl — pAZ) Thus it follows that: 95% CI for (P1 — P2) = (pAl — pAZ) +/-1.96 * 1/(PA1'qA1/n1)+( p/‘Zq‘2/n2) Example: Given the following data, calculate and interpret the 80% confidence interval for P1 — P2. For a random sample of 100 female lawyers, 20 practice criminal law. For a random sample of 100 male lawyers, 40 practice criminal law. 80% CI for (P1 — P2) = (pAl — p"2) +/- 1.28* 1/(PA1q¢A1/ni)+(pA2qA2/nz) (0.40-0.20) +/- 1.28*1/(0.40* 0.60/100) + (020* 080/100) 0.20+/—1.28*0.063 = 0.20 +/- 0.08 = (0.12,0.28) Z = -1.28 corresponds to the 10fh percentile. Z = +1.28 corresponds to the 90th percentile (see the normal table at the end of this handout). For 80 percent confidence, alpha is 0.20 and we need to divide it by two and place it in the right and left tail of the distribution. We are 80% confident that the proportion of males lawyers who practice criminal law is 12% to 28% higher than female lawyers who practice criminal law. Testing the null through the confidence interval If we were testing the null that the proportion of male and female attorneys who practice criminal law is the same, we would be rejecting the null because the confidence interval we found (12% to 28%) does not include the hypothesized value under the null of P1 — P2 = 0. TESTING H0: P1 — P2 = 0 BASED ON POOLED P" AND POOLED SE When we have data from two populations but WE BELIEVE THAT THEY REALLY COME FROM THE SAME UNDERLYING POPULAITON, we pool the proportions from the two samples from the underlying populations to get a pooled estimate. 
$$\hat{p}_{pooled} = \frac{f_1 + f_2}{n_1 + n_2}$$

$f_1 = n_1 \hat{p}_1$ is the number of successes in sample one
$f_2 = n_2 \hat{p}_2$ is the number of successes in sample two
$n_1$ = size of sample one
$n_2$ = size of sample two

$$\mathrm{SE}_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\, \hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\, \hat{q}_{pooled}}{n_2}}$$

From the above it follows that:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\mathrm{SE}_{pooled}(\hat{p}_1 - \hat{p}_2)}$$

Example using pooled $\hat{p}$: Problem 20 from the book

Depression: A study published in the Archives of General Psychiatry in March 2001 examined the impact of depression on the patient's ability to survive cardiac disease. Researchers identified 450 patients with cardiac disease, evaluated them for depression, and followed the group for four years. Of the 361 patients with no depression, 67 died. Of the 89 patients with minor or major depression, 26 died. Among people who suffer from cardiac disease, are depressed patients more likely to die than non-depressed ones?

a) What kind of design was used to collect these data?
b) Write appropriate hypotheses.
c) Are the assumptions and conditions necessary for inference satisfied?
d) Test the hypothesis and state your conclusion.
e) Explain in this context what your P-value means.
f) If your conclusion is actually incorrect, what type of error did you commit?

Solution to problem 20. Depression.

a) This is a prospective observational study.

b) H0: The proportion of cardiac patients without depression who died within the 4 years is the same as the proportion of cardiac patients with depression who died during the same time period.
p(None) = p(Depression), or p(None) - p(Depression) = 0

HA: The proportion of cardiac patients without depression who died within the 4 years is less than the proportion of cardiac patients with depression who died during the same time period.
p(None) < p(Depression), or p(None) - p(Depression) < 0

c) Randomization condition: Assume that the cardiac patients followed by the study are representative of all cardiac patients.
10% condition: 361 and 89 are both less than 10% of all cardiac patients.
Independent samples condition: The groups are not associated.
Success/Failure condition: $n\hat{p}$ (no depression) = 67, $n\hat{q}$ (no depression) = 294, $n\hat{p}$ (depression) = 26, and $n\hat{q}$ (depression) = 63 are all greater than 10, so the samples are both large enough.

Since the conditions have been satisfied, we will model the sampling distribution of the difference in proportions with a Normal model with mean 0 and standard deviation estimated by

$$\mathrm{SE}_{pooled}(\hat{p}_{None} - \hat{p}_{Dep}) = \sqrt{\frac{\hat{p}_{pooled}\, \hat{q}_{pooled}}{n_{None}} + \frac{\hat{p}_{pooled}\, \hat{q}_{pooled}}{n_{Dep}}} = \sqrt{\frac{(93/450)(357/450)}{361} + \frac{(93/450)(357/450)}{89}} = 0.0479$$

d) The observed proportions are:
67/361 = 0.1856 (the proportion who died among patients with no depression)
26/89 = 0.2921 (the proportion who died among patients with depression)

The observed difference between the proportions is 0.1856 - 0.2921 = -0.1065, so

$$z = \frac{-0.1065 - 0}{0.0479} = -2.22$$

and the P-value is $P(Z \leq -2.22) = 0.0131$.

Since the P-value = 0.0131 is low, we reject the null hypothesis. There is strong evidence to suggest that the proportion of non-depressed cardiac patients who die within 4 years is less than the proportion of depressed cardiac patients who die within 4 years.

e) If there is no difference in the proportions, we will see an observed difference this large or larger only about 1.3% of the time by natural sampling variation.

f) If cardiac patients without depression don't actually have a lower proportion of deaths in 4 years than cardiac patients with depression, then we have committed a Type I error.

Problems solved from the book

6. Buy it again?
(problem) A consumer magazine plans to poll car owners to see if they are happy enough with their vehicles that they would purchase the same model again. They'll randomly select 450 owners of American-made cars and 450 owners of Japanese models. Obviously, the actual opinions of the entire population couldn't be known, but suppose that 76% of owners of American cars and 78% of owners of Japanese cars would purchase another.

a) What kind of sampling design is the magazine planning to use?
b) What would you expect their poll to show?
c) Of course, sampling error means the poll won't reflect the difference perfectly. What's the standard error for the difference in the proportions?
d) Sketch a sampling model for the difference in proportions that might appear in a poll like this.
e) Could the magazine be misled by the poll, concluding that owners of American cars are much happier with their vehicles than owners of Japanese cars? Explain.

6. Buy it again? (solution)

a) This is a stratified random sample, stratified by country of origin of the car.

b) We would expect the difference in proportions in the sample to be the same as the difference in proportions in the population, with the percentage of respondents who would purchase the same model again 2% higher among owners of Japanese cars than among owners of American cars.

c) The standard deviation of the difference in proportions is:

$$\mathrm{SD}(\hat{p}_J - \hat{p}_A) = \sqrt{\frac{\hat{p}_J \hat{q}_J}{n_J} + \frac{\hat{p}_A \hat{q}_A}{n_A}} = \sqrt{\frac{0.78 \times 0.22}{450} + \frac{0.76 \times 0.24}{450}} = 2.8\%$$

d) [Figure: Normal model for the difference in proportions of owners who would purchase the same model car again (Japanese - American), centered at 2% with a standard deviation of 2.8%. The axis is marked at 1, 2, and 3 standard deviations on either side of the mean (-6.4%, -3.6%, -0.8%, 2%, 4.8%, 7.6%, 10.4%), with 99.7% of the area within 3 standard deviations.]

e) The magazine could certainly be misled by the poll. According to the model, a poll showing greater satisfaction among owners of American cars could occur relatively frequently. That result is less than one standard deviation below the expected difference in proportions.

8. Graduation (problem).
In October 2000 the US Department of Commerce reported the results of a large-scale survey on high school graduation. Researchers contacted more than 25,000 Americans aged 24 years to see if they had finished high school; 84.9% of the 12,460 males and 88.1% of the 12,678 females indicated that they had high school diplomas.

a) Are the assumptions and conditions necessary for the inference satisfied? Explain.
b) Create a 95% confidence interval for the difference in graduation rates between males and females.
c) Interpret your confidence interval.
d) Does this provide strong evidence that girls are more likely than boys to complete high school? Explain.

8. Graduation (solution).

a) Randomization condition: Assume that the samples are representative of all recent graduates.
10% condition: Although large, the samples are less than 10% of all graduates.
Independent samples condition: The sample of men and the sample of women were drawn independently of each other.
Success/Failure condition: The samples are very large, certainly large enough for the methods of inference to be used.
Since the conditions have been satisfied, we will find a two-proportion z-interval.

b)
$$(\hat{p}_F - \hat{p}_M) \pm z^* \sqrt{\frac{\hat{p}_F \hat{q}_F}{n_F} + \frac{\hat{p}_M \hat{q}_M}{n_M}} = (0.881 - 0.849) \pm 1.96 \sqrt{\frac{0.881 \times 0.119}{12{,}678} + \frac{0.849 \times 0.151}{12{,}460}} = (0.024, 0.040)$$

c) We are 95% confident that the proportion of 24-year-old American women who have graduated from high school is between 2.4% and 4.0% higher than the proportion of American men the same age who have graduated from high school.

d) Since the interval for the difference in proportions of high school graduates does not contain 0, there is strong evidence that women are more likely than men to complete high school.

14. War (problem).

In September 2002, a Gallup Poll found major differences of opinion based on political affiliation over whether Congress should give President Bush authority to take military action in Iraq.
Overall, about 50% of the 1010 respondents were in favor, with a margin of error of +/- 3%. Opinion differed greatly among Republicans, Democrats, and Independents. (See the graph in the next column.)

a) How large did this poll estimate the difference in support between Republicans and Democrats to be?
b) Was the margin of error for that difference equal to, greater than, or less than 3%? Explain.

14. War (solution)

a) The poll estimated that the difference in support between Republicans and Democrats was 75% - 30%, or 45% higher for the Republicans.

b) The margin of error for the difference would be greater than 3%, the margin of error for the single sample. Differences have standard errors that are larger than standard errors for single samples.

Z table

The values inside the given table represent the areas under the standard normal curve for values between 0 and the relative z-score. For example, to determine the area under the curve between 0 and 2.36, look in the intersecting cell for the row labeled 2.30 and the column labeled 0.06.

[Z table: the table values are not legible in this preview.] ...
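The table lookups used throughout these notes can also be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed table; the helper name `area_zero_to_z` is hypothetical. It computes the area between 0 and a z-score (the quantity the table reports, here for row 2.30 and column 0.06, that is z = 2.36) and the critical values used for 95% and 80% confidence.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (z >= 0)."""
    return std_normal.cdf(z) - 0.5


# Table-style lookup for row 2.30, column 0.06
area = area_zero_to_z(2.36)

# Critical values: leave alpha/2 in each tail
z_95 = std_normal.inv_cdf(1 - 0.05 / 2)  # 95% confidence
z_80 = std_normal.inv_cdf(1 - 0.20 / 2)  # 80% confidence

print(f"area between 0 and 2.36: {area:.4f}")
print(f"z* for 95%: {z_95:.2f}, z* for 80%: {z_80:.2f}")
```

Going from a confidence level to a critical value is the reverse lookup: find the z-score whose cumulative area equals 1 minus half of alpha.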