Week ten session one lecture notes (Chapter 22)
Professor Esfandiari

The objective of this lecture is to:
- Remind you of the variance of the difference between two independent random variables.
- Discuss the sampling distribution of the difference between two proportions ($\hat{p}_1 - \hat{p}_2$) resulting from independent groups.
- Discuss the assumptions underlying the sampling distribution of ($\hat{p}_1 - \hat{p}_2$).
- Revisit the CLT and the normal model underlying the sampling distribution of ($\hat{p}_1 - \hat{p}_2$).
- Discuss the confidence interval for the difference in two population proportions resulting from two independent populations.
- Test the null hypothesis about the difference between the proportions in two independent populations.

The standard deviation of the difference of two independent random variables

In prior lectures it was pointed out that, if two random variables are independent, then

$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

It follows that

$$SD(X - Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)}$$

Standard deviation of the difference between two proportions

We are interested in proportions that are based on independent random samples and are therefore independent. For instance, if we observe the proportion of male physicians and the proportion of female physicians who follow a career in pediatrics, these two proportions will be independent. Thus:

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2)}$$

In prior lectures we showed that $SD(\hat{p}_1) = \sqrt{pq/n}$. It follows that $\mathrm{Var}(\hat{p}_1) = pq/n$ (p is the proportion of successes in the population and q = 1 - p). It follows that

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

Sampling distribution of $\hat{p}_1 - \hat{p}_2$

Based on the CLT, if the samples are independent (this would be evident from the way the samples are selected), the samples are large enough (at least 10 successes and 10 failures in each), and neither sample exceeds 10% of its population, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ follows the normal model with mean $\mu = P_1 - P_2$ and standard deviation

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

where $p_1$ and $p_2$ denote the population proportions $P_1$ and $P_2$.

Testing the null hypothesis about the difference in the proportion in two
independent populations.

At a major health corporation they wonder (they are not sure, so this is a two-tailed test) whether the proportion of males and females who seek surgery for intestinal issues is similar for the 70 to 80 year age range. They randomly pick the files of 100 males and 100 females who were referred to their clinics for the same intestinal issues. 70 out of 100 females and 50 out of 100 males choose to undergo surgery for the same intestinal issue. Assuming that these two samples are representative of the population of 70 to 80 year old men and women who have the same intestinal issue, test the null at alpha = 0.05.

Number of women seeking surgery = 70, $\hat{p}_W = 70/100 = 0.70$
Number of men seeking surgery = 50, $\hat{p}_M = 50/100 = 0.50$

Null hypothesis: H0: $P_{women} - P_{men} = 0$
There is no relationship between gender and seeking surgery for the same intestinal issue.
There is no difference between the proportion of men and women who seek surgery for the same intestinal issue.

Alternative hypothesis: Ha: $P_{women} - P_{men} \neq 0$
There is a relationship between gender and seeking surgery for the same intestinal issue.
There is a difference in the proportion of men and women who seek surgery for the same intestinal issue.
Based on the normal model, the Z(X) was calculated as follows:

$$Z(X) = \frac{X - \mathrm{Mean}(X)}{SD(X)}$$

Based on the CLT, the distribution of $\hat{p}_1 - \hat{p}_2$ follows the normal model with mean $(P_1 - P_2)$ and $SD(\hat{p}_1 - \hat{p}_2) = \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$. Thus,

$$Z(\hat{p}_1 - \hat{p}_2) = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}}$$

$(P_1 - P_2)$ is the value under the null and we have set it equal to zero. Since the population proportions are unknown, the sample proportions stand in for $p_1$ and $p_2$ in the standard deviation. With 70 out of 100 women ($\hat{p}_W = 0.70$) and 50 out of 100 men ($\hat{p}_M = 0.50$) choosing surgery:

$$Z(\hat{p}_W - \hat{p}_M) = \frac{(0.70 - 0.50) - 0}{\sqrt{(0.70)(0.30)/100 + (0.50)(0.50)/100}} = \frac{0.20}{0.0678} \approx 2.95$$

P-value = P(Z >= +2.95) + P(Z <= -2.95), which is about 0.003.

TESTING THE NULL HYPOTHESIS THROUGH CALCULATION OF THE P-VALUE

Based on the calculations reported above, the P-value is about 0.003. Thus, the actual risk that we would take in rejecting a true null is about 0.3%. In other words, our sample data provide enough evidence against the null hypothesis to allow us to reject it. Thus, we reject the null and conclude that, for the same intestinal issue, the proportion of women who choose to have surgery is higher than the proportion of men.

The P-value, or the actual type I error (the actual risk of rejecting a true null), is less than or equal to 0.05 (the nominal risk of rejecting the true null), and thus we reject the null hypothesis.

TESTING THE NULL HYPOTHESIS WITH EXAMINATION OF CRITICAL VALUES

We could also test the null by examining the critical values. With alpha = 0.05, the critical values for a two-tailed test are +/- 1.96. If the calculated Z value is within the critical limits (that is, between -1.96 and +1.96), we fail to reject the null, and if it is larger than the critical value of +1.96 or smaller than the critical value of -1.96, we reject the null. Since the calculated value of +2.95 is larger than the critical value of Z = +1.96 and falls in the critical region, we reject the null.

IMPORTANCE OF INTERPRETING THE RESULTS OF HYPOTHESIS TESTING WITHIN CONTEXT

Simply saying that we reject the null is not good enough; you need to go back to the sample data, examine $\hat{p}_1$ and $\hat{p}_2$, see which one is higher, and interpret the results within context. Thus, you should not just say the proportion of men and women who seek surgery for the same intestinal issue is different. You should say which population has a higher proportion of surgery for the same intestinal issue.
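The two-proportion z test for the surgery example can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `two_proportion_z_test` is hypothetical, and it assumes the counts given in the problem statement (70 of 100 women and 50 of 100 men) together with the unpooled standard deviation formula from this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, null_diff=0.0):
    """Two-tailed z test for P1 - P2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # SD of the difference: sqrt(p1*q1/n1 + p2*q2/n2), with sample proportions plugged in
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - null_diff) / se
    # Two-tailed P-value: area beyond +|z| plus area beyond -|z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed data from the problem statement: 70 of 100 women, 50 of 100 men chose surgery.
z, p_value = two_proportion_z_test(70, 100, 50, 100)
print(f"z = {z:.2f}, two-tailed P-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05" if abs(z) > 1.96 else "fail to reject H0 at alpha = 0.05")
```

The last line applies the critical-value rule (compare |z| with 1.96), which always leads to the same decision as comparing the P-value with alpha = 0.05.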
CALCULATION OF CONFIDENCE INTERVAL FOR $P_1 - P_2$

Remember: CI for any parameter = statistic +/- margin of error
95% CI = statistic +/- 1.96 * SE(statistic)

The statistic is what you calculate based on the sample; in this case it is $(\hat{p}_1 - \hat{p}_2)$. SE is the standard error of the statistic; in this case it is $SE(\hat{p}_1 - \hat{p}_2)$. Thus it follows that:

$$95\%\ \text{CI for } (P_1 - P_2) = (\hat{p}_1 - \hat{p}_2) \pm 1.96 \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

Example:
Given the following data, calculate and interpret the 80% confidence interval for $P_1 - P_2$.
For a random sample of 100 female lawyers, 20 practice criminal law.
For a random sample of 100 male lawyers, 40 practice criminal law.

Taking the male lawyers as group 1 and the female lawyers as group 2:

$$80\%\ \text{CI for } (P_1 - P_2) = (\hat{p}_1 - \hat{p}_2) \pm 1.28 \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

$$= (0.40 - 0.20) \pm 1.28 \sqrt{\frac{(0.40)(0.60)}{100} + \frac{(0.20)(0.80)}{100}} = 0.20 \pm 1.28 \times 0.063 = 0.20 \pm 0.08 = (0.12, 0.28)$$

Z = -1.28 corresponds to the 10th percentile. Z = +1.28 corresponds to the 90th percentile (see the normal table at the end of this handout). For 80 percent confidence, alpha is 0.20 and we need to divide it by two and place half in the right tail and half in the left tail of the distribution.

We are 80% confident that the proportion of male lawyers who practice criminal law is 12 to 28 percentage points higher than the proportion of female lawyers who practice criminal law.

Testing the null through the confidence interval

If we were testing the null that the proportion of male and female attorneys who practice criminal law is the same (at alpha = 0.20, matching the 80% confidence level), we would reject the null because the confidence interval we found (12% to 28%) does not include the hypothesized value under the null of $P_1 - P_2 = 0$.

TESTING H0: $P_1 - P_2 = 0$ BASED ON POOLED $\hat{p}$ AND POOLED
SE

When we have data from two populations but WE BELIEVE THAT THEY REALLY COME FROM THE SAME UNDERLYING POPULATION, we pool the proportions from the two samples to get a pooled estimate.

$$\hat{p}_{pooled} = \frac{f_1 + f_2}{n_1 + n_2}$$

$f_1 = n_1 \hat{p}_1$, what we consider the number of successes in sample one
$f_2 = n_2 \hat{p}_2$, what we consider the number of successes in sample two
$n_1$ = size of sample one
$n_2$ = size of sample two

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_2}}$$

From the above it follows that:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$$

Example using pooled $\hat{p}$
Problem 20 from the book
Depression: A study published in the Archives of General Psychiatry in March 2001 examined the impact of depression on a patient's ability to survive cardiac disease. Researchers identified 450 patients with cardiac disease, evaluated them for depression, and followed the group for four years. Of the 361 patients with no depression, 67 died. Of the 89 patients with minor or major depression, 26 died. Among people who suffer from cardiac disease, are depressed patients more likely to die than nondepressed ones?
a) What kind of design was used to collect these data?
b) Write appropriate hypotheses.
c) Are the assumptions and conditions necessary for inference satisfied?
d) Test the hypothesis and state your conclusion.
e) Explain in this context what your P-value means.
f) If your conclusion is actually incorrect, what type of error did you commit?

Solution to problem 20. Depression.

a) This is a prospective observational study.

b) H0: The proportion of cardiac patients without depression who died within the 4 years is the same as the proportion of cardiac patients with depression who died during the same time period.
$p_{None} = p_{Dep}$, or $p_{None} - p_{Dep} = 0$

HA: The proportion of cardiac patients without depression who died within the 4 years is less than the proportion of cardiac patients with depression who died during the same time period.
$p_{None} < p_{Dep}$, or $p_{None} - p_{Dep} < 0$

c) Randomization condition: Assume that the cardiac patients followed by the study are representative of all cardiac patients.
10% condition: 361 and 89 are both less than 10% of all cardiac patients.
Independent samples condition: The groups are not associated.
Success/Failure condition: $n\hat{p}$ (no depression) = 67, $n\hat{q}$ (no depression) = 294, $n\hat{p}$ (depression) = 26, and $n\hat{q}$ (depression) = 63 are all greater than 10, so the samples are both large enough.

Since the conditions have been satisfied, we will model the sampling distribution of the difference in proportions with a Normal model with mean 0 and standard deviation estimated by

$$SE_{pooled}(\hat{p}_{None} - \hat{p}_{Dep}) = \sqrt{\frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_{None}} + \frac{\hat{p}_{pooled}\,\hat{q}_{pooled}}{n_{Dep}}} = \sqrt{\frac{(93/450)(357/450)}{361} + \frac{(93/450)(357/450)}{89}} = 0.0479$$

d) The observed difference between the proportions is:
67/361 = 0.1856 (the proportion who died and had no depression)
26/89 = 0.2921 (the proportion who died and had depression)
0.1856 - 0.2921 = -0.1065

$$z = \frac{-0.1065 - 0}{0.0479} \approx -2.22, \qquad P = P(Z \le -2.22) = 0.0131$$

Since the P-value = 0.0131 is low, we reject the null hypothesis. There is strong evidence to suggest that the proportion of non-depressed cardiac patients who die within 4 years is less than the proportion of depressed cardiac patients who die within 4 years.
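The pooled calculation for the depression problem can be reproduced with a short Python sketch. This is an illustration under stated assumptions rather than the textbook's own solution: the function name `pooled_two_proportion_z` is hypothetical, and the counts (67 deaths among 361 non-depressed patients, 26 deaths among 89 depressed patients) are taken from the problem.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 - P2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled estimate: total successes over total sample size
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se_pooled = sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)
    z = ((p1 - p2) - 0) / se_pooled
    return p_pool, se_pooled, z


# Group 1: no depression (67 of 361 died); group 2: depression (26 of 89 died).
p_pool, se_pooled, z = pooled_two_proportion_z(67, 361, 26, 89)
# One-sided alternative p_None < p_Dep, so the P-value is the lower-tail area.
p_value = NormalDist().cdf(z)
print(f"pooled p-hat = {p_pool:.4f}, SE = {se_pooled:.4f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the alternative is one-sided, only the lower tail is used here; a two-tailed test would double that area.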
e) If there is no difference in the proportions, we will see an observed difference this large or larger (z = -2.22 or lower) only about 1.3% of the time by natural sampling variation.

f) If cardiac patients without depression don't actually have a lower proportion of deaths in 4 years than cardiac patients with depression, then we have committed a Type I error.

Problems solved from the book

6. Buy it again? (problem)
A consumer magazine plans to poll car owners to see if they are happy enough with their vehicles that they would purchase the same model again. They'll randomly select 450 owners of American-made cars and 450 owners of Japanese models. Obviously, the actual opinions of the entire population couldn't be known, but suppose that 76% of owners of American cars and 78% of owners of Japanese cars would purchase another.
a) What kind of sampling design is the magazine planning to use?
b) What would you expect their poll to show?
c) Of course, sampling error means the poll won't reflect the difference perfectly. What's the standard error for the difference in the proportions?
d) Sketch a sampling model for the difference in proportions that might appear in a poll like this.
e) Could the magazine be misled by the poll, concluding that owners of American cars are much happier with their vehicles than owners of Japanese cars? Explain.

6. Buy it again? (solution)
a) This is a stratified random sample, stratified by country of origin of the car.

b) We would expect the difference in proportions in the sample to be the same as the difference in proportions in the population, with the percentage of respondents who would purchase the same model again 2 percentage points higher among owners of Japanese cars than among owners of American cars.

c) The standard deviation of the difference in proportions is:

$$\sigma(\hat{p}_J - \hat{p}_A) = \sqrt{\frac{\hat{p}_J \hat{q}_J}{n_J} + \frac{\hat{p}_A \hat{q}_A}{n_A}} = \sqrt{\frac{(0.78)(0.22)}{450} + \frac{(0.76)(0.24)}{450}} = 2.8\%$$

d) [Figure: Normal sampling model for the difference in proportions of owners who would purchase the same model car again (Japanese - American), centered at 2%, with axis marks one standard deviation apart at -6.4%, -3.6%, -0.8%, 2%, 4.8%, 7.6%, and 10.4%; the outermost marks enclose 99.7% of the model.]

e) The magazine could certainly be misled by the poll. According to the model, a poll showing greater satisfaction among owners of American cars could occur relatively frequently. That result is less than one standard deviation below the expected difference in proportions.

8. Graduation (problem).

In October 2000 the US Department of Commerce reported the results of a large-scale
survey on high school graduation. Researchers contacted more than 25,000 Americans aged 24 years to see if they had finished high school; 84.9% of the 12,460 males and 88.1% of the 12,678 females indicated that they had high school diplomas.
a) Are the assumptions and conditions necessary for the inference satisfied? Explain.
b) Create a 95% confidence interval for the difference in graduation rates between males and females.
c) Interpret your confidence interval.
d) Does this provide strong evidence that girls are more likely than boys to complete high school? Explain.

8. Graduation (solution).

a) Randomization condition: Assume that the samples are representative of all recent
graduates.
10% condition: Although large, the samples are less than 10% of all graduates.
Independent samples condition: The sample of men and the sample of women were drawn independently of each other.
Success/Failure condition: The samples are very large, certainly large enough for the methods of inference to be used.
Since the conditions have been satisfied, we will find a two-proportion z-interval.

b)
$$(\hat{p}_F - \hat{p}_M) \pm z^* \sqrt{\frac{\hat{p}_F \hat{q}_F}{n_F} + \frac{\hat{p}_M \hat{q}_M}{n_M}} = (0.881 - 0.849) \pm 1.96 \sqrt{\frac{(0.881)(0.119)}{12{,}678} + \frac{(0.849)(0.151)}{12{,}460}} = (0.024, 0.040)$$

c) We are 95% confident that the proportion of 24-year-old American women who have graduated from high school is between 2.4 and 4.0 percentage points higher than the proportion of American men the same age who have graduated from high school.

d) Since the interval for the difference in proportions of high school graduates does not contain 0, there is strong evidence that women are more likely than men to complete high school.

14. War (problem).

In September 2002, a Gallup Poll found major differences of opinion based on political
affiliation over whether Congress should give President Bush authority to take military action in Iraq. Overall, about 50% of the 1010 respondents were in favor, with a margin of error of +/- 3%. Opinion differed greatly among Republicans, Democrats, and Independents. (See the graph in the next column.)
a) How large did this poll estimate the difference in support between Republicans and Democrats to be?
b) Was the margin of error for that difference equal to, greater than, or less than 3%? Explain.

14. War (solution)

a) The poll estimated that the difference in support between Republicans and Democrats was 75% - 30%, or 45 percentage points higher for the Republicans.

b) The margin of error for the difference would be greater than 3%, the margin of error for the single sample. Differences have standard errors that are larger than standard errors for single samples.

Z table

The values inside the given table represent the areas under the standard normal curve for values between 0 and the relative z-score. For example, to determine the area under the curve between 0 and 2.36, look in the intersecting cell for the row labeled 2.30 and the column labeled 0.06.

[Table of standard normal areas; truncated in this copy.]
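As an alternative to reading z* from the normal table, the confidence-interval recipe used in this handout (statistic +/- z* times SE) can be sketched in Python. This is a minimal sketch, not part of the original notes: the function name `two_proportion_ci` is hypothetical, and the data are the lawyer example (40 of 100 male lawyers and 20 of 100 female lawyers practice criminal law).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for P1 - P2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Split alpha evenly between the two tails to get the critical value z*
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * se_diff
    diff = p1 - p2
    return diff - margin, diff + margin, se_diff


# Lawyer example: group 1 = male lawyers, group 2 = female lawyers, 80% confidence.
low, high, se_diff = two_proportion_ci(40, 100, 20, 100, confidence=0.80)
print(f"80% CI for P1 - P2: ({low:.2f}, {high:.2f})")

# The SE of the difference exceeds the SE of either single proportion.
se_male = sqrt(0.40 * 0.60 / 100)
se_female = sqrt(0.20 * 0.80 / 100)
print(f"SE(diff) = {se_diff:.4f}, SE(male) = {se_male:.4f}, SE(female) = {se_female:.4f}")
```

The final comparison illustrates the point made in the War solution: because the variances add, the standard error of a difference is larger than the standard error of either sample proportion alone.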
 Fall '11
 Gould
