Week nine, Sessions one and two: Lecture notes (Chapter 21)
Professor Esfandiari

The objective of this lecture is to:
• Revisit the P value as a conditional probability
• Introduce the concept of the level of significance and its role in hypothesis testing
• Show you how to determine critical values based on the level of significance, and the function of critical values in hypothesis testing
• Define and interpret type I error, or alpha, and how it relates to hypothesis testing
• Define and interpret type II error, or beta, and how it relates to hypothesis testing
• Define and interpret the power of the test and how it relates to hypothesis testing
• Define and interpret effect size
• Discuss the difference between statistical and practical significance

Revisiting P value as a conditional probability
Remember conditional probability from prior chapters. We defined P(A | B) as the conditional probability of event A, given that B has happened. The P value is also a conditional probability, and its condition is that the null hypothesis is true. We define it as

$\text{P value} = P(\text{a sample statistic such as } \hat{p} \text{ or } \bar{x} \text{ as extreme as the one observed, or more extreme} \mid H_0 \text{ is true})$

What do small P values mean? They show that the sample data would be rare, highly unlikely events if the null were true. In such cases we use the data from the sample as evidence against the null hypothesis.

What do large P values mean? Large P values show that the sample data are not surprising. They do not provide any evidence against the null hypothesis: the results from the sample are consistent with the null hypothesis, and thus there is no reason to reject it.

What is the significance level?
In order to decide what counts as a "rare event", we need to set a threshold for the P value.
That threshold is called the "level of significance", and we denote it by alpha (α). The level of significance is generally set at 0.05, 0.01, or 0.001, depending on the nature of the study. If the study is done under controlled conditions, you choose a lower level of significance. For instance, if you are sending a satellite into space, you want a much lower alpha. However, if you are testing a drug for the first time in the treatment of HIV, you choose a higher alpha and accept a higher risk of rejecting a true null hypothesis.
• If the P value is less than alpha, then you have found sufficient evidence in the data to reject the null.
• If the P value is larger than alpha, then you have insufficient evidence in the data to reject the null hypothesis, and we "fail to reject the null".

Another way of looking at alpha and P value
• Alpha is the risk you are willing to take of rejecting a true null, and it is referred to as the "nominal" type one error.
• The P value is the risk you will actually take of rejecting a true null, and it is referred to as the "actual" type one error.
• If the actual risk of rejecting a true null (P value) is less than the nominal risk you are willing to take (alpha), then you reject the null.
• If the actual risk of rejecting a true null (P value) is larger than the nominal risk you are willing to take (alpha), then you fail to reject the null.

Testing the null at a fixed level of alpha and with reference to critical values (the traditional way)
In the prior lecture on confidence intervals we talked about 90%, 95%, and 99% confidence intervals. We mentioned that, based on the CLT, the distribution of p̂ for independent random samples of size 30 or more follows the normal distribution, so we can use the Z values corresponding to the 90%, 95%, and 99% intervals to compute the confidence interval for P. In that lecture we defined the confidence interval for the population proportion and showed that the corresponding Z values, or critical values, for 90%, 95%, and 99% confidence were 1.65, 1.96, and 2.58. The alpha level is the complement of the confidence level, so the corresponding levels of significance for 90%, 95%, and 99% confidence intervals are 0.10, 0.05, and 0.01, respectively. The areas beyond these critical values are called critical regions.
In testing the null hypothesis for the proportion, if the Z value that we compute through $Z = (\hat{p} - P)/SD(\hat{p})$ falls in the critical region, we reject the null hypothesis. If the Z value falls between the critical values, we fail to reject the null.

Example: Jerry wonders whether a coin is fair (a two-tailed question). He tosses it one hundred times and gets 65 heads. Given this information, answer the following questions.
a) State the null and the alternative hypothesis in words.
H0: P = 0.50
If the coin is fair, one expects the proportion of heads to be 50%.
Ha: P ≠ 0.50
We do not know whether the coin is fair, so the proportion of heads could be more or less than 50% in the long run.
b) Test the null hypothesis by calculating the P value.
$Z = (\hat{p} - P)/SD(\hat{p})$, where $\hat{p} = 65/100 = 0.65$
$SD(\hat{p}) = \sqrt{P(1-P)/n} = \sqrt{0.50 \times 0.50/100} = 0.50/10 = 0.05$
$Z = (0.65 - 0.50)/0.05 = +3$
$\text{P value} = P(Z \geq +3) + P(Z \leq -3) = 0.0013 + 0.0013 = 0.0026$
Thus, the actual risk of rejecting a true null is 0.26%. Since 0.0026 is very low, we reject the null.
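As a check on the arithmetic, here is a minimal Python sketch of the one-proportion z test used for Jerry's coin. The helper name `one_proportion_z_test` is hypothetical; it uses the normal approximation with SD(p̂) computed under H0, as in the notes, and assumes Python 3.8+ for `statistics.NormalDist`. Because it reads areas from the exact normal curve rather than a printed table, it reports a two-sided P value of about 0.0027; the 0.0026 above comes from adding two rounded table values (0.0013 + 0.0013).

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Z statistic and P value for H0: p = p0 (normal approximation).

    alternative: "two-sided", "greater", or "less" (hypothetical helper).
    """
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)  # SD of p-hat computed under H0
    z = (p_hat - p0) / sd
    std_normal = NormalDist()     # mean 0, standard deviation 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)                 # right tail only
    elif alternative == "less":
        p_value = std_normal.cdf(z)                     # left tail only
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))      # both tails
    return z, p_value


# Jerry: 65 heads in 100 tosses, H0: P = 0.50, Ha: P != 0.50
z, p = one_proportion_z_test(65, 100, 0.50)
print(f"z = {z:.2f}, P value = {p:.4f}")  # z = 3.00, P value = 0.0027
```

Passing `alternative="greater"` with 60 successes gives the one-tailed test for Jared's coin later in these notes (z = 2, P ≈ 0.023).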
c) Interpret the results in context.
Since the P value is very low, we conclude that the coin is not fair: it is biased so that heads are more likely than tails.
d) Suppose that Jared has a coin, and when he tosses it 100 times he gets 60 tails. He thinks the coin is biased toward giving more tails. Write the null and the alternative hypothesis in symbols and test the null at alpha equal to 0.05.
H0: P = 0.50, Ha: P > 0.50 (P is the proportion of tails)
$\hat{p} = 60/100 = 0.60$
$Z = (\hat{p} - P)/SD(\hat{p}) = (0.60 - 0.50)/0.05 = 0.10/0.05 = 2$

Type I and type II error, power of the test
As illustrated before, hypothesis testing is a kind of conditional probability. The following tables help to illustrate this.

Table one: Definition of type I and II error and power of the test, with guidelines for deciding to reject or fail to reject the null

Decision | The truth: H0 is true | The truth: H0 is false
--- | --- | ---
Reject H0 | Given that H0 is true, rejecting it is NOT OK: type I error (α) | Given that H0 is false, rejecting it is OK: power (1 − β)
Fail to reject H0 | Given that H0 is true, failing to reject it is OK | Given that H0 is false, failing to reject it is NOT OK: type II error (β)

Table two: Type I and II error and power of the test applied to medicine

Decision | The truth: does not have the disease | The truth: has the disease
--- | --- | ---
Reject H0 (diagnosis: patient has the disease) | NOT OK: false positive, type I error (α) | OK: power (1 − β)
Fail to reject H0 (diagnosis: patient does not have the disease) | OK | NOT OK: false negative, type II error (β)

Table three: Type I and II error and power of the test applied to nutrition

Decision | The truth: diet program does not work | The truth: diet program works
--- | --- | ---
Reject H0 (join the program) | NOT OK: type I error (α) | OK: power (1 − β)
Fail to reject H0 (do not join the program) | OK | NOT OK: type II error (β)

Table four: Type I and II error and power of the test applied to teaching

Decision | The truth: online quizzes do not improve learning stats | The truth: online quizzes improve learning stats
--- | --- | ---
Reject H0 (include online quizzes in stats curriculum) | NOT OK: type I error (α) | OK: power (1 − β)
Fail to reject H0 (do not include online quizzes in stats curriculum) | OK | NOT OK: type II error (β)

Jared's coin, continued: Since it is a one-tailed test, we need to place all of the alpha on the right side of the curve. The corresponding critical Z value is 1.65. Since the calculated Z value is larger than the critical Z value (2 > 1.65), we reject the null and conclude that the coin is biased toward giving more tails.
e) Now suppose that Jared wanted to test H0: P = 0.50 vs. Ha: P ≠ 0.50 at alpha = 0.01.
The results would change as follows. Since the alternative hypothesis is two-tailed, we need to divide alpha into two parts, and the corresponding critical values are ±2.58. Since the calculated Z value lies between the critical values (−2.58 < 2 < 2.58), we fail to reject the null at the 0.01 level, and we conclude that there is not enough evidence that Jared's coin is biased.

Comparison of testing the null hypothesis through calculation of the P value and testing the null at a fixed level of alpha
In the old days, when computers were not around, people tested the null hypothesis at fixed levels of alpha, generally 0.05, 0.01, and 0.001. Today, since computer software carries out the statistical analysis, the P value is readily calculated. Also, the P value gives us the actual probability of rejecting a true null. For instance, a P value of 0.35 means we have a 35% risk of rejecting a true null, and since that is too high a risk to take, we do not reject the null. However, if we simply say we fail to reject the null at alpha equal to 0.05 and do not report the P value, the risk of rejecting the true null could have been as low as 0.06 or as high as 0.99.
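The two routes compare the same quantities from different directions, so at a given alpha they always reach the same decision. Below is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper `decide` is hypothetical and covers only the upper-tailed and two-tailed cases that appear in Jared's example (z = 2). It also prints the exact critical values for 90%, 95%, and 99% confidence, which the notes round to two decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Critical values for 90%, 95%, and 99% two-sided confidence levels
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    print(f"{conf:.0%} confidence: z* = {std_normal.inv_cdf(1 - alpha / 2):.3f}")


def decide(z, alpha, alternative):
    """Return (critical z, P value, reject by critical value, reject by P value).

    alternative is "greater" (upper-tailed) or "two-sided" (hypothetical helper).
    """
    if alternative == "greater":
        z_crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
        p_value = 1 - std_normal.cdf(z)
        reject_by_critical = z > z_crit
    elif alternative == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between two tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject_by_critical = abs(z) > z_crit
    else:
        raise ValueError("alternative must be 'greater' or 'two-sided'")
    return z_crit, p_value, reject_by_critical, p_value < alpha


# Jared's coin: 60 tails in 100 tosses, H0: P = 0.50, so z = 2
for alpha, alternative in [(0.05, "greater"), (0.01, "two-sided")]:
    z_crit, p, by_crit, by_p = decide(2.0, alpha, alternative)
    print(f"alpha = {alpha}, {alternative}: z* = {z_crit:.3f}, P = {p:.4f}, "
          f"reject? {by_crit} (critical value) / {by_p} (P value)")
```

Under these assumptions the first case rejects H0 (z* ≈ 1.645, P ≈ 0.023) and the second does not (z* ≈ 2.576, P ≈ 0.0455), matching the conclusions reached for Jared's coin; the P value adds information the fixed-alpha decision hides, namely how far inside or outside the threshold the data fell.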
Figure 1: Illustration of type I and type II errors and power of the test
P0 shows the hypothesized value of the proportion under the null. The curve with mean equal to P0 displays the distribution of p̂ given that the null is true. P shows the hypothesized value of the proportion under the alternative. The curve with mean equal to P displays the distribution of p̂ under the alternative hypothesis.

Effect size: Effect size is defined as the difference between the mean under Ha and the mean under H0. Effect size makes the most sense when we talk about a treatment. Suppose we say that after participating in a weight loss program, participants are expected to lose 10 pounds. If they do nothing (the null), they lose zero pounds: μ under the null equals zero. If they participate in the program, they lose ten pounds: μ under the alternative equals ten. The difference of the two means, ten pounds, is called the effect size (see Figure two, where effect size = 2).

The area labeled α shows the rejection region of a one-tailed test of the null hypothesis, that is, the nominal risk of rejecting a true null. If the alternative hypothesis were two-tailed, we would divide this area in half. The area to the left of α is the region where we assume H0 is true and do not reject H0. β displays the type II error of not rejecting a null hypothesis that is false (NOT OK).
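To see how β and the power (1 − β) move with effect size, here is a minimal sketch under two stated assumptions: the test is one-tailed (upper) at α = 0.05, and the effect size is measured in standard-error units, so the alternative curve is a normal curve with standard deviation 1 centred at the effect size. Whether Figure two is scaled this way is an assumption, and the helper `beta_and_power` is hypothetical.

```python
from statistics import NormalDist


def beta_and_power(effect_size_se, alpha=0.05):
    """Type II error (beta) and power for an upper-tailed z test.

    effect_size_se: distance between the H0 mean and the Ha mean, in
    standard-error units (an assumption about the figure's scale).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical line under H0
    # Under Ha the statistic is centred at effect_size_se; beta is the part
    # of that curve still left of the critical line (fail to reject H0).
    beta = NormalDist(mu=effect_size_se, sigma=1).cdf(z_crit)
    return beta, 1 - beta


for d in (0, 1, 2, 3):
    beta, power = beta_and_power(d)
    print(f"effect size {d}: beta = {beta:.3f}, power = {power:.3f}")
```

With effect size 0 the power equals α (0.05), and with effect size 2 it comes out at roughly 0.64 (β ≈ 0.36). Larger effects push the alternative curve away from the critical line and shrink the overlap β.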
Notice that β is the part of the alternative curve that falls in the fail-to-reject region, where the null and the alternative curves overlap. If there were no type II error, there would be no such overlap. The area under the alternative curve that is the complement of β, or (1 − β), displays the power: the probability of rejecting a null that is not true (OK).

Figure two: Displaying type I and II errors, power, and effect size = 2

The difference between statistical and practical significance
As a consumer of statistics, you should differentiate between statistical and practical significance. As we discussed before, the standard error is inversely proportional to the square root of the sample size. As the sample size increases, the standard error decreases, the absolute Z value increases, and the P value decreases. But a small P value does not necessarily imply practical significance.

Example: Let us say that the proportion of the American public who endorse spending more money on the mental health of the homeless is 60%. Based on a study, state X is being criticized for not endorsing support for the mental health of the homeless. The relevant data are given below:
H0: P = 0.60, Ha: P < 0.60, p̂ = 0.57, n = 2000 (random sample picked from state X)
$Z(\hat{p}) = (\hat{p} - P)/SD(\hat{p})$
$Z(\hat{p}) = (0.57 - 0.60)/0.011$, where $SD(\hat{p}) = \sqrt{0.60 \times 0.40/2000} \approx 0.011$
$Z(\hat{p}) = -0.03/0.011 \approx -2.73$
Because Ha is one-sided (P < 0.60), the P value is the lower-tail area only: $\text{P value} = P(Z \leq -2.73) \approx 0.003$
Since the P value is very low, we reject the null and conclude that the proportion of people who endorse supporting the mental health of the homeless in state X is less than the proportion in the American public. But the question is: do we consider the difference between 57% and 60% to be practically meaningful?

An example from my own research: After the Columbine shooting in the state of Colorado, the US Office of Education provided funding for designing an intervention that would help to improve the attitude of students toward police and authority. This study was conducted in six states on a total of 30,000 sixth grade students. The control group was exposed to the social studies book, and the experimental group was exposed to the social studies book plus training in "law related education". The students in the control and the experimental groups were tested on their attitude toward law and authority before and after the program, which lasted a whole academic year. There were 2000 students in the experimental group and 1000 students in the control group (we did not collect data on all 30,000 students).
H0: (μ improvement in attitude, experimental) − (μ improvement in attitude, control) = 0
Ha: (μ improvement in attitude, experimental) − (μ improvement in attitude, control) > 0
Findings indicated that there was a statistically significant difference between the attitudes of the experimental and the control groups toward police and authority, with the experimental group improving 3% more than the control group (P = 0.000).
However, when the findings were presented to the Senate, they were not willing to fund the project again because they felt that the 3% difference was not that important or meaningful. This implies that, in spite of the fact that the results were statistically significant with a very low P value, they did not find this difference to be practically significant. This shows that statistical significance does not guarantee practical significance. If we use a large enough sample, we can make any small difference look statistically significant.

Problems solved in lecture from chapter 21
2. Which alternative? (problem)
In each of the following cases, is the alternative hypothesis one-sided or two-sided? What are the hypotheses?
a) A college dining service wants to know if students prefer plastic to metal cutlery. They conduct a survey to find out.
They conduct a survey to ﬁnd out. b) In recent years, 10% of the college juniors have applied for study abroad. The
dean’s office conducts a survey to see if that changed this year. c) A pharmaceutical company conducts a clinical trial to se if more patients who
take a new drug experience headache relief than the 22% who claimed relief after
taking the placebo. d) At a small computer peripheral company, only 60% of the hard drives produced
passed all their performance tests the ﬁrst time. Management recently invested a
lot of resources into the production system and now conducts a test to see if it
helped. 2. Which alternative? 21) Two sided. Let p be the percentage of students who prefer plastic. H0: 50% of students prefer plastic. (p = 0.50) HA : The percentage of students who prefer plastic is not 50%. (p 75 0.50) b) Two sided. Let p be the percentage of juniors planning to study abroad. Ho: 10% of juniors plan to study abroad. (p = 0.10) HA: The percentage of juniors plan to study abroad is not 10%. (p i 0.10) c) One sided. Let p be the percentage of people who experience relief. H0: 22% of people experience headache relief with the drug. (p = 0.22) HA: More than 22% of people experience headache relief with the drug. (p > 0.22)
d) One-sided. Let p be the percentage of hard drives that pass all performance tests. H0: 60% of hard drives pass all performance tests (p = 0.60). HA: The percentage of drives that pass all performance tests is greater than 60% (p > 0.60).

Problem 8: Significant again (problem)
A new reading program may reduce the number of elementary school students who read below grade level. The company that developed this program supplied materials and teacher training for a large-scale test involving nearly 8500 children in several school districts. Statistical analysis of the results showed that the percentage of students who did not attain the grade-level standard was reduced from 15.9% to 15.1%. The hypothesis that the new reading program produced no improvement was rejected with a P value of 0.023.
a) Explain what the P value means in context.
b) Even though this reading method has been shown to be significantly better, why might you not recommend that your local school adopt it?

8. Significant again (answers)
a) If 15.9% is the true percentage of children who do not attain the grade-level standard, there is only a 2.3% chance of observing 15.1% or fewer children (in a sample of 8500) not attaining grade level by natural sampling variation alone.
b) Under the old methods, 1352 students would not be expected to read at grade level. With the new program, 1284 would not be expected to read at grade level. This is a decrease of only 68 students. The costs of switching to the new program might outweigh the potential benefit. It is also important to realize that this is only a potential benefit.

13. Loans (problem)
Before lending someone money, banks must decide whether they believe the applicant will repay the loan. One strategy used is a point system. Loan officers assess information about the applicant, totaling points they award for the person's income level, credit history, current debt burden, and so on. The higher the point total, the more convinced the bank is that it is safe to make the loan. Any applicant with a lower point total than a certain cutoff score is denied the loan.
We can think of this decision as a hypothesis test. Since the bank makes its profit from the interest collected on repaid loans, its null hypothesis is that the applicant will repay the loan and therefore should get the money. Only if the person's score falls below the minimum cutoff will the bank reject the null and deny the loan. This system is reasonably reliable, but, of course, sometimes there are mistakes.
a) When a person defaults on a loan, which type of error did the bank make?
b) Which kind of error is it when the bank misses an opportunity to make a loan to someone who would have repaid it?
c) Suppose the bank decides to lower the cutoff score from 250 points to 200. Is that analogous to choosing a higher or lower value of alpha for a hypothesis test? Explain.
d) What impact does this change in cutoff value have on the chance of each type of error?

13. Loans (answers)
a) The bank has made a type II error. The person was not a good credit risk, and the bank failed to notice this.
b) The bank has made a type I error. The person was a good credit risk, and the bank was convinced that he/she was not.
c) By making it easier to get a loan, the bank has reduced the alpha level. It takes less evidence to grant the person the loan.
d) The risk of type I error is decreased and the risk of type II error has increased.

17. Homeowners 2005 (problem)
In 2005 the U.S. Census Bureau reported that 68.9% of American families owned their homes. Census data reveal that the ownership rate in one small city is much lower. The city council is debating a plan to offer tax breaks to first-time home buyers in order to encourage people to become homeowners. They decide to adopt the plan on a 2-year trial basis and use the data they collect to make a decision about continuing the tax breaks. Since this plan costs the city tax revenues, they will continue to use it only if there is strong evidence that the rate of home ownership is increasing.
a) In words, what will their hypotheses be?
b) What would a type I error be?
c) What would a type II error be?
d) For each type of error, tell who would be harmed.
e) What would the power of the test represent in this context?

17. Homeowners 2005 (answer)
a) ...
 Fall '11
 Gould
