Hypothesis Testing
Utku Suleymanoglu (UMich)

Introduction: The Idea

Statistical inference is about arriving at conclusions about unknown population
parameters using data.
Example: we want to know the yearly income (in thousands of dollars) of public university graduates in the USA at age 30. It has an unknown mean µ, an unknown variance σ², and an unknown distribution.
We can't interview all young people, so we get a sample of 100: x̄ = 55. Let's say we know σ² = 9.
x̄ is our best guess for µ. We can also build a confidence interval around it: something like $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$.
These are our best answers to the question “what is µ?”.
Our estimate x̄, together with σ² and what we know of the sampling distribution of x̄, can also help us evaluate claims about µ.
Suppose somebody claims that µ ≥ 60.

We are going to learn how to evaluate the validity of these types of hypotheses regarding the unknown population parameters (µ, or p).
We just measured, with 100 observations, that x̄ = 55. The spirit of the testing is to evaluate the chances of this sample being drawn from a population with µ ≥ 60.
Is it possible to have an x̄ = 55 from a distribution with µ = 60? Yes, technically, but it is highly unlikely.
We use the sampling distribution of the sample mean under the assumption that the claim is right: then X̄ ∼ N(60, 9/100) = N(60, 0.09).
We know x̄ will change from sample to sample, but how likely is it that it is less than 55, or 54? Extremely unlikely.
Then this claim is not really credible: the data do not support it.
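As a rough check of the "highly unlikely" statement, the probability can be computed directly. The sketch below uses Python's standard-library `statistics.NormalDist`. The variable names are illustrative and are not part of the original slides.

```python
from statistics import NormalDist

# Values from the example: claimed mean, known variance, sample size, sample mean.
mu_claimed = 60.0
sigma2 = 9.0
n = 100
x_bar = 55.0

# Standard error of the sample mean: sqrt(9 / 100) = 0.3.
std_error = (sigma2 / n) ** 0.5

# Sampling distribution of the sample mean if the claim mu = 60 were true.
sampling_dist = NormalDist(mu=mu_claimed, sigma=std_error)

# How many standard errors x_bar lies from the claimed mean.
z = (x_bar - mu_claimed) / std_error

# Probability of drawing a sample mean of 55 or less under the claim.
p_at_most_55 = sampling_dist.cdf(x_bar)

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(X-bar <= 55 | mu = 60) = {p_at_most_55:.3g}")
```

The sample mean sits more than 16 standard errors below 60, so the computed probability is effectively zero.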
This is the spirit of hypothesis testing. Let's formalize the idea of “claims.”

Null and Alternative Hypotheses: The Null and Its Alternative

We will devise two complementary hypotheses to formalize the idea of a claim: the null hypothesis and the alternative hypothesis.
Null Hypothesis: Holds the claim to be challenged, to be refuted if possible. H0: µ ≥ 60.
Alternative Hypothesis: The alternative theory to be maintained if the null hypothesis is rejected. H1 (or HA): µ < 60.
As a researcher, if I think that the average yearly income at 30 for university graduates (µ) is less than 60 thousand dollars a year, then to back up my claim I challenge the opposite claim, that it is at least 60. If I reject H0, I get evidence for my theory.
Alternatively, you might be asked to test a hypothesis directly: “Test the claim that µ is less than 50”. Then this claim should be put in the null hypothesis.

Examples
1. You are unhappy with your car: you think your new car travels less than 30 miles per gallon (mpg). To get it serviced for free, you want to find evidence that MPG < 30. You record your consumption for a month and build a sample. You challenge the claim that E(MPG) = µ ≥ 30:
   H0: µ ≥ 30
   H1: µ < 30
2. You are an engineer and have a machine in your factory that produces pencils. You think the machine is broken and produces pencils of incorrect lengths. It is supposed to produce pencils with an average length of 5 inches. You collect a sample and test:
   H0: µ = 5
   H1: µ ≠ 5
KEY: In all hypothesis testing, you look at the evidence and decide whether you have enough of it to reject the null hypothesis. If you don't, you fail to reject the null hypothesis; you do not “accept” it. Jury analogy. . .
Type I and II Errors

The hypothesis testing we will do is not perfect: there could be mistakes.

                     Reality
Testing Result       H0 is True      H0 is False
Reject H0            Type I Error    Correct
Fail to Reject H0    Correct         Type II Error

TYPE I Error: The null hypothesis is true and you reject it. Its probability is α, the significance level. We can choose this.
TYPE II Error: The null hypothesis is false and you fail to reject it. Its probability is β, and the power of the test is 1 − β. This depends on the unknown population parameter, so we have limited control over it.

Trial Analogy
                                            Reality
Verdict                                     INNOCENT (H0)    GUILTY
Reject H0 (Verdict = Guilty)                Type I Error     Correct
Fail to Reject H0 (Verdict = Not Guilty)    Correct          Type II Error

We set a high standard for convicting people. We assume innocence, then try to find evidence to reject this presumption. We do the same for the null hypothesis: unless there is a lot of evidence, we do not reject it.
Type I error: an innocent man gets the chair. Type II error: a murderer walks away. Society and statisticians try to minimize the probability of a Type I error first, and demand a lot of evidence to reject an H0.
Key thing: If we fail to reject H0, we don't say “we proved H0”; we just don't have enough evidence against it. Analogy: if the defendant walks away, his innocence has not been proved; rather, his guilt has not been proved with enough evidence.
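The statement that the Type I error probability equals α can be checked by simulation. The sketch below is only an illustration under stated assumptions. It uses the left-tailed z-test for a mean with known σ that these notes develop later, draws samples from a normal population in which H0 holds at its boundary (µ = µ0), and counts how often H0 is rejected anyway. The function name, the default values, and the seed are hypothetical choices, not something taken from the slides.

```python
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(mu0=3.0, sigma=1.5, n=25, alpha=0.05,
                         trials=20_000, seed=0):
    """Share of samples, drawn with the true mean equal to mu0, in which
    a left-tailed z-test (H0: mu >= mu0) wrongly rejects H0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(alpha)  # -z_alpha, about -1.645 for alpha = 0.05
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n from a population where H0 is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_error
        if z < critical:
            rejections += 1  # a Type I error: H0 is true but rejected
    return rejections / trials


rate = simulate_type_i_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

With enough trials the simulated rate should land close to the chosen α, which is the sense in which α is under the researcher's control.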
General Testing Procedure

TEST PROCEDURE:
1. Formulate and state the null and alternative hypotheses.
2. (Select a significance level α.)
3. Calculate a suitable test statistic from the available sample statistics, to use in conjunction with. . .
4. (Develop and) use a decision rule to make a call about H0:
   (a) Assume the null hypothesis is valid.
   (b) Figure out the sampling distribution of the sample statistic under the assumption that the null hypothesis is correct.
   (c) Figure out the distribution of the test statistic under the null.
   (d) Select a criterion that uses the probability distribution of the test statistic to reject or fail to reject the null hypothesis. The criterion uses α as a tolerance level.
5. State your conclusion on the null hypothesis.

Testing Hypothesis about the Population Mean: σ known

σ known: One-Tailed Tests

We start with the unrealistic case where σ is known. This works exactly if population values have a normal distribution, and approximately if not.
One-Tailed Tests
A left-tailed test has the H0 and H1:
H0: µ ≥ µ0
H1: µ < µ0
A right-tailed test has the H0 and H1:
H0: µ ≤ µ0
H1: µ > µ0

Tests with known σ use the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Now we need to come up with a testing criterion. There are two equivalent ways of doing this:
p-value approach
Critical value (rejection region) approach
These are best explained with an example. We will discuss the logic of hypothesis testing with this example.
Important Note: We will discuss hypothesis testing regarding µ and p in different scenarios. The first scenario is for µ where σ is known. I will spend extra time on this case to explain the logic of hypothesis testing. This logic carries through everything we are going to do, so I will not repeat it again. Don't mistake my spending a lot of time on the first case for the other cases not being important.

Long Running Example

Suppose you think the average lifespan of energy-saving light bulbs is less than 3 years.
You collect a sample of 25 newly built bulbs and measure their lifespans. You get x̄ = 2.5. You (somehow) know that the standard deviation of their lifespan is σ = 1.5. Then we have the hypotheses:
H0: µ ≥ 3
H1: µ < 3
This is a left-tailed test.
The relevant test statistic for it is:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 3}{1.5 / 5} \approx -1.66$

The decision rule will evaluate how likely it is to get a sample with x̄ = 2.5 if you have a population mean of 3 years.
We know $\bar{X} \sim N\left(3, \left(\frac{1.5}{\sqrt{25}}\right)^2\right)$ if µ = 3 were true, so the standard error is 1.5/√25 = 0.3.
Then the question is: if so, what is the probability of getting an X̄ ≤ 2.5? Well, we can calculate that!
$P(\bar{X} < 2.5) = P\left(Z < \frac{2.5 - 3}{0.3}\right) = P(Z < z) = P(Z < -1.66)$
Here z is the z test statistic.
We can calculate this probability using the z-table. It is 0.0485: the probability that you get a sample that produces an x̄ equal to or lower than our current estimate x̄ = 2.5 if the null hypothesis were true.
This is a small probability, so we should probably think that H0 is false. But what is a small probability?
We need a criterion for a small probability. We will call this criterion the significance level and denote it by α.
α is in the researcher's control and is usually set to 0.1, 0.05, or 0.01.
If you calculate a probability for your sample which is less than α, you reject the null hypothesis.
This is the essence of the p-value approach.
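The light bulb calculation can be reproduced in a few lines. This is a minimal sketch using `statistics.NormalDist` from the Python standard library in place of the z-table. The variable names are illustrative.

```python
from statistics import NormalDist

# Light bulb example: sample mean, hypothesized mean, known sigma, sample size.
x_bar, mu0, sigma, n = 2.5, 3.0, 1.5, 25

std_error = sigma / n ** 0.5       # 1.5 / 5 = 0.3
z = (x_bar - mu0) / std_error      # about -1.667 before any rounding
p_value = NormalDist().cdf(z)      # left-tail probability P(Z <= z)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")

# Compare the same p-value against several significance levels.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

Without rounding, z is about −1.667 and the p-value is about 0.048. The table value 0.0485 quoted above corresponds to z = −1.66. The decisions are the same either way.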
Graphical recap: [figure not recovered; horizontal axis z, centered at 0]

p-value Approach for One-Tailed Tests

We are working on the case of tests for µ where σ is known, but this logic generalizes to many cases.
1. State the hypotheses and calculate the test statistic (z for this case).
2. Calculate:
   Left-tailed tests: the (left-tail) probability that the sample mean is less than the x̄ at hand if the null is true, via P(Z ≤ z).
   Right-tailed tests: the (right-tail) probability that the sample mean is more than the x̄ at hand if the null is true, via P(Z ≥ z).
3. The probability you calculate is called the p-value.
4. Compare the p-value with α.
   If p-value < α: Reject H0. You have enough evidence that H0 is false.
   If p-value > α: Fail to reject H0. There is not enough evidence to reject the null hypothesis.
5. In our example, we reject the null if α is set to 0.05 or 0.1, but not if it is 0.01. The p-value approach allows easy comparison of decisions under different α's.
6. Notice: the p-value is the smallest α choice for which H0 is rejected.

Critical Value Approach for One-Tailed Tests

Another equally valid way to build a testing criterion is this:
1. State the hypotheses and calculate the test statistic (z for this case).
2. Set an α. Say, α = 0.05.
3. Figure out the critical value zα such that P(Z ≥ zα) = α: the z-value with right-tail probability α.
4. Make a decision about H0 by comparing the test statistic with the critical value. The critical value tells us which values are too far off from the null hypothesis value.
   Left-tailed tests: Reject H0 if z < −zα.
   Right-tailed tests: Reject H0 if z > zα.
5. Choosing an α and finding the critical value creates a rejection region. If the test statistic is in this region, H0 is rejected.

Example Cont.

For our example, the critical value with α = 0.05 is −zα = −1.645.
Then any test statistic smaller than −1.645 is in the rejection region.
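The critical value can be reproduced with the inverse of the standard normal CDF. This is a minimal sketch using `statistics.NormalDist`. The variable names are illustrative, and the test statistic is entered as rounded in the slides.

```python
from statistics import NormalDist

alpha = 0.05

# z_alpha has right-tail probability alpha, so it is the (1 - alpha) quantile.
z_alpha = NormalDist().inv_cdf(1 - alpha)

# For a left-tailed test the rejection region is everything below -z_alpha.
left_critical = -z_alpha

z = -1.66  # light bulb test statistic, as rounded in the slides
in_rejection_region = z < left_critical

print(f"z_alpha = {z_alpha:.3f}, left critical value = {left_critical:.3f}")
print(f"z = {z} lies in the rejection region: {in_rejection_region}")
```

For a right-tailed test the same `z_alpha` is used without the sign flip, and H0 is rejected when the statistic exceeds it.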
We had z = −1.66, so we reject the null hypothesis.
[Figure not recovered; horizontal axis z, centered at 0]

Right-Tailed Test Example

Suppose now that you also believe that µ > 1.5. So you want to test the claim µ ≤ 1.5. Let's do that. First, let's properly state the hypotheses:
H0: µ ≤ 1.5
H1: µ > 1.5
This is a right-tailed test. Under the assumption that the null hypothesis is true, we need to evaluate the chances of x̄ ≥ 2.5. If they are small, the null hypothesis is not likely to be true.
Next step: The relevant test statistic is
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 1.5}{0.3} = 3.33$
Next: The critical value for α = 0.01 is z0.01 = 2.33.
Next: Use the decision rule: z > 2.33, so we reject H0.
OR: The p-value for z = 3.33 is smaller than 0.001, so we reject the null hypothesis for any reasonable α.

Determinants of Test Results

Before we go on and discuss testing in different situations, let's summarize the determinants of the test results.
The difference between the hypothesized parameter and the calculated statistic: x̄ − µ0. Generally speaking, if the sample mean is far off from the hypothesized value, the test statistic will be larger in absolute terms. The effect of this on the test depends on the sign of the test statistic and the tail of the test.
Precision of the sampling distribution: The standard error of the sample statistic (x̄) determines how much variation we should expect in x̄ from sample to sample. If it is high, samples with an x̄ that deviates largely from the hypothesized value are more likely.

Universally:
Level of significance: α is our choice as researchers. Think about the p-value approach. You compare your calculated p-value with different α's. If p = 0.04, you reject the null with α = 0.05, but not with α = 0.01. To reject a null with α = 0.01 or α = 0.001, you need a really small p-value. So as α decreases, you ask for more and more evidence against the null hypothesis to be able to reject it.
A small α means you have a small probability of rejecting a true hypothesis (Type I error, executing the innocent). But a small α also means being picky about the evidence and not rejecting H0 most of the time. So you may also fail to reject some false hypotheses: the probability of committing a Type II error increases.
Generally speaking, α = 0.05 is the norm. If one needs to be more conservative about rejecting the null, α = 0.01 is the choice. An α = 0.1 is also OK. There is no clear-cut reason to choose one over the others. But we don't use α = 0.2 or α = 0.8.
The nice thing about providing p-values is that you allow the readers to pick their own α's and arrive at their own conclusions quickly.

Example

Remember we have x̄ = 2.5, σ = 1.5, and n = 25. Suppose now that you have to test H0: µ ≥ 1 with the alternative hypothesis H1: µ < 1.
This is a left-tailed test. We can calculate the test statistic: $z = \frac{2.5 - 1}{0.3} = 5$. That is a pretty big z. (If this were a right-tailed test, you would reject the null hypothesis.) But this is a left-tailed test. The p-value is calculated as the left-tail probability: P(Z ≤ 5) ≈ 1. That is bigger than any α imaginable. So you fail to reject the null. (The critical value is negative for left-tailed tests, so a positive test statistic cannot be in the rejection region.)
Even if the test result seems obvious, we still test it properly. And we still don't call it “accept”: P(Z ≤ 5) ≠ 1; rather, P(Z ≤ 5) ≈ 1.
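To see that the left-tail probability is close to 1 but not equal to it, the calculation can be sketched as follows. This again uses `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

# Same sample as before, now tested against H0: mu >= 1.
x_bar, mu0, sigma, n = 2.5, 1.0, 1.5, 25

z = (x_bar - mu0) / (sigma / n ** 0.5)   # (2.5 - 1) / 0.3 = 5
p_left = NormalDist().cdf(z)             # left-tailed p-value P(Z <= 5)
shortfall = 1.0 - p_left                 # how far the p-value falls short of 1

print(f"z = {z:.2f}")
print(f"p-value = {p_left:.7f}, 1 - p-value = {shortfall:.2e}")
```

The shortfall is tiny but positive, which is why the conclusion is "fail to reject" and not "accept".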
This is an example of why a larger test statistic does not necessarily mean that you are more likely to reject the null. Generally, a big test statistic in the direction of the tail of the test is likely to lead to rejecting the null hypothesis.

σ known: Two-Tailed Tests

Now we will discuss a slightly different type of test. The difference is in the null and alternative hypotheses:
H0: µ = µ0
H1: µ ≠ µ0
These tests judge the claim that an unknown population parameter is exactly equal to some number. In economics, two-tailed tests are performed a lot to test things like:
Whether a production technology has constant returns to scale (population parameter = 1).
Whether a job training program has any effect on wages whatsoever (population parameter = 0).
We will come back to the latter one again when we do regression analysis.

Example
The test procedure is very similar to that for one-tailed tests, with a few important differences.
Take your light bulb sample again (remember x̄ = 2.5, n = 25, and σ = 1.5). Now suppose that there is a claim that the mean life expectancy of light bulbs is 2.6 years:
H0: µ = 2.6
H1: µ ≠ 2.6
The test statistic is identical to the one for one-tailed tests:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 2.6}{0.3} = -0.33$
The test statistic measures the relative position of 2.5 with respect to the hypothesized value for µ, 2.6. You can see it is fairly close as measured by z. Given that the normal distribution is bell-shaped, we know that an x̄ = 2.5 draw from the distribution of X̄ is quite probable if µ = 2.6, so we should not reject H0.
Key thing: Because of the equality in the null, what we consider unlikely if the null hypothesis is true can be on either tail. We will build our rejection regions on both tails.
This time let's start with the Critical Value Approach:
After stating the hypotheses and calculating the z-statistic, the decision rule is going to be:
Reject H0 if $\left| \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \right| > z_{\alpha/2}$
In other words, reject the null if the test statistic is outside the interval (−zα/2, zα/2), where the critical value zα/2 is the z-value with upper-tail probability α/2.
For our example, we have $z = \frac{2.5 - 2.6}{1.5 / \sqrt{25}} = \frac{-0.1}{0.3} = -0.33$. Let's pick α = 0.05; then α/2 = 0.025 and zα/2 = 1.96. Because z = −0.33 lies inside the interval (−1.96, 1.96), we do not reject the null hypothesis. We don't have enough evidence to assert that µ = 2.6 is not the case.
Let's see what we are doing on a picture.

Graphical Explanation: [figure not recovered; horizontal axis z, centered at 0]

p-value Approach for Two-Tailed Tests

We can also calculate a p-value for the test statistic z, to compare with different α's to get a conclusion.
The p-value can be calculated as the area outside the interval (−|z|, |z|), or simply (due to symmetry) as 2 × P(Z > |z|).
In our example we get a p-value of 2 × P(Z > 0.33) = 2 × 0.3707 = 0.7414.
This value is bigger than any reasonable α, so we reach the same conclusion as before: we fail to reject the null hypothesis.
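Both two-tailed decision rules can be checked side by side. The sketch below uses `statistics.NormalDist`. The variable names are illustrative and α = 0.05 is the value picked in the example.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Light bulb sample tested against H0: mu = 2.6.
x_bar, mu0, sigma, n = 2.5, 2.6, 1.5, 25
alpha = 0.05

z = (x_bar - mu0) / (sigma / n ** 0.5)        # about -0.333
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # z_{alpha/2}, about 1.96
p_value = 2 * (1 - std_normal.cdf(abs(z)))    # area in both tails beyond |z|

# The two approaches should always agree on the decision.
reject_by_critical_value = abs(z) > z_crit
reject_by_p_value = p_value < alpha

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print(f"reject H0 (critical value approach): {reject_by_critical_value}")
print(f"reject H0 (p-value approach): {reject_by_p_value}")
```

With the unrounded z the p-value comes out near 0.739. The 0.7414 in the text follows from rounding z to 0.33 before using the table. Neither approach rejects H0.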
Let's go back one slide and see this on a picture.

Hypothesis Testing Fundamentals Recap

Before we go on to different cases, let's repeat the general idea of hypothesis testing:
We have a hypothetical value for a population parameter (µ = µ0) as a claim, and we want to test this.
We have a sample and a point estimate, x̄ = 2, let's say.
We know the sampling distribution of x̄, assuming the claim is true, from chapter 6.
Then we can evaluate the probability of x̄ or a similar draw from this sampling distribution.
To do that we need to transform our normal random variable to a standard normal; this gives us the z-statistic.
Then we can either
calculate the probability associated with the z-statistic and see if it is small or big (p-value approach), or
compare it to some critical z-value so that we can assess how far off it is from the claimed value (critical value approach).
Either way, based on the assumption that the claim is true, we assess the correctness of the claim by comparing it to what we observe in the data.
If the two are “different enough”, we say the claim is (probably) not correct.
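The recap above can be condensed into one small helper. This is a sketch under the assumptions of this section (σ known, normal sampling distribution). The function name `z_test`, its parameters, and the `tail` labels are hypothetical and not taken from the slides.

```python
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, tail, alpha=0.05):
    """z-test for a population mean with known sigma.

    tail is "left" (H1: mu < mu0), "right" (H1: mu > mu0)
    or "two" (H1: mu != mu0). Returns (z, p_value, reject).
    """
    std_normal = NormalDist()
    # Standardize the sample mean under the assumption that mu = mu0.
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # p-value approach: reject H0 when the p-value is below alpha.
    return z, p_value, p_value < alpha


# The three light bulb tests worked through in this section.
print(z_test(2.5, 3.0, 1.5, 25, "left"))                # H0: mu >= 3
print(z_test(2.5, 1.5, 1.5, 25, "right", alpha=0.01))   # H0: mu <= 1.5
print(z_test(2.5, 2.6, 1.5, 25, "two"))                 # H0: mu = 2.6
```

The helper returns the p-value and leaves the choice of α to the caller, which mirrors the point made earlier about reporting p-values so that readers can apply their own α.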
σ not Known

Case 2: σ not known, population normal

When σ is not known, we can use s, the sample standard deviation, instead. Just like we did before for CI's.
But remember, we need a modification to make this work. We need to use the t-distribution instead of the standard normal distribution.
The p-value approach is hard to perform with the t-distribution, so we will ...