Hypothesis Testing
Utku Suleymanoglu
UMich

Introduction: The Idea

Statistical inference is about arriving at conclusions about unknown population parameters using data.

Example: we want to know the yearly income of public university graduates in the USA at age 30. It has an unknown mean µ, an unknown variance σ², and an unknown distribution.

We can't interview all young people, so we get a sample of 100: x̄ = 55 (thousand dollars). Let's say we know σ² = 9.

x̄ is our best guess for µ. We can also build a confidence interval around it: something like x̄ ± z_{α/2} σ_x̄.

These are our best answers to the question “what is µ?”.

Our estimate x̄, the known σ², and what we know of the sampling distribution of x̄ can also help us evaluate claims about µ.
Suppose somebody claims that µ ≥ 60.

We are going to learn how to evaluate the validity of these types of hypotheses regarding unknown population parameters (µ or p).

We just measured x̄ = 55 with 100 observations. The spirit of the test is to evaluate the chances of this sample being drawn from a population with µ ≥ 60.

Is it possible to get x̄ = 55 from a distribution with µ = 60? Yes, technically, but it is highly unlikely.

We use the sampling distribution of the sample mean under the assumption that the claim is right: then X̄ ∼ N(60, 9/100) = N(60, 0.09).

We know x̄ will change from sample to sample, but how likely is it that it is less than 55, or 54? Extremely unlikely: 55 is more than 16 standard errors (√0.09 = 0.3) below 60.

So this claim is not really credible: the data do not support it.
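
To put a number on "very unlikely", here is a minimal sketch using Python's standard library. It uses only the figures from the income example (claimed mean 60, known variance 9, n = 100, sample mean 55); the variable names are made up for illustration.

```python
from statistics import NormalDist

mu_claimed = 60.0          # boundary value of the claim mu >= 60
sigma2 = 9.0               # population variance, taken as known
n = 100
x_bar = 55.0               # observed sample mean

se = (sigma2 / n) ** 0.5   # standard error of the sample mean: sqrt(9/100)
sampling_dist = NormalDist(mu=mu_claimed, sigma=se)

z = (x_bar - mu_claimed) / se          # distance from 60 in standard errors
p_below = sampling_dist.cdf(x_bar)     # P(X-bar <= 55) if mu were 60

print(f"standard error = {se:.2f}")
print(f"z = {z:.2f}")
print(f"P(X-bar <= {x_bar}) = {p_below:.3g}")
```

The probability is so small that it is indistinguishable from zero at any practical precision.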
This is the spirit of hypothesis testing. Let's formalize the idea of “claims.”

Null and Alternative Hypotheses: The Null and Its Alternative

We will devise two complementary hypotheses to formalize the idea of a claim: the null hypothesis and the alternative hypothesis.

Null Hypothesis: Holds the claim to be challenged, to be refuted if possible. H0: µ ≥ 60.
Alternative Hypothesis: The alternative theory to be maintained if the null hypothesis is rejected. H1 (or HA): µ < 60.

As a researcher, if I think that the average yearly income at 30 for university graduates (µ) is less than 60 thousand dollars a year, then to back up my claim I challenge the opposite claim, that it is 60 or higher. If I reject H0, then I get evidence for my theory.

Alternatively, you might be given a hypothesis to test directly: “Test the claim that µ is less than 50”.
Then this claim should be put in the null hypothesis.

Examples

1. You are unhappy with your car: you think your new car travels less than 30 mpg. To get it serviced for free, you want to find evidence that MPG < 30. You record your consumption for a month and build a sample. You challenge the claim that E(MPG) = µ ≥ 30:
H0: µ ≥ 30
H1: µ < 30
2. You are an engineer and have a machine in your factory that produces pencils. You think the machine is broken and produces pencils of incorrect lengths. It is supposed to produce pencils with an average length of 5 inches. You collect a sample and test:
H0: µ = 5
H1: µ ≠ 5

KEY: In all hypothesis testing, you look at the evidence and decide whether you have enough of it to reject the null hypothesis. If you don't, you fail to reject the null hypothesis: you do not accept the null hypothesis. Jury analogy...

Type I and II Errors

The hypothesis testing we will do is not perfect: there could be mistakes.
                     Reality
Testing Result       H0 is True      H0 is False
Reject H0            Type I Error    Correct
Fail to Reject H0    Correct         Type II Error

TYPE I Error: The null hypothesis is true and you reject it. Probability = α, the significance level. We can choose this.
TYPE II Error: The null hypothesis is false and you fail to reject it. Probability = β. Power = 1 − β. This depends on the unknown population parameter. We have limited control over this.

Trial Analogy

                                            Reality
Verdict                                     Innocent (H0)    Guilty
Reject H0 (Verdict = Guilty)                Type I Error     Correct
Fail to Reject H0 (Verdict = Not Guilty)    Correct          Type II Error

We set a high standard for convicting people. We assume innocence, then try to find evidence to reject this presumption. We do the same for the null hypothesis: unless there is a lot of evidence, we do not reject it.
Type I error: an innocent man gets the chair. Type II error: a murderer walks away. Society and statisticians try to minimize the probability of Type I error first, and demand a lot of evidence to reject an H0.
Key thing: If we fail to reject H0, we don't say “we proved H0”; we just don't have enough evidence against it. Analogy: if the defendant walks away, his innocence has not been proven; rather, his guilt has not been proven with enough evidence.

General Testing Procedure

TEST PROCEDURE:
1. Formulate and state the null and alternative hypotheses.
2. (Select a significance level: α.)
3. Calculate a suitable test statistic using the available sample statistics, to use in conjunction with...
4. (Develop and) use a decision rule to make a call about H0. You don't need to develop it every time, but this is how it works:
   (a) Assume the null hypothesis is valid.
   (b) Figure out the sampling distribution of the sample statistic under the assumption that the null hypothesis is correct.
   (c) Figure out the distribution of the test statistic under the null.
   (d) Select a criterion that uses the probability distribution of the test statistic to reject or fail to reject the null hypothesis. The criterion uses α as a tolerance level.
5. State your conclusion on the null hypothesis.

Testing Hypothesis about the Population Mean: Case 1, σ known

We start with the unrealistic case where σ is known. This works exactly if the population values have a normal distribution, and approximately (for large enough samples) if not.
One-Tailed Tests

A left-tailed test has the H0 and H1:
H0: µ ≥ µ0
H1: µ < µ0
A right-tailed test has the H0 and H1:
H0: µ ≤ µ0
H1: µ > µ0

Tests with known σ use the test statistic:
z = (x̄ − µ0) / (σ/√n)

Now, we need to come up with a testing criterion. There are two equivalent ways of doing this:
p-value approach
Critical value (rejection region) approach
These are best explained with an example. We will discuss the logic of hypothesis testing with this example.
Important Note: We will discuss hypothesis testing regarding µ and p in different scenarios. The first scenario is for µ where σ is known. I will spend extra time on this case to explain the logic of hypothesis testing. This logic carries through everything we are going to do, so I will not repeat it. Don't mistake my spending a lot of time on the first case for the other cases not being important.

Long Running Example

Suppose you think the average lifespan of energy-saving light bulbs is less than 3 years. You collect a sample of 25 newly built bulbs and measure their lifespan. You get x̄ = 2.5. You (somehow) know the standard deviation of their lifespan is σ = 1.5. Then we have the hypotheses:
H0: µ ≥ 3
H1: µ < 3
This is a left-tailed test.
The relevant test statistic for this test (for all Case 1 tests, right-tailed, left-tailed or two-tailed) is:
z = (x̄ − µ0) / (σ/√n) = (2.5 − 3) / (1.5/5) ≈ −1.66
We will see why we use this.

The decision rule will evaluate how likely it is to get a sample with x̄ = 2.5 if the population mean were 3 years.
We know X̄ ∼ N(3, (1.5)²/25) if µ = 3 were true.
Then the question is: if so, what is the probability of getting an X̄ < 2.5? Well, we can calculate that!
P(X̄ < 2.5) = P(Z < (2.5 − 3)/0.3) = P(Z < z) = P(Z < −1.66), where z is the test statistic.
We can calculate this probability using the z-table. It is 0.0485: the probability that you get a sample that produces an x̄ lower than our current estimate x̄ = 2.5 if the null hypothesis were true.
We will call this probability the p-value.
This is a small probability, so H0 should probably be rejected. But what is small enough?
We need a criterion. We will set “a small enough probability”, call it the significance level, and denote it with α.
If you calculate a p-value which is less than α, you reject the null hypothesis.
α is in our control: usually 0.1, 0.05 or 0.01. Remember α is also the probability of a Type I error: we might be wrong! α is the probability associated with the risk we are taking.
This is the essence of the p-value approach.
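
The p-value calculation for the light bulb example can be sketched in a few lines of Python. The helper names `z_statistic` and `left_tail_p_value` are made up for illustration. Because the code keeps the unrounded z (about −1.667), the computed p-value comes out close to, but not identical to, the table value 0.0485 quoted for z = −1.66.

```python
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)) for a test with known sigma."""
    return (x_bar - mu0) / (sigma / n ** 0.5)

def left_tail_p_value(z):
    """P(Z <= z) for a standard normal Z."""
    return NormalDist().cdf(z)

# Light bulb example: H0: mu >= 3 versus H1: mu < 3
z = z_statistic(x_bar=2.5, mu0=3.0, sigma=1.5, n=25)
p_value = left_tail_p_value(z)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")

# Compare the same p-value with several significance levels
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The loop illustrates why reporting the p-value is convenient: the same number settles the decision for every α at once.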
Graphical recap: [figure; horizontal axis z, centered at 0]

p-value Approach for One-Tailed Tests

We are working on the case of tests for µ where σ is known, but this logic generalizes to many cases.
1. State the hypotheses and calculate the test statistic (z for this case).
2. Calculate:
   Left-tailed tests: the (left-tail) probability that the sample mean is less than the x̄ at hand if the null is true, via P(Z ≤ z).
   Right-tailed tests: the (right-tail) probability that the sample mean is more than the x̄ at hand if the null is true, via P(Z ≥ z).
3. The probability you calculate is called the p-value.
4. Decision Rule: Compare the p-value with α.
   If p-value < α: Reject H0. You have enough evidence that H0 is false.
   If p-value ≥ α: Fail to reject H0. There is not enough evidence to reject the null hypothesis.

In our example, we reject the null if α is set to 0.05 or 0.1, but not if it is 0.01. The p-value approach allows easy comparison of decisions under different α's. Notice: the p-value is the smallest α at which H0 is rejected.

Critical Value Approach for One-Tailed Tests

Another, equivalent approach to building the testing criterion is this:
1. State the hypotheses and calculate the test statistic (z for this case).
2. Set an α. Say, α = 0.05.
3. Figure out the z_α such that P(Z ≥ z_α) = α: the z-value with an upper-tail probability of α.
4. Make a decision about H0 by comparing the test statistic with a critical value. The critical value tells us which values are too far off from the null hypothesis value.
   Left-tailed tests: Reject H0 if z < −z_α. Critical value = −z_α.
   Right-tailed tests: Reject H0 if z > z_α. Critical value = z_α.
5. Choosing an α and finding the critical value creates a rejection region. If the test statistic is in this region, H0 is rejected.
The key idea: Instead of comparing a probability with a small enough probability (α), choose α first and establish a far enough test statistic.

Example Cont.

For our example, the critical value with α = 0.05 is −z_α = −1.645.
Then any test statistic smaller than −1.645 is in the rejection region.
We had z = −1.66, so we reject the null hypothesis.
[figure; horizontal axis z, centered at 0]
Exercise: Calculate the largest x̄ value which will lead to the rejection of the null.
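
The critical value approach for the same left-tailed example can be sketched as follows. The function names are hypothetical; `inv_cdf` from the standard library plays the role of reading the z-table backwards.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Critical z for a one-tailed test; tail is 'left' or 'right'."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # P(Z >= z_alpha) = alpha
    return -z_alpha if tail == "left" else z_alpha

def in_rejection_region(z, alpha, tail):
    """True if the test statistic falls in the rejection region."""
    crit = critical_value(alpha, tail)
    return z < crit if tail == "left" else z > crit

# Light bulb example: x_bar = 2.5, mu0 = 3, sigma = 1.5, n = 25
z = (2.5 - 3.0) / (1.5 / 25 ** 0.5)
for alpha in (0.10, 0.05, 0.01):
    crit = critical_value(alpha, "left")
    reject = in_rejection_region(z, alpha, "left")
    print(f"alpha = {alpha:.2f}: critical value = {crit:.3f}, reject H0: {reject}")
```

Both approaches lead to the same decisions: rejection at α = 0.10 and 0.05, but not at 0.01.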
Right-Tailed Test Example

Suppose now that you also believe that µ > 1.5. So you want to test the claim µ ≤ 1.5. Let's do that. First, let's properly state the hypotheses:
H0: µ ≤ 1.5
H1: µ > 1.5
This is a right-tailed test. Under the assumption that the null hypothesis is true, we need to evaluate the chances of x̄ > 2.5. If they are small, the null hypothesis is not likely to be true.
Next step: the relevant test statistic is z = (x̄ − µ0) / (σ/√n) = (2.5 − 1.5) / 0.3 = 3.33.
Next: The critical value for α = 0.01 is z_0.01 = 2.33.
Next: Use the decision rule: z > 2.33, so we reject H0.
OR: the p-value for z = 3.33 is smaller than 0.001, so we reject the null hypothesis for any reasonable α.
Exercise: Calculate the smallest x̄ value that leads to rejection of the null.
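
For a right-tailed test the only change is which tail we look at. Here is a small sketch of the example above, with both the critical value and the p-value computed from the upper tail (variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()
x_bar, mu0, sigma, n = 2.5, 1.5, 1.5, 25
alpha = 0.01

z = (x_bar - mu0) / (sigma / n ** 0.5)
z_alpha = std_normal.inv_cdf(1 - alpha)   # upper-tail critical value
p_value = 1 - std_normal.cdf(z)           # right tail: P(Z >= z)

print(f"z = {z:.2f}, critical value = {z_alpha:.2f}")
print(f"p-value = {p_value:.5f}")
print("reject H0" if z > z_alpha else "fail to reject H0")
```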
What determines test results? Determinants of the Test Results

Before we go on to discuss two-tailed testing and other cases, let's talk about a few things.

Difference between the hypothesized parameter and the calculated statistic: x̄ − µ0. Generally speaking, if the sample mean is far off from the hypothesized value, the test statistic will be larger in absolute terms. The effect of this on the test depends on the sign of the test statistic and the tail of the test.
Here is an example: Remember we have x̄ = 2.5, σ = 1.5 and n = 25. Suppose now that you have to test H0: µ ≥ 1 with the alternative hypothesis H1: µ < 1.
We can calculate the test statistic: z = (2.5 − 1) / 0.3 = 5. That is a pretty big z. But this is a left-tailed test. The p-value is calculated as the left-tail probability, P(Z ≤ 5). It is almost 1, bigger than any α imaginable. So you fail to reject the null.
The rejection region is (−∞, −1.645) if α = 0.05. The critical value is negative for left-tailed tests, so a positive test statistic cannot be in the rejection region.
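
A quick numerical check of this example, as a sketch: the same z = 5 gives a p-value near 1 in a left-tailed test, while the upper-tail probability (what a right-tailed test would use) is tiny.

```python
from statistics import NormalDist

z = (2.5 - 1.0) / 0.3              # test statistic for H0: mu >= 1
left_p = NormalDist().cdf(z)       # left-tailed p-value, P(Z <= z)
right_p = 1 - left_p               # upper-tail probability, for comparison only

print(f"z = {z:.2f}")
print(f"left-tail p-value  = {left_p:.7f}")
print(f"right-tail probability = {right_p:.2e}")

# The left-tailed test fails to reject at every conventional alpha
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject H0: {left_p < alpha}")
```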
Even if the test result seems obvious, we still test it properly. And this is an example of why a higher test statistic does not necessarily mean that you are more likely to reject the null. Generally, a test statistic that is large (in absolute value) in the direction of the tail of the test is more likely to lead to rejection of the null hypothesis.

Why Left/Right Tail?

Why do we focus on the left tail if the null hypothesis is µ ≥ µ0?
When testing the null H0: µ ≥ 1, x̄ values that are larger than 1 surely support the null. We saw this in the previous example.
Got x̄ = 100? That is fine: the null is µ ≥ 1, so they are in agreement and we should not reject the null.
We focus on the left tail because values of x̄ less than 1 are still possible under the null. Notice that x̄ < 1 = µ0 ⇒ z < 0. As the x̄ values get smaller, they also get more and more improbable.
If µ = 1, x̄ = 0.9 is possible, and so is x̄ = 0.1. As we move to lower x̄'s, the probability shrinks. We can calculate this probability, and we set a criterion for a low probability.
This is a criterion for: x̄ is too small to have come from a µ which is 1 or larger.
Notice: If an x̄ value is too small for µ = 1, it would be even more improbable if µ were larger.
This is why we pay attention to the left tail.
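
The last point can be checked numerically. In this sketch the observed x̄ = 0.4 and the list of candidate means are hypothetical values chosen for illustration; the standard error 0.3 is the one from the running example.

```python
from statistics import NormalDist

se = 0.3          # standard error of the sample mean
x_bar = 0.4       # hypothetical observed sample mean

def prob_at_or_below(x_bar, mu, se):
    """P(X-bar <= x_bar) when the population mean is mu."""
    return NormalDist(mu=mu, sigma=se).cdf(x_bar)

# Every mu in the null H0: mu >= 1 makes x_bar = 0.4 at most as likely as mu = 1 does
candidate_means = (1.0, 1.5, 2.0)
probs = [prob_at_or_below(x_bar, mu, se) for mu in candidate_means]
for mu, p in zip(candidate_means, probs):
    print(f"mu = {mu:.1f}: P(X-bar <= {x_bar}) = {p:.2e}")
```

So it is enough to test against the boundary value µ0 = 1: if the sample is too unlikely there, it is even less likely for every other µ in the null.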
Exercise: For H0: µ ≥ 1, SE(x̄) = 0.3 and α = 0.05, find the largest x̄ where we reject H0.
Exercise: To make sure you understand this, do the right-tailed version yourself.

Precision of the sampling distribution: The standard error of the sample statistic determines how much variation we should expect in x̄ from sample to sample. If it is high, it is more likely to have samples with an x̄ that deviates largely from the hypothesized value.
Consider the formula for the test statistic:
z = (x̄ − µ0) / (σ/√n) = (x̄ − µ0) / SE(x̄)
Remember from Chapter 7: as the standard error of the mean (the denominator above) decreases, we say we measure x̄ more precisely. The same x̄ − µ0 difference might be too little or too large, depending on how precisely we measure x̄.
As SE decreases (maybe we have a bigger sample), the test statistic will increase in magnitude. The effect of this is generally an increase in the chances of the null being rejected. Here is an example:
Suppose we have a left-tailed test. The test statistic is z = −1, so the null is not rejected with α = 0.05. If, for some reason, SE(x̄) were half of what it was before, the new test statistic would have been z = −2 and the null would be rejected.
Intuition: when we have more precision, we need much less discrepancy between x̄ and µ0 to reject the null.
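
Here is a small sketch of the halving example. The difference x̄ − µ0 = −0.3 and σ = 1.5 are hypothetical numbers picked so that n = 25 gives z = −1; quadrupling the sample size to n = 100 halves the standard error.

```python
from statistics import NormalDist

def left_tailed_decision(diff, sigma, n, alpha=0.05):
    """Return (z, reject) for a left-tailed z-test given diff = x_bar - mu0."""
    se = sigma / n ** 0.5
    z = diff / se
    critical = NormalDist().inv_cdf(alpha)   # about -1.645 for alpha = 0.05
    return z, z < critical

diff, sigma = -0.3, 1.5                      # hypothetical values
z_small, reject_small = left_tailed_decision(diff, sigma, n=25)
z_large, reject_large = left_tailed_decision(diff, sigma, n=100)

print(f"n = 25:  z = {z_small:.2f}, reject H0: {reject_small}")
print(f"n = 100: z = {z_large:.2f}, reject H0: {reject_large}")
```

The discrepancy between x̄ and µ0 is identical in both cases; only the precision changed.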
Notice: With a left-tailed test, if z > 0, a decrease in the SE cannot change the result. If z > 0, a left-tailed test will always result in “fail to reject”, regardless of the size of z. Confirm this as an exercise.
Think about the right-tailed case as well.

Significance level:
α is our choice as researchers. Think about the p-value approach. You compare your calculated p-value with different α's. If p = 0.04, you reject the null with α = 0.05, but not with α = 0.01. To reject a null with α = 0.01 or α = 0.001, you need a really small p-value. So as α decreases, you ask for more and more evidence against the null hypothesis to be able to reject it.
As α decreases, the rejection region gets smaller.
A small α means you have a small probability of rejecting a true hypothesis (Type I error, executing the innocent). But a small α also asks for a lot of evidence, so you do not reject H0 most of the time. So maybe you are also failing to reject some false hypotheses: the probability of committing a Type II error increases. So there is a tradeoff between Type I and Type II error probabilities. This is why we don't set α = 0.000001.
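
The tradeoff can be made concrete with a sketch. Computing β requires assuming a true value for µ, which we never know in practice; the value 2.5 below is purely hypothetical, and the function name is made up. The setup is the left-tailed light bulb test (H0: µ ≥ 3, SE = 0.3).

```python
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_error(alpha, mu0, mu_true, se):
    """beta for a left-tailed z-test of H0: mu >= mu0 when the true mean is mu_true."""
    x_crit = mu0 + std_normal.inv_cdf(alpha) * se            # reject H0 when x_bar < x_crit
    power = NormalDist(mu=mu_true, sigma=se).cdf(x_crit)     # P(reject H0 | mu = mu_true)
    return 1 - power

mu0, se = 3.0, 0.3
mu_true = 2.5    # hypothetical true mean, so H0 is false here
alphas = (0.10, 0.05, 0.01, 0.001)
betas = [type_ii_error(a, mu0, mu_true, se) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:<5}: beta = {b:.3f}, power = {1 - b:.3f}")
```

As α shrinks, β grows: under this assumed true mean, a very strict α would miss a false null most of the time.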
Generally speaking, α = 0.05 is the norm. If one needs to be more conservative about rejecting the null (asking for a bit more evidence), α = 0.01 is the choice. An α = 0.1 is also OK. There is no clear-cut reason to choose one over the others. But we usually don't use α = 0.2, and never a high α like α = 0.8.
The nice thing about providing p-values is that you allow the readers to pick their own α's and arrive at their own conclusions quickly.

σ is known: Two-Tailed Tests

Now we will discuss a slightly different type of test. The difference is in the null and alternative hypotheses:
H0: µ = µ0
H1: µ ≠ µ0
This type of test judges the claim that an unknown population parameter is exactly equal to some number. In economics, two-tailed tests are performed a lot to test things like:
Whether a production technology has constant returns to scale. (population parameter = 1)
Whether a job training program has any effect on wages whatsoever. (population parameter = 0)
We will come back to the latter one when we do regression analysis.

Example
The test procedure is very similar to the one-tailed tests, with a few but important differences.
Take your light bulb sample again (remember x̄ = 2.5, n = 25 and σ = 1.5). Now suppose that there is a claim that says the mean life expectancy of light bulbs is 2.6 years:
H0: µ = 2.6
H1: µ ≠ 2.6
The test statistic is going to be identical to the one for one-tailed tests:
z = (x̄ − µ0) / (σ/√n) = (2.5 − 2.6) / 0.3 = −0.33
The test statistic calculates the relative position of 2.5 with respect to the hypothesized value for µ: 2.6. You can see it is fairly close, as measured by z = −0.33. Given that the normal distribution is bell-shaped, we know that drawing x̄ = 2.5 from the distribution of X̄ is quite probable if µ = 2.6, so we should not reject H0.
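
To see how "quite probable" this is, here is a sketch that assumes the usual two-sided rule, which the text has not spelled out at this point: split α equally between both tails, and measure the probability of landing at least |z| away from the center on either side.

```python
from statistics import NormalDist

std_normal = NormalDist()

z = (2.5 - 2.6) / (1.5 / 25 ** 0.5)

# Assumed two-sided rule: probability of a z at least this far from 0 in either direction
two_sided_p = 2 * std_normal.cdf(-abs(z))

alpha = 0.05
z_half = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96: alpha/2 in each tail
reject = abs(z) > z_half

print(f"z = {z:.2f}")
print(f"two-sided probability = {two_sided_p:.3f}")
print(f"critical values = +/-{z_half:.2f}, reject H0: {reject}")
```

Roughly three out of four samples would land at least this far from 2.6 if the null were true, so there is no reason to doubt it.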
Key thing: Because of the equality in the null, what we consider unlikely (under the assumption that the null hypothesis is true) can be on either tail. We will build our rejection regions on both tails.

This time let's start with the Critical Value Approach:
After stating the hypotheses and calcul...