slides9-PRE - Hypothesis Testing Utku Suleymanoglu UMich...


Hypothesis Testing
Utku Suleymanoglu (UMich)

Introduction

The Idea

Statistical inference is about drawing conclusions about unknown population parameters using data.

Example: we want to know the yearly income of public university graduates in the USA at age 30. It has an unknown mean µ, an unknown variance σ², and an unknown distribution.

We can't interview all young people, so we take a sample of 100 and find x̄ = 55 (thousand dollars). Let's say we know σ² = 9.

x̄ is our best guess for µ. We can also build a confidence interval around it: something like x̄ ± z_{α/2} σ_x̄.

These are our best answers to the question "what is µ?".

Our estimate x̄, the variance σ², and what we know of the sampling distribution of x̄ can also help us evaluate claims about µ. Suppose somebody claims that µ ≥ 60.

We are going to learn how to evaluate the validity of these types of hypotheses regarding unknown population parameters (µ or p).

We just measured x̄ = 55 with 100 observations. The spirit of the test is to evaluate the chances of this sample being drawn from a population with µ ≥ 60.

Is it possible to get x̄ = 55 from a distribution with µ = 60? Yes, technically, but it is highly unlikely.

We use the sampling distribution of the sample mean under the assumption that the claim is right: then X̄ ∼ N(60, 9/100) = N(60, 0.09).

We know x̄ will change from sample to sample, but how likely is it to be less than 55, or 54? Extremely unlikely: 55 is more than 16 standard errors below 60.

So this claim is not really credible; the data do not support it. This is the spirit of hypothesis testing.

Let's formalize the idea of "claims."

Null and Alternative Hypotheses

The Null and Its Alternative

We will devise two complementary hypotheses to formalize the idea of a claim: the null hypothesis and the alternative hypothesis.
Null Hypothesis: holds the claim to be challenged, to be refuted if possible. H0: µ ≥ 60.

Alternative Hypothesis: the alternative theory to be maintained if the null hypothesis is rejected. H1 (or HA): µ < 60.

As a researcher, if I think that the average yearly income at 30 for university graduates (µ) is less than 60 thousand dollars a year, then to back up my claim I challenge the opposite claim, that it is at least 60. If I reject H0, I get evidence for my theory.

Alternatively, you might be given a hypothesis directly: "Test the claim that µ is less than 50". Then this claim should be put in the null hypothesis.

Examples

1. You are unhappy with your car: you think your new car gets less than 30 mpg. To get it serviced for free, you want to find evidence that MPG < 30. You record your consumption for a month and build a sample. You challenge the claim that E(MPG) = µ ≥ 30:

   H0: µ ≥ 30
   H1: µ < 30

2. You are an engineer and have a machine in your factory that produces pencils. You think the machine is broken and produces pencils of incorrect lengths. It is supposed to produce pencils with an average length of 5 inches. You collect a sample and test:

   H0: µ = 5
   H1: µ ≠ 5

KEY: In all hypothesis testing, you look at the evidence and decide whether you have enough of it to reject the null hypothesis. If you don't, you fail to reject the null hypothesis; you do not "accept" it. Jury analogy...

Type I and II Errors

The hypothesis testing we will do is not perfect: there can be mistakes.

                       Reality
Testing Result         H0 is True       H0 is False
Reject H0              Type I Error     Correct
Fail to Reject H0      Correct          Type II Error

Type I Error: the null hypothesis is true and you reject it. Probability = α, the significance level. We can choose this.
Type II Error: the null hypothesis is false and you fail to reject it. Probability = β. Power = 1 − β. This depends on the unknown population parameter, so we have limited control over it.

Trial Analogy

                                            Reality
Verdict                                     Innocent (H0)     Guilty
Reject H0 (Verdict = Guilty)                Type I Error      Correct
Fail to Reject H0 (Verdict = Not Guilty)    Correct           Type II Error

We set a high standard for convicting people. We assume innocence, then try to find evidence to reject this presumption. We do the same for the null hypothesis: unless there is a lot of evidence, we do not reject it.

Type I error: an innocent man gets the chair. Type II error: a murderer walks away.

Society and statisticians try to minimize the probability of a Type I error first, and demand a lot of evidence to reject an H0.

Key thing: if we fail to reject H0, we don't say "we proved H0"; we just don't have enough evidence against it. Analogy: if the defendant walks away, his innocence has not been proved; rather, his guilt has not been proved with enough evidence.

General Testing Procedure

TEST PROCEDURE:

1. Formulate and state the null and alternative hypotheses.
2. (Select a significance level: α.)
3. Calculate a suitable test statistic using the available sample statistics, to use in conjunction with...
4. (Develop and) use a decision rule to make a call about H0.
   (a) Assume the null hypothesis is valid.
   (b) Figure out the sampling distribution of the sample statistic under the assumption that the null hypothesis is correct.
   (c) Figure out the distribution of the test statistic under the null.
   (d) Select a criterion that uses the probability distribution of the test statistic to reject or fail to reject the null hypothesis. The criterion uses α as a tolerance level.
5. State your conclusion on the null hypothesis.
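The steps above can be sketched on the income example from the introduction (x̄ = 55, n = 100, σ² = 9, claim µ ≥ 60). This is only an illustrative sketch: the function name, the choice α = 0.05, and the use of Python's standard-library NormalDist are assumptions made here, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

def left_tail_probability(x_bar, mu_0, sigma, n):
    """P(X-bar <= x_bar) when X-bar ~ N(mu_0, sigma^2 / n).

    mu_0 is the boundary value of the null hypothesis H0: mu >= mu_0.
    """
    standard_error = sigma / sqrt(n)              # step 4(b): spread of X-bar under H0
    sampling_dist = NormalDist(mu=mu_0, sigma=standard_error)
    return sampling_dist.cdf(x_bar)

# Step 1: H0: mu >= 60, H1: mu < 60
# Step 2: significance level (assumed for illustration)
alpha = 0.05

# Step 3: standardized distance between the sample mean and the claimed mean
z = (55 - 60) / (3 / sqrt(100))

# Step 4: probability of a sample mean this low if the claim were true
prob = left_tail_probability(x_bar=55, mu_0=60, sigma=3, n=100)
reject_h0 = prob < alpha

# Step 5: state the conclusion
print(f"z = {z:.2f}")
print(f"P(X-bar <= 55 | mu = 60) = {prob:.3g}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The probability is so close to zero that any conventional α leads to rejection, which matches the informal "very unlikely" argument made in the introduction.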
Testing Hypothesis about the Population Mean: σ known

σ known: One-Tailed Tests

We start with the unrealistic case where σ is known. This works exactly if the population values have a normal distribution, and approximately if not, provided the sample is large.

A left-tailed test has the hypotheses:

   H0: µ ≥ µ0
   H1: µ < µ0

A right-tailed test has the hypotheses:

   H0: µ ≤ µ0
   H1: µ > µ0

Tests with known σ use the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Now we need to come up with a testing criterion. There are two equivalent ways of doing this:

- the p-value approach
- the critical value (rejection region) approach

These are best explained with an example. We will discuss the logic of hypothesis testing with this example.

Important Note: We will discuss hypothesis testing regarding µ and p in different scenarios. The first scenario is for µ where σ is known. I will spend extra time on this case to explain the logic of hypothesis testing. This logic carries through everything we are going to do, so I will not repeat it again. Don't mistake the time spent on the first case for a sign that the other cases are unimportant.

Long Running Example

Suppose you think the average lifespan of energy-saving light bulbs is less than 3 years. You collect a sample of 25 newly built bulbs and measure their lifespan. You get x̄ = 2.5. You (somehow) know the standard deviation of their lifespan is σ = 1.5. Then we have the hypotheses:

   H0: µ ≥ 3
   H1: µ < 3

This is a left-tailed test.
The relevant test statistic is:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 3}{1.5 / 5} \approx -1.67$$

The decision rule evaluates how likely it is to get a sample with x̄ = 2.5 if the population mean is 3 years.

We know X̄ ∼ N(3, (1.5/√25)²) = N(3, 0.09) if µ = 3 were true.

Then the question is: if so, what is the probability of getting an X̄ ≤ 2.5? We can calculate that:

$$P(\bar{X} < 2.5) = P\left(Z < \frac{2.5 - 3}{0.3}\right) = P(Z < z) = P(Z < -1.67)$$

where z is the test statistic.

We can calculate this probability using the z-table. It is about 0.048: the probability of getting a sample whose x̄ is equal to or lower than our current estimate x̄ = 2.5 if the null hypothesis were true.

This is a small probability, so we should probably think that H0 is false. But what is a small probability? We need a criterion for "small". We call this criterion the significance level and denote it by α.

α is in the researcher's control and is usually set to 0.1, 0.05 or 0.01. If the probability you calculate for your sample is less than α, you reject the null hypothesis. This is the essence of the p-value approach.

Graphical recap:

[Figure: standard normal density, horizontal axis z, centered at 0]

p-value Approach for One-Tailed Tests

We are working on the case of tests for µ where σ is known, but this logic generalizes to many cases.

1. State the hypotheses and calculate the test statistic (z for this case).
2. Calculate a tail probability:
   Left-tailed tests: the (left-tail) probability that the sample mean is less than the x̄ at hand if the null is true, via P(Z ≤ z).
   Right-tailed tests: the (right-tail) probability that the sample mean is more than the x̄ at hand if the null is true, via P(Z ≥ z).
3. The probability you calculate is called the p-value.
4. Compare the p-value with α.
   If p-value < α: reject H0. You have enough evidence that H0 is false.
   If p-value > α: fail to reject H0. There is not enough evidence to reject the null hypothesis.
5. In our example, we reject the null if α is set to 0.05 or 0.1, but not if it is 0.01. The p-value approach allows easy comparison of decisions under different α's.
6. Notice: the p-value is the smallest α at which H0 is rejected.

Critical Value Approach for One-Tailed Tests

Another, equally valid way to build a testing criterion is this:

1. State the hypotheses and calculate the test statistic (z for this case).
2. Set an α. Say, α = 0.05.
3. Figure out the critical value z_α such that P(Z ≥ z_α) = α: the z-value with a right-tail probability of α.
4. Make a decision about H0 by comparing the test statistic with the critical value. The critical value tells us which values are too far off from the null hypothesis value.
   Left-tailed tests: reject H0 if z < −z_α.
   Right-tailed tests: reject H0 if z > z_α.
5. Choosing an α and finding the critical value creates a rejection region. If the test statistic is in this region, H0 is rejected.

Example Cont.

For our example, the critical value with α = 0.05 is −z_α = −1.645. Then any test statistic smaller than −1.645 is in the rejection region. We had z ≈ −1.67, so we reject the null hypothesis.

[Figure: standard normal density, horizontal axis z, centered at 0]

Right-Tailed Test Example

Suppose now that you also believe that µ > 1.5. So you want to test the claim µ ≤ 1.5. Let's do that.

First, let's properly state the hypotheses:

   H0: µ ≤ 1.5
   H1: µ > 1.5

This is a right-tailed test.
Under the assumption that the null hypothesis is true, we need to evaluate the chances of x̄ ≥ 2.5. If they are small, the null hypothesis is not likely to be true.

Next step: the relevant test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 1.5}{0.3} = 3.33$$

Next: the critical value for α = 0.01 is z_{0.01} = 2.33.

Next: use the decision rule. z > 2.33, so we reject H0.

OR: the p-value for z = 3.33 is smaller than 0.001, so we reject the null hypothesis for any reasonable α.

Determinants of Test Results

Before we go on and discuss testing in different situations, let's summarize the determinants of the test results.

The difference between the hypothesized parameter and the calculated statistic, x̄ − µ0: generally speaking, if the sample statistic is far from the hypothesized value, the test statistic is larger in absolute terms. The effect of this on the test depends on the sign of the test statistic and the tail of the test.

Precision of the sampling distribution: the standard error of the sample statistic (x̄) determines how much variation we should expect in x̄ from sample to sample. If it is high, samples whose x̄ deviates largely from the hypothesized value are more likely.

Universally:

Level of significance: α is our choice as researchers. Think about the p-value approach. You compare your calculated p-value with different α's. If p = 0.04, you reject the null with α = 0.05, but not with α = 0.01. To reject a null with α = 0.01 or α = 0.001, you need a really small p-value. So as α decreases, you ask for more and more evidence against the null hypothesis before rejecting it.

A small α means you have a small probability of rejecting a true hypothesis (Type I error, executing the innocent). But a small α also means being picky about the evidence and not rejecting H0 most of the time.
So maybe you are also failing to reject some false hypotheses: the probability of committing a Type II error increases.

Generally speaking, α = 0.05 is the norm. If one needs to be more conservative about rejecting the null, α = 0.01 is the choice. α = 0.1 is also OK. There is no clear-cut reason to choose one over the others. But we don't use α = 0.2 or α = 0.8.

The nice thing about providing p-values is that you allow readers to pick their own α's and arrive at their own conclusions quickly.

Example

Remember we have x̄ = 2.5, σ = 1.5 and n = 25. Suppose now that you have to test H0: µ ≥ 1 against the alternative hypothesis H1: µ < 1.

This is a left-tailed test. We can calculate the test statistic:

$$z = \frac{2.5 - 1}{0.3} = 5$$

That is a pretty big z. (If this were a right-tailed test, you would reject the null hypothesis.)

But this is a left-tailed test. The p-value is calculated as the left-tail probability: P(Z ≤ 5) ≈ 1. That is bigger than any α imaginable, so you fail to reject the null. (The critical value is negative for left-tailed tests, so a positive test statistic cannot be in the rejection region.)

Even if the test result seems obvious, we still test it properly. And we still don't call it "accept": P(Z ≤ 5) ≠ 1; it is only P(Z ≤ 5) ≈ 1.

This is an example of why a larger test statistic does not automatically mean that you are more likely to reject the null. Generally, a large test statistic in the direction of the tail of the test is likely to reject the null hypothesis.

σ known: Two-Tailed Tests

Now we will discuss a slightly different type of test. The difference is in the null and alternative hypotheses:

   H0: µ = µ0
   H1: µ ≠ µ0

These types of tests judge the claim that the unknown population parameter is exactly equal to some number.
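The one-tailed, σ-known lightbulb examples from the preceding slides (H0: µ ≥ 3, H0: µ ≤ 1.5 and H0: µ ≥ 1, all with x̄ = 2.5, σ = 1.5, n = 25) can be checked numerically. The sketch below is one possible way to do so; the function name one_tailed_z_test and its interface are hypothetical, and the default α = 0.05 is an assumption. Because it uses the unrounded z, its p-values can differ slightly from z-table lookups.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def one_tailed_z_test(x_bar, mu_0, sigma, n, tail, alpha=0.05):
    """One-tailed z test for a mean with known sigma.

    tail is "left" (H1: mu < mu_0) or "right" (H1: mu > mu_0).
    Returns (z, p_value, reject_h0).
    """
    if tail not in ("left", "right"):
        raise ValueError("tail must be 'left' or 'right'")
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    if tail == "left":
        p_value = STANDARD_NORMAL.cdf(z)        # P(Z <= z)
    else:
        p_value = 1.0 - STANDARD_NORMAL.cdf(z)  # P(Z >= z)
    return z, p_value, p_value < alpha

examples = [
    ("H0: mu >= 3   (left-tailed) ", 3.0, "left"),
    ("H0: mu <= 1.5 (right-tailed)", 1.5, "right"),
    ("H0: mu >= 1   (left-tailed) ", 1.0, "left"),
]
for label, mu_0, tail in examples:
    z, p, reject = one_tailed_z_test(2.5, mu_0, 1.5, 25, tail)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, p-value = {p:.4f}, {decision}")
```

The third case shows the point made in the last example: z = 5 is large, but it lies in the wrong tail for a left-tailed test, so the p-value is close to 1 and H0 is not rejected.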
In economics, two-tailed tests are performed a lot to test things like:

- Whether a production technology has constant returns to scale (population parameter = 1).
- Whether a job training program has any effect on wages whatsoever (population parameter = 0).

We will come back to the latter when we do regression analysis.

Example

The test procedure is very similar to that of one-tailed tests, with a few important differences.

Take your lightbulb sample again (remember x̄ = 2.5, n = 25 and σ = 1.5). Now suppose there is a claim that the mean life expectancy of lightbulbs is 2.6 years:

   H0: µ = 2.6
   H1: µ ≠ 2.6

The test statistic is identical to the one used in one-tailed tests:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{2.5 - 2.6}{0.3} = -0.33$$

The test statistic measures the relative position of 2.5 with respect to the hypothesized value for µ, 2.6. You can see it is fairly close as measured by z. Given that the normal distribution is bell-shaped, we know a draw of x̄ = 2.5 from the distribution of X̄ is quite probable if µ = 2.6, so we should not reject H0.

Key thing: because of the equality in the null, what we consider unlikely if the null hypothesis is true can be in either tail. We will build our rejection regions in both tails.

This time let's start with the Critical Value Approach. After stating the hypotheses and calculating the z-statistic, the decision rule is:

$$\text{Reject } H_0 \text{ if } \left| \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \right| > z_{\alpha/2}$$

In other words, reject the null if the test statistic is outside the interval (−z_{α/2}, z_{α/2}), where the critical value z_{α/2} is the z-value with upper-tail probability α/2.

For our example, we have

$$z = \frac{2.5 - 2.6}{1.5 / \sqrt{25}} = \frac{-0.1}{0.3} = -0.33$$

Let's pick α = 0.05; then α/2 = 0.025 and z_{α/2} = 1.96.

Because z = −0.33 lies inside the interval (−1.96, 1.96), we do not reject the null hypothesis. We don't have enough evidence to assert that µ = 2.6 is not the case.

Let's see what we are doing in a picture.

Graphical Explanation

[Figure: standard normal density, horizontal axis z, centered at 0]

p-value Approach for Two-Tailed Tests

We can also calculate a p-value for the test statistic z and compare it with different α's to reach a conclusion.

The p-value can be calculated as the area outside the interval (−|z|, |z|), or simply (due to symmetry) as 2 × P(Z > |z|).

In our example we get a p-value of 2 × P(Z > 0.33) = 2 × 0.3707 = 0.7414. This value is bigger than any reasonable α, so we reach the same conclusion as before: we fail to reject the null hypothesis.

Let's go back one slide and see this in a picture.

Hypothesis Testing Fundamentals Recap

Before we go on to different cases, let's repeat the general idea of hypothesis testing:

We have a hypothetical value for a population parameter (µ = µ0) as a claim, and we want to test it.

We have a sample and a point estimate, let's say x̄ = 2.

We know the sampling distribution of x̄, assuming the claim is true, from chapter 6.

Then we can evaluate the probability of x̄ or a similar draw from this sampling distribution. To do that, we need to transform our normal random variable to a standard normal one; this gives us the z-statistic.

Then we can either:

- calculate the probability associated with the z-statistic and see if it is small or big (p-value approach), or
- compare it to some critical z-value to assess how far off it is from the claimed value (critical value approach).

Either way, starting from the assumption that the claim is true, we assess the correctness of the claim by comparing it to what we observe in the data. If the two are "different enough", we say the claim is (probably) not correct.

σ not Known

Case 2: σ not known, population normal

When σ is not known, we can use s, the sample standard deviation, instead. Just like we did before... for CI's.

But remember, we need a modification to make this work. We need to use the t-distribution instead of the standard normal distribution.

The p-value approach is hard to perform with the t-distribution, so we will ...
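As a numerical recap of the σ-known two-tailed test worked above (H0: µ = 2.6 with x̄ = 2.5, σ = 1.5, n = 25), the following sketch applies both the critical value rule and the p-value rule. The function name two_tailed_z_test and its return layout are hypothetical choices made for illustration, and α = 0.05 is assumed. The p-value it computes uses the unrounded z, so it differs slightly from the table-based 0.7414.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-tailed z test for H0: mu = mu_0 with known sigma.

    Returns (z, critical_value, p_value, reject_by_critical, reject_by_p).
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z = (x_bar - mu_0) / (sigma / sqrt(n))

    # Critical value approach: z_{alpha/2} has upper-tail probability alpha/2.
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    reject_by_critical = abs(z) > critical_value

    # p-value approach: area in both tails beyond |z|.
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    reject_by_p = p_value < alpha

    return z, critical_value, p_value, reject_by_critical, reject_by_p

z, crit, p, by_crit, by_p = two_tailed_z_test(2.5, 2.6, 1.5, 25)
print(f"z = {z:.2f}, critical value = {crit:.2f}, p-value = {p:.4f}")
print("critical value approach:", "reject H0" if by_crit else "fail to reject H0")
print("p-value approach:       ", "reject H0" if by_p else "fail to reject H0")
```

Both rules give the same decision, as expected, since they are two ways of expressing the same criterion.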