### Note_Set_7

Course: ADA 2006, Fall 2009
School: Rochester

Applied data analysis, PSC 200, Fall 2006. Note Set 7. Inference: Means

#### Outline of Lecture

- Overview of Hypothesis Testing
- Method 1: P-Values
- Method 2: Critical Values
- Method 3: Confidence Intervals
- One-Sided Tests

#### Overview of Hypothesis Testing

- We form some hypothesis about the data.
- Typically, our hope is to reject a null hypothesis.
- We hope to establish beyond a reasonable doubt that the null hypothesis is false.

Example:

- A pharmaceutical company (PharCo) has developed a new medication (NewMed) that it believes is superior to the best existing medication (OldMed).
- In particular, PharCo believes that the population proportion, $\pi_N$, who will recover using NewMed is greater than the proportion, $\pi_O$, who will recover using OldMed.
- The burden of proof is on PharCo to demonstrate that NewMed is better.
- The null hypothesis is that NewMed is no better than OldMed.
- Formally, the null is that the population proportions are equal, $\pi_N - \pi_O = 0$.
- PharCo conducts a clinical trial, randomly assigning ill patients to one of the two treatments.
- It determines the proportion of patients who recover when treated with NewMed, $p_N$, and the proportion who recover when treated with OldMed, $p_O$.
- PharCo wishes to reject the null hypothesis to demonstrate that NewMed is better than OldMed.

Suppose that PharCo finds that $p_N > p_O$. But is this sufficient to reject the null hypothesis?
If observing this value of $p_N - p_O$ is sufficiently unlikely under the null hypothesis, then we reject the null hypothesis.

#### Method 1: P-Values

Let us consider testing the hypothesis that the average starting salary of U of R graduates is \$60,000.

- Suppose that we have obtained a sample of 15 recent graduates and determined that the sample average is \$62,500.
- Suppose further that the standard deviation is known to be \$9,200.
- We will assume that the starting salaries of U of R graduates are normally distributed.

We first state the null hypothesis and the alternative hypothesis. The null hypothesis is that the population mean starting salary is equal to \$60,000. The alternative hypothesis is that the population mean starting salary is not equal to \$60,000. We write the null and alternative hypotheses as

- $H_0: \mu = \mu_0$ (null hypothesis)
- $H_A: \mu \neq \mu_0$ (alternative hypothesis)

In this case, $H_0: \mu = 60{,}000$ and $H_A: \mu \neq 60{,}000$.

The logic of hypothesis testing: if the true mean were \$60,000, how likely is an observation as extreme as \$62,500? By "as extreme", we mean greater than \$62,500 or less than \$57,500. We want to calculate

$$P(\bar{X} \le 57{,}500) + P(\bar{X} \ge 62{,}500).$$

Let us standardize the mean using

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}},$$

where $\mu_0$ denotes the hypothesized value. Under the assumption that starting salaries are \$60,000 on average, the Z-score has the standard normal distribution.

Notice,

$$
\begin{aligned}
P(\bar{X} \le 57{,}500) + P(\bar{X} \ge 62{,}500)
&= 1 - P(57{,}500 \le \bar{X} \le 62{,}500) \\
&= 1 - P\left(\frac{57{,}500 - 60{,}000}{9{,}200/\sqrt{15}} \le \frac{\bar{X} - 60{,}000}{9{,}200/\sqrt{15}} \le \frac{62{,}500 - 60{,}000}{9{,}200/\sqrt{15}}\right) \\
&= 1 - P(-1.052 \le Z \le 1.052) \\
&= 2 \cdot P(Z \le -1.052)
\end{aligned}
$$

The probability can be computed using Table A. We find that the p-value is $p \approx 2 \times 0.1492 = 0.2984$. Alternative approach: use the QuickCalc spreadsheet from the course webpage, which gives $p = 0.2926$.

Interpreting the p-value:

- Under the assumption that the null hypothesis is true, the probability of observing data as extreme as ours is $p = 0.2926$.
- If this is sufficiently unlikely, we reject the null hypothesis.
- Otherwise, we accept (that is, fail to reject) the null hypothesis.
- Most common cutoff: reject if $p < 0.05$.
- We use $\alpha$ to denote the rejection level.
- The decision rule is: reject if $p < \alpha$; accept if $p \ge \alpha$.

Summary of Steps:

- Step 1: Determine $\bar{X}$, $s$, $N$, and $\mu_0$
  - $\bar{X}$ (sample mean)
  - $s$ (sample standard deviation)
  - $N$ (sample size)
  - $\mu_0$ (hypothesized value of population mean)
- Step 2: State the null and alternative
  - $H_0: \mu = \mu_0$ (null hypothesis)
  - $H_A: \mu \neq \mu_0$ (alternative hypothesis)
- Step 3: Calculate the Z-statistic, $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{N}}$
- Step 4: Calculate the p-value, $p = 2 \cdot P\left(Z \le -\left|\dfrac{\bar{X} - \mu_0}{s/\sqrt{N}}\right|\right)$
- Step 5: Rejection rule: reject if $p < \alpha$, accept otherwise

Assumptions:

- The sample size is large enough that the Central Limit Theorem provides a good approximation.
- The sampling distribution is exactly normal when the data are normally distributed and the standard deviation is known (rather than estimated).
- If the data are not normally distributed, or the standard deviation is estimated from the data, then the sampling distribution will converge to the normal in large samples.
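The five steps above can be sketched in Python for the starting-salary example. This is a minimal sketch, not part of the course materials: it assumes the standard library's `statistics.NormalDist` as a stand-in for Table A or the QuickCalc spreadsheet, and the function name `two_sided_p_value` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sd, n):
    """Return (Z, p) for a two-sided Z-test of H0: mu = mu0.

    Illustrative helper (not from the course): sd is treated as the
    known or estimated standard deviation, and the normal approximation
    is used throughout, as in the notes.
    """
    z = (sample_mean - mu0) / (sd / sqrt(n))   # Step 3: Z-statistic
    p = 2 * NormalDist().cdf(-abs(z))          # Step 4: p = 2 * P(Z <= -|Z|)
    return z, p


# Starting-salary example: n = 15, sample mean 62,500, sd 9,200, mu0 = 60,000
z, p = two_sided_p_value(62_500, 60_000, 9_200, 15)
alpha = 0.05                                   # most common rejection level

print(f"Z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "cannot reject H0")   # Step 5
```

Because the exact normal CDF is used rather than a rounded table lookup, the result should agree with the spreadsheet value of about 0.2926 rather than the table-based 0.2984.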
The finite-sample distribution is the t-distribution if the data are normally distributed and the variance is estimated in the usual way.

- SPSS implements a test based on the t-distribution (called a t-test).
- In this course, we will rely on the normal approximation.

Example:

- You have recently been purchasing a bag of mixed nuts as a snack, but have become suspicious that you are being ripped off. The packaging claims an average of 8 cashews per package, but the last 12 packages you have purchased contained 7, 9, 6, 7, 7, 5, 9, 10, 4, 6, 7, and 7 cashews. Do you have reason to believe you are being ripped off?

An example with the raw data:

- Data set nasdaq.sav has weekly NASDAQ returns between 1996 and 2005.
- The NASDAQ index has an opening price and a closing price for each week; the closing price this week will be next week's opening price.
- The percentage return is calculated using

$$\%\ \text{Return} = \frac{\text{Close} - \text{Open}}{\text{Open}}$$

- Does the NASDAQ routinely yield a positive return?

In SPSS: Analyze > Compare Means > One-Sample T Test. Select the variables to test and input the hypothesized value, in this case 0.

Descriptive statistics:

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Return | 521 | -25.78 | 16.03 | .1028 | 3.81292 |
| Valid N (listwise) | 521 | | | | |

Histogram: [figure not available]

Result of hypothesis test (One-Sample Test, Test Value = 0):

| | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
|---|---|---|---|---|---|---|
| Return | .616 | 520 | .538 | .10283 | -.2253 | .4310 |

The p-value is the Sig. (2-tailed) entry, .538. We cannot reject that the mean return on the NASDAQ index is 0 at any conventional significance level.

Another example with raw data:

- Data file init.sav contains information on voter turnout and initiatives in the states.
- Let us test whether voter turnout in 2000 and 2004 was equal to 50%.

#### Method 2: Critical Values

The p-value is the most intuitive approach to hypothesis testing. However, there are other approaches:

- Critical values
- Confidence intervals

Why use the alternative approaches:

- Critical values: one can get an answer without needing to use a computer; good for quick calculations.
- Confidence intervals: give you a better sense of the power of a test and allow you to test many null hypotheses at once.

For any test of size $\alpha$, there exists some cutoff $Z_{\text{crit.}}$ such that we will reject the null hypothesis if and only if $|Z| > Z_{\text{crit.}}$. How do we compute this value?

Recall that our decision rule was: reject if $p < \alpha$. In this case,

$$p = 2 P(Z \le -|Z_{\text{stat.}}|) = 2\,\Phi(-|Z_{\text{stat.}}|),$$

where $\Phi$ signifies the normal cumulative distribution function.

This implies that we reject if $2\,\Phi(-|Z_{\text{stat.}}|) < \alpha$. Since $\Phi$ is an invertible function, we can write $|Z_{\text{stat.}}| > -\Phi^{-1}(\alpha/2)$, which implies that we can reject if $Z_{\text{stat.}} < \Phi^{-1}(\alpha/2)$ or if $Z_{\text{stat.}} > -\Phi^{-1}(\alpha/2)$.

Defining $Z_{\alpha/2} = -\Phi^{-1}(\alpha/2) = \Phi^{-1}(1 - \alpha/2)$, we have the following decision rule:

- Reject the null hypothesis if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$.

Why this approach is useful:

- You can easily memorize a number of important critical values:
  - $Z_{0.1/2} = 1.645$ (10% level)
  - $Z_{0.05/2} = 1.960$ (5% level)
  - $Z_{0.01/2} = 2.576$ (1% level)
- The critical values approach is less informative than the p-value approach, but it is quicker.

Example:

- A car company claims that its new hybrid vehicle gets 60 miles to the gallon. You don't believe this claim, so you test a random sample of 18 cars and find a sample average of $\bar{X} = 57$ with a sample standard deviation of $s = 7$. Is there sufficient evidence to reject the car company's claim? Use an $\alpha = 10\%$ significance level.

First, we state the null and alternative hypotheses formally: $H_0: \mu = 60$, $H_A: \mu \neq 60$. We can compute the Z-statistic using

$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{N}} = \frac{57 - 60}{7/\sqrt{18}} \approx -1.82$$

Recall that $Z_{\text{crit.}} = Z_{0.1/2} = 1.6449$. Since $|Z| > Z_{\text{crit.}}$, we reject the null hypothesis. The data are inconsistent with the car company's claim at the 10% level; the evidence suggests the car gets less than 60 miles per gallon.

Summary of Steps:

- Step 1: Determine $\bar{X}$, $s$, $N$, and $\mu_0$
  - $\bar{X}$ (sample mean)
  - $s$ (sample standard deviation)
  - $N$ (sample size)
  - $\mu_0$ (hypothesized value of population mean)
- Step 2: State the null and alternative
  - $H_0: \mu = \mu_0$ (null hypothesis)
  - $H_A: \mu \neq \mu_0$ (alternative hypothesis)
- Step 3: Calculate the Z-statistic, $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{N}}$
- Step 4: Reject if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$, accept otherwise

#### Method 3: Confidence Intervals

The third approach to hypothesis testing: form a confidence interval. A $100(1-\alpha)\%$ confidence interval contains the set of values $\mu_0$ such that we cannot reject the null hypothesis $\mu = \mu_0$ at the $\alpha$ level, using a two-sided test.

Recall that our rejection rule was: reject if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$, where $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$. This implies that we fail to reject the hypotheses satisfying $-Z_{\alpha/2} \le Z \le Z_{\alpha/2}$, or,

$$\bar{X} - Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \le \mu_0 \le \bar{X} + Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

The confidence interval is given by $\bar{X} \pm B$, where $B = Z_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$. Alternatively, the confidence interval is

$$\left[\bar{X} - Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\right]$$

In some ways, confidence intervals are less informative than p-values:

- A p-value of $p < 0.0001$ would tell us that we can strongly reject the null hypothesis, or reject it at any conventional level of significance.
- The 95% confidence interval would not.

In some ways, confidence intervals are more informative than p-values:

- A confidence interval of $[-0.14, 1.3]$ would tell us that we can reject the null hypothesis that $\mu = -0.15$.
- A p-value for the null hypothesis that $\mu = 0$ would not.

Recommendations:

- Use critical values to get a quick answer.
- Compute both the p-value and the confidence interval, since both contain important information.
- Interpret confidence intervals as the set of null hypotheses that we cannot reject at the $\alpha$ level.

Summary of Steps:

- Step 1: Determine $\bar{X}$, $s$, and $N$
  - $\bar{X}$ (sample mean)
  - $s$ (sample standard deviation)
  - $N$ (sample size)
- Step 2: Calculate the confidence interval, $\left[\bar{X} - Z_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}\right]$
- Step 3: Reject $\mu = \mu_0$ if $\mu_0$ is not contained in the interval, accept otherwise

Example:

- A real estate agent has calculated that the average price of a home in Rochester is $\bar{X} = 115{,}100$ with a standard deviation of $s = 10{,}400$, using a sample of 104 homes. Form a 99% confidence interval for the population mean $\mu$.

What is the confidence interval for the average weekly NASDAQ return, 1996-2005? From the SPSS output (One-Sample Test, Test Value = 0):

| | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
|---|---|---|---|---|---|---|
| Return | .616 | 520 | .538 | .10283 | -.2253 | .4310 |

Example:

- Suppose we collected voter turnout rates from a random sample of 43 countries. We determine that the sample mean is $\bar{X} = 0.75$ and that the standard deviation is $s = 0.13$.
What is the 95% confidence interval for the population mean voter turnout?

We remember that $Z_{5\%/2} = 1.96$. The confidence interval for the population mean is

$$\left[\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{N}},\ \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{N}}\right] = \left[0.75 - \frac{1.96 \times 0.13}{\sqrt{43}},\ 0.75 + \frac{1.96 \times 0.13}{\sqrt{43}}\right] = [71\%, 79\%]$$

We can calculate this in the QuickCalc spreadsheet as well.

Some general facts about hypothesis tests:

- If we can reject at the 1% level, then we can reject at the 5% level.
- If we can reject at the 5% level, then we can reject at the 10% level.
- A 99% confidence interval is wider than a 95% confidence interval.
- A 95% confidence interval is wider than a 90% confidence interval.

#### One-Sided Tests

We have thus far considered hypotheses in which a single value is specified for the null hypothesis, and the null is rejected if the calculated value is sufficiently far from the null in either the positive or the negative direction. This is the most commonly used type of hypothesis test. It is called a two-tailed test.

Sometimes, a one-sided test is employed. For example, recall the mixed nuts example we considered previously.

Suppose that we are interested in testing the null hypothesis $H_0: \mu = 8$ against the alternative $H_A: \mu < 8$. Recall that our Z-value was $Z = -2.03$. A one-sided hypothesis test answers the question: if the null hypothesis were true, what is the probability of observing a value as extreme as $\bar{X}$ that is also negative (or positive when the alternative is $H_A: \mu > 8$)?
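The one-sided question for the mixed nuts data can be sketched numerically. This is an illustrative sketch rather than the course's method: it assumes the normal approximation used throughout these notes (not the t-test that SPSS would run) and uses Python's standard-library `statistics` module; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Cashew counts from the 12 packages in the mixed nuts example
cashews = [7, 9, 6, 7, 7, 5, 9, 10, 4, 6, 7, 7]
mu0 = 8  # the packaging's claimed average

# Z-statistic with the sample standard deviation (n - 1 denominator)
z = (mean(cashews) - mu0) / (stdev(cashews) / sqrt(len(cashews)))

# One-sided p-value for H_A: mu < 8 counts only the lower tail
p_one_sided = NormalDist().cdf(z)

# Two-sided p-value counts both tails, so it is twice as large here
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"Z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

Since the observed Z is negative, the one-sided p-value is exactly half the two-sided one, which is why a one-sided test rejects more readily when the data fall on the side named by the alternative.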
The critical Z-value is $Z_{\alpha}$ (rather than $Z_{\alpha/2}$). We reject if $Z < -Z_{\alpha}$ (or $Z > Z_{\alpha}$ when the alternative is $H_A: \mu > 8$).

We will focus almost exclusively on two-tailed tests in this course, as in practice, but the preceding discussion illustrates that one-tailed tests are a simple variant of two-tailed tests.

Note: A two-tailed test is more conservativ...

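The critical values quoted under Method 2, and the hybrid-car decision, can be reproduced with a short Python sketch. It assumes the standard library's `statistics.NormalDist.inv_cdf` plays the role of $\Phi^{-1}$; the helper names `critical_value` and `reject_two_sided` are illustrative and do not come from the course.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided critical value Z_{alpha/2} = Phi^{-1}(1 - alpha/2)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_two_sided(sample_mean, mu0, s, n, alpha):
    """Return (Z, decision) using the rule: reject if |Z| > Z_{alpha/2}."""
    z = (sample_mean - mu0) / (s / sqrt(n))
    return z, abs(z) > critical_value(alpha)


# Critical values at the 10%, 5%, and 1% levels
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: Z_crit = {critical_value(alpha):.3f}")

# Hybrid-car example: claimed 60 mpg, sample of 18 with mean 57 and s = 7
z_car, reject_car = reject_two_sided(57, 60, 7, 18, alpha=0.10)
print(f"Z = {z_car:.2f}, reject H0 at the 10% level: {reject_car}")
```

The decision uses only a comparison against a memorized cutoff, which is the sense in which the critical-value method is quicker but less informative than reporting the p-value.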

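The confidence-interval steps under Method 3 can be sketched the same way. This is a hedged illustration under the normal approximation: `confidence_interval` and `rejects` are hypothetical helper names, and the inputs are the real-estate example (99% level) and the voter-turnout example with the 1.96 critical value (a 95% interval).

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, s, n, level):
    """Return (lower, upper) = sample_mean -/+ Z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def rejects(mu0, interval):
    """Reject H0: mu = mu0 exactly when mu0 lies outside the interval."""
    lower, upper = interval
    return not (lower <= mu0 <= upper)


# Real-estate example: mean 115,100, s = 10,400, 104 homes, 99% level
homes = confidence_interval(115_100, 10_400, 104, level=0.99)
print(f"99% CI for mean home price: [{homes[0]:.0f}, {homes[1]:.0f}]")

# Voter-turnout example: mean 0.75, s = 0.13, 43 countries, 95% level
turnout = confidence_interval(0.75, 0.13, 43, level=0.95)
print(f"95% CI for mean turnout: [{turnout[0]:.3f}, {turnout[1]:.3f}]")

# One interval tests many nulls at once
for mu0 in (0.50, 0.72, 0.80):
    print(f"H0: mu = {mu0:.2f} rejected: {rejects(mu0, turnout)}")
```

The loop at the end shows the point made in the notes: once the interval is computed, any hypothesized mean can be checked by seeing whether it falls inside.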