2500M_Seury_L24_6pg qr

[7] Inference for Distributions

Lecture Notes
Simin Seury, Department of Economics, York University, Canada
Date: April 1, 2010

The one-sample t-test

As in the previous chapter, a test of hypotheses requires a few steps:

1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table D
6. Stating the P-value and interpreting the result

Table D

For df = 9 we only look into the corresponding row. The calculated value of t is 2.7. We find the two closest t values:

2.398 < t = 2.7 < 2.821, thus 0.02 > upper-tail p > 0.01

For a one-sided Ha, this is the P-value (between 0.01 and 0.02); for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).

Excel

TDIST(x, degrees_freedom, tails)
TDIST = P(X > x) for a random variable X following the t distribution (x positive). Use it in place of Table C, or to obtain the p-value for a positive t-value.
- x is the standardized value at which to evaluate the distribution (i.e., "t").
- degrees_freedom is an integer indicating the number of degrees of freedom.
- tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed p-value. If tails = 2, TDIST returns the two-tailed p-value.

TINV(probability, degrees_freedom)
Gives the t-value (e.g., t*) for a given probability and degrees of freedom.
- probability is the probability associated with the two-tailed t distribution.
- degrees_freedom is the number of degrees of freedom of the t distribution.

The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha.
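The tail area that Table D brackets and that TDIST returns can also be approximated numerically. The following is a minimal Python sketch, not Excel's actual implementation: the function names `t_pdf` and `tdist` are hypothetical, and the area is obtained by integrating the t density with Simpson's rule, which is only one of several possible methods.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def tdist(x, df, tails, steps=2000):
    """Hypothetical stand-in for Excel's TDIST: tail area beyond a positive x.

    Integrates the density from 0 to x with Simpson's rule (steps must be even)
    and subtracts the result from the 0.5 of area lying to the right of 0.
    """
    if x < 0 or tails not in (1, 2) or steps % 2:
        raise ValueError("need x >= 0, tails in {1, 2}, and an even step count")
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return tails * upper   # tails = 2 doubles the one-tailed area

# The example from the notes: t = 2.7 with df = 9.
one_sided = tdist(2.7, 9, 1)
two_sided = tdist(2.7, 9, 2)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

The one-sided value should fall between 0.01 and 0.02, in agreement with the Table D bracketing above, and evaluating the sketch at the table entries 2.398 and 2.821 should return areas close to 0.02 and 0.01.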
The P-value is calculated as the corresponding area under the curve, one-tailed or two-tailed depending on whether Ha is one-sided or two-sided. The test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Sweetening colas

Cola manufacturers want to test how much the sweetness of a new cola drink is affected by storage. The sweetness loss due to storage was evaluated by 10 professional tasters (by comparing the sweetness before and after storage):

Taster           1     2     3     4     5     6     7     8     9     10
Sweetness loss   2.0   0.4   0.7   2.0   −0.4  2.2   −1.3  1.2   1.1   2.3

Average: 1.02
Standard deviation: 1.196
Degrees of freedom: n − 1 = 9

We want to test whether storage results in a loss of sweetness, thus:

H0: µ = 0 versus Ha: µ > 0

Notice that here we do not know the population parameter σ. The population of all cola drinkers is too large, and since this is a new cola recipe, we have no population data. This situation is very common with real data.

Sweetening colas (continued)

Is there evidence that storage results in sweetness loss for the new cola recipe at the 0.05 level of significance (α = 5%)?

H0: µ = 0 versus Ha: µ > 0 (one-sided test)

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$$

The critical value is tα = 1.833 (df = 9, upper-tail area 0.05). Since t > tα, the result is significant.

Equivalently, 2.398 < t = 2.70 < 2.821, thus 0.02 > p > 0.01. Since p < α, the result is significant.

The t-test has a significant p-value. We reject H0. There is a significant loss of sweetness, on average, following storage.

When each individual provides two measurements, as the tasters do here (before and after storage), we use the paired data to test the difference in the two population means. The variable studied becomes Xdifference = (X1 − X2), and

H0: µdifference = 0 ; Ha: µdifference > 0 (or < 0, or ≠ 0)

Conceptually, this is not different from tests on one population.
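The cola calculation can be checked directly from the ten sweetness-loss values. This is a small sketch using only Python's standard library; the helper name `one_sample_t` is an illustrative choice, and the critical value 1.833 is simply copied from the notes rather than computed.

```python
import math
import statistics

# Sweetness loss reported by the 10 tasters (data from the notes).
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

def one_sample_t(data, mu0):
    """Return (mean, sample sd, t statistic, degrees of freedom)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)              # sample sd, divides by n - 1
    t = (mean - mu0) / (sd / math.sqrt(n))   # t = (xbar - mu0) / (s / sqrt(n))
    return mean, sd, t, n - 1

mean, sd, t, df = one_sample_t(losses, mu0=0.0)

T_CRIT_05_DF9 = 1.833   # one-sided critical value for df = 9, quoted in the notes
reject_h0 = t > T_CRIT_05_DF9

print(f"mean = {mean:.2f}, s = {sd:.3f}, t = {t:.2f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Because a matched pairs test works on a single column of differences, the same helper could be applied to any list of paired differences with `mu0=0.0`.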
Sweetening colas (revisited)

For the sweetness-loss data (10 tasters, H0: µ = 0 versus Ha: µ > 0):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70, \qquad df = n - 1 = 9$$

In Excel, you can obtain the precise p-value once you have calculated t. Use the function TDIST(t, df, tails): "=TDIST(2.7, 9, 1)" gives about 0.0122 (0.01226 if the unrounded t = 2.697 is used).

Although the text didn't mention it explicitly, this is a pre-/post-test design and the variable is the difference in cola sweetness before minus after storage. A matched pairs test of significance is indeed just like a one-sample test.

Matched pairs t procedures

Sometimes we want to compare treatments or conditions at the individual level. These situations produce two samples that are not independent: they are related to each other. The members of one sample are identical to, or matched (paired) with, the members of the other sample.

Example: Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.

Example: Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins.

Example: Using people matched for age, sex, and education in social studies allows canceling out the effect of these potential lurking variables.

Does lack of caffeine increase depression?

Individuals diagnosed as caffeine-dependent are deprived of caffeine-rich foods and assigned to receive daily pills. Sometimes the pills contain caffeine and other times they contain a placebo. Depression was assessed.

Subject   Depression      Depression      Placebo −
          with caffeine   with placebo    caffeine
 1         5              16              11
 2         5              23              18
 3         4               5               1
 4         3               7               4
 5         8              14               6
 6         5              24              19
 7         0               6               6
 8         0               3               3
 9         2              15              13
10        11              12               1
11         1               0              −1

There are 2 data points for each subject, but we'll only look at the difference. The sample distribution appears appropriate for a t-test.

[Figure: normal quantile plot of the 11 "difference" data points (DIFFERENCE versus normal quantiles).]

For each individual in the sample, we have calculated a difference in depression score (placebo minus caffeine). There were 11 "difference" points, thus df = n − 1 = 10. We calculate that x̄ = 7.36 and s = 6.92.

H0: µdifference = 0 ; Ha: µdifference > 0

$$t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7.36}{6.92/\sqrt{11}} = 3.53$$

For df = 10, 3.169 < t = 3.53 < 3.581, thus 0.005 > p > 0.0025.

Caffeine deprivation causes a significant increase in depression.

Statistical output for the caffeine study:

a) Conducting a paired sample t-test on the raw data (caffeine and placebo)

Paired Samples Test: Placebo − Caffeine
  Paired differences: Mean 7.364; Std. Deviation 6.918; Std. Error Mean 2.086
  95% Confidence Interval of the Difference: Lower 2.716; Upper 12.011
  t = 3.530; df = 10; Sig. (2-tailed) = .005

b) Conducting a one-sample t-test on the difference (placebo − caffeine)

One-Sample Test (Test Value = 0): Difference
  t = 3.530; df = 10; Sig. (2-tailed) = .005; Mean Difference 7.364
  95% Confidence Interval of the Difference: Lower 2.72; Upper 12.01

Power of the t-test: OMITTED from syllabus.

Interval Estimation of a Population Mean: σ Unknown

Example: Apartment Rents

A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 efficiency apartments within a half-mile of campus resulted in a sample mean of $650 per month and a sample standard deviation of $55.
Let us provide a 95% confidence interval estimate of the mean rent per month for the population of efficiency apartments within a half-mile of campus. We will assume this population to be normally distributed.

At 95% confidence, α = .05 and α/2 = .025. t.025 is based on n − 1 = 16 − 1 = 15 degrees of freedom. In the t distribution table we see that t.025 = 2.131.

Degrees                   Area in Upper Tail
of Freedom    .20     .100    .050    .025    .010    .005
15            .866    1.341   1.753   2.131   2.602   2.947
16            .865    1.337   1.746   2.120   2.583   2.921
17            .863    1.333   1.740   2.110   2.567   2.898
18            .862    1.330   1.734   2.101   2.552   2.878
19            .861    1.328   1.729   2.093   2.539   2.861
.             .       .       .       .       .       .

Interval estimate (the term after ± is the margin of error):

$$\bar{x} \pm t_{.025}\,\frac{s}{\sqrt{n}} = 650 \pm 2.131\,\frac{55}{\sqrt{16}} = 650 \pm 29.30$$

We are 95% confident that the mean rent per month for the population of efficiency apartments within a half-mile of campus is between $620.70 and $679.30.

Summary of Interval Estimation Procedures for a Population Mean

Can the population standard deviation σ be assumed known?

- Yes (σ known case): use
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
- No (σ unknown case): use the sample standard deviation s to estimate σ, and use
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

Note on the software output for the caffeine study: our alternative hypothesis was one-sided, thus our p-value is half of the two-tailed p-value provided in the software output (half of 0.005 = 0.0025).

Robustness

The t procedures are exactly correct when the population is distributed exactly normally. However, most real data are not exactly normal. The t procedures are robust to small deviations from normality: the results will not be affected too much. Factors that strongly matter:

- Random sampling. The sample must be an SRS from the population.
- Outliers and skewness. They strongly influence the mean and therefore the t procedures. However, their impact diminishes as the sample size gets larger because of the Central Limit Theorem.

Specifically:

- When n < 15, the data must be close to normal and without outliers.
- When 15 ≤ n ≤ 40, mild skewness is acceptable but not outliers.
- When n > 40, the t-statistic will be valid even with strong skewness.

p-Value Approach to Two-Tailed Hypothesis Testing

Compute the p-value using the following three steps:

1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), find the area under the standard normal curve to the right of z. If z is in the lower tail (z < 0), find the area under the standard normal curve to the left of z.
3. Double the tail area obtained in step 2 to obtain the p-value.

The rejection rule: Reject H0 if the p-value < α.

One-Tailed Tests About a Population Mean: σ Known

Critical Value Approach

4. Determine the critical value and rejection rule. For α = .05, z.05 = 1.645. Reject H0 if z > 1.645.
5. Determine whether to reject H0. Because 2.47 > 1.645, we reject H0. There is sufficient statistical evidence to infer that Health Services is not meeting the response goal of 12 minutes.

[6] Introduction to Inference

When the z score falls within the rejection region (shaded area on the tail side), the p-value is smaller than α and you have shown statistical significance.

[Figure: rejection regions for a one-sided test with α = 5% (z = −1.645) and for a two-sided test with α = 1%.]

Rejection region for a two-tail test with α = 0.05 (5%)

A two-sided test means that α is spread between both tails of the curve, thus:
- a middle area C of 1 − α = 95%, and
- an upper tail area of α/2 = 0.025 (and the same area, 0.025, in the lower tail).

Table C

Upper tail probability p    z*       Confidence level C
0.25                        0.674    50%
0.20                        0.841    60%
0.15                        1.036    70%
0.10                        1.282    80%
0.05                        1.645    90%
0.025                       1.960    95%
0.02                        2.054    96%
0.01                        2.326    98%
0.005                       2.576    99%
0.0025                      2.807    99.5%
0.001                       3.091    99.8%
0.0005                      3.291    99.9%

Recap: Confidence intervals

Remember, because a two-sided test is symmetrical, we can also use a confidence interval to test a two-sided hypothesis. In a two-sided test, C = 1 − α.
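The relation C = 1 − α and the two interval formulas in the summary share one structure: a critical value times a standard error. As a rough sketch (the function names `z_star` and `interval` are hypothetical, and the t critical value is copied from the table in the notes rather than computed), the z* entries of Table C and the apartment-rent interval can be reproduced with Python's standard library:

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* leaving (1 - C)/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def interval(center, critical, sd, n):
    """center ± critical * sd / sqrt(n); returns (lower, upper, margin)."""
    margin = critical * sd / math.sqrt(n)
    return center - margin, center + margin, margin

# A few rows of Table C (sigma known case uses these z* values).
for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")

# Apartment rents: sigma unknown, so t.025 = 2.131 (15 df) is taken from the table.
low, high, margin = interval(650, 2.131, 55, 16)
print(f"650 ± {margin:.2f} -> ({low:.2f}, {high:.2f})")
```

Only the critical value changes between the two cases: z* when σ is known, and t* with n − 1 degrees of freedom when s estimates σ.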
Here C is the confidence level and α is the significance level, with an area of α/2 in each tail.

Packs of cherry tomatoes (σ = 5 g): H0: µ = 227 g versus Ha: µ ≠ 227 g

Sample average 222 g (n = 4). 95% CI for µ = 222 ± 1.96 × 5/√4 = 222 g ± 4.9 g

227 g does not belong to the 95% CI (217.1 to 226.9 g). Thus, we reject H0.

Ex: Your sample gives a 99% confidence interval of x̄ ± m = 0.84 ± 0.0101. With 99% confidence, could samples be from populations with µ = 0.86? µ = 0.85?

- Cannot reject H0: µ = 0.85 (inside the 99% C.I.)
- Reject H0: µ = 0.86 (outside the 99% C.I.)

Confidence interval test and p-value

A confidence interval gives a black and white answer: reject or don't reject H0. But it also estimates a range of likely values for the true population mean µ.

A P-value quantifies how strong the evidence is against H0. But if you reject H0, it doesn't provide any information about the true population mean µ.

Does the packaging machine need revision?

H0: µ = 227 g versus Ha: µ ≠ 227 g

What is the probability of drawing a random sample such as yours if H0 is true? With x̄ = 222 g, σ = 5 g, and n = 4:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{222 - 227}{5/\sqrt{4}} = -2$$

From Table A, the area under the standard normal curve to the left of z is 0.0228. Thus, P-value = 2 × 0.0228 = 4.56%.

[Figure: sampling distribution of x̄ under H0, centered at µ = 227 with σ/√n = 2.5 g; 2.28% of the area lies in each tail beyond x̄ = 222 (z = −2) and 232.]

The probability of getting a random sample average so different from µ is so low that we reject H0. The machine does need recalibration.

Use and Abuse of Tests

Cautions about significance tests

Choosing the significance level α

Factors often considered:
- What are the consequences of rejecting the null hypothesis (e.g., global warming, convicting a person for life with DNA evidence)?
- Are you conducting a preliminary study? If so, you may want a larger α so that you will be less likely to miss an interesting result.

Some conventions:
- We typically use the standards of our field of work.
- There are no "sharp" cutoffs: e.g., 4.9% versus 5.1%.
- It is the order of magnitude of the P-value that matters: "somewhat significant," "significant," or "very significant."

Practical significance

Statistical significance only says whether the effect observed is likely to be due to chance alone because of random sampling.

Statistical significance may not be practically important. That's because statistical significance doesn't tell you about the magnitude of the effect, only that there is one. An effect could be too small to be relevant. And with a large enough sample size, significance can be reached even for the tiniest effect.

Example: A drug to lower temperature is found to reproducibly lower patient temperature by 0.4° Celsius (P-value < 0.01). But clinical benefits of temperature reduction only appear for a 1° decrease or larger.

Don't ignore lack of significance

Consider this provocative title from the British Medical Journal: "Absence of evidence is not evidence of absence". Having no proof of who committed a murder does not imply that the murder was not committed.

Indeed, failing to find statistical significance in results means not rejecting the null hypothesis. This is very different from actually accepting it. The sample size, for instance, could be too small to overcome large variability in the population.

When comparing two populations, lack of significance does not imply that the two samples come from the same population. They could represent two very distinct populations with similar mathematical properties.

Interpreting effect size: It's all about context

There is no consensus on how big an effect has to be in order to be considered meaningful. In some cases, effects that may appear to be trivial can be very important.

Example: Improving the format of a computerized test reduces the average response time by about 2 seconds. Although this effect is small, it is important since this is done millions of times a year. The cumulative time savings of using the better format is gigantic.

Always think about the context.
Try to plot your results, and compare them with a baseline or results from similar studies.
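Returning to the cherry tomato example from the confidence-interval recap, the two-sided z test and the matching 95% interval can be worked through side by side. This is an illustrative sketch only: the function name `two_sided_z_test` is hypothetical, and the normal tail area comes from Python's `statistics.NormalDist` instead of Table A.

```python
import math
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    tail = NormalDist().cdf(-abs(z))     # area beyond |z| in one tail
    return z, 2 * tail                   # double it for a two-sided Ha

# Cherry tomato packs: xbar = 222 g, sigma = 5 g, n = 4, H0: mu = 227 g.
z, p_value = two_sided_z_test(xbar=222, mu0=227, sigma=5, n=4)

# 95% confidence interval with z* = 1.96, as in the notes.
margin = 1.96 * 5 / math.sqrt(4)
ci = (222 - margin, 222 + margin)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f}); contains 227: {ci[0] <= 227 <= ci[1]}")
```

The computed P-value is about 4.55%, slightly below the 4.56% in the notes because Table A rounds the tail area to 0.0228. Both routes lead to the same decision at α = 5%: the P-value is below α, and 227 g lies outside the interval.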

This note was uploaded on 04/15/2010 for the course ECON 2500 taught by Professor Simin Seury during the Spring '10 term at York University.
