19585577-All-About-Statistical-Significance-and-Testing

Null Hypothesis

The null hypothesis is a hypothesis about a population parameter. The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data. Depending on the data, the null hypothesis either will or will not be rejected as a viable possibility.

Consider a researcher interested in whether the time to respond to a tone is affected by the consumption of alcohol. The null hypothesis is that $\mu_1 - \mu_2 = 0$, where $\mu_1$ is the mean time to respond after consuming alcohol and $\mu_2$ is the mean time to respond otherwise. Thus, the null hypothesis concerns the parameter $\mu_1 - \mu_2$, and the null hypothesis is that this parameter equals zero.

The null hypothesis is often the reverse of what the experimenter actually believes; it is put forward to allow the data to contradict it. In the experiment on the effect of alcohol, the experimenter probably expects alcohol to have a harmful effect. If the experimental data show a sufficiently large effect of alcohol, then the null hypothesis that alcohol has no effect can be rejected.

Steps in Hypothesis Testing

The basic logic of hypothesis testing has been presented somewhat informally in the sections on "Ruling out chance as an explanation" and the "Null hypothesis." In this section the logic is presented in more detail and more formally.

1. The first step in hypothesis testing is to specify the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). If the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another, the null hypothesis would most likely be that there is no difference between methods ($H_0: \mu_1 - \mu_2 = 0$), and the alternative hypothesis would be $H_1: \mu_1 \ne \mu_2$. If the research concerned the correlation between grades and SAT scores, the null hypothesis would most likely be that there is no correlation ($H_0: \rho = 0$), and the alternative hypothesis would be $H_1: \rho \ne 0$.
2. The next step is to select a significance level. Typically the 0.05 or the 0.01 level is used.
3. The third step is to calculate a statistic analogous to the parameter specified by the null hypothesis. If the null hypothesis were defined by the parameter $\mu_1 - \mu_2$, then the statistic $M_1 - M_2$ would be computed.
4. The fourth step is to calculate the probability value (often called the p value). The p value is the probability of obtaining a statistic as different or more different from the parameter specified in the null hypothesis as the statistic computed from the data. The calculations are made assuming that the null hypothesis is true.
5. The probability value computed in Step 4 is compared with the significance level chosen in Step 2. If the probability is less than or equal to the significance level, then the null hypothesis is rejected; if the probability is greater than the significance level, then the null hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be "statistically significant"; when it is not rejected, the outcome is said to be "not statistically significant."
6. If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis. If the rejected null hypothesis were that $\mu_1 - \mu_2 = 0$, then the alternative hypothesis would be that $\mu_1 \ne \mu_2$. If $M_1$ were greater than $M_2$, then the researcher would naturally conclude that $\mu_1 > \mu_2$.
7. The final step is to describe the result and the statistical conclusion in an understandable way. Be sure to present the descriptive statistics as well as whether the effect was significant or not.
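Steps 3 to 5 can be sketched in code for the two-group case. This is a minimal illustration, not the handout's own procedure: it assumes a pooled-variance (equal-variance) two-sample t test, which the steps above do not prescribe, and it integrates the t density numerically with Simpson's rule so that only the Python standard library is needed. The helper names `t_pdf`, `two_tailed_p`, and `pooled_t_test` are hypothetical. As a check, it reproduces, to rounding, the probability values reported for t(18) = 2.4 and t(18) = 1.4 in the examples that follow.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, n: int = 2000) -> float:
    """Step 4: P(|T| >= |t|) assuming H0, via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / n  # n must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(b, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, n))
    central_area = s * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def pooled_t_test(group1, group2, alpha=0.05):
    """Steps 3-5 for H0: mu1 - mu2 = 0 (equal population variances assumed)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(group1) - mean(group2)  # Step 3: the statistic M1 - M2
    t = diff / se
    p = two_tailed_p(t, df)             # Step 4: the probability value
    return diff, t, df, p, p <= alpha   # Step 5: reject H0 if p <= alpha


# The two reports used as examples in the text
print(round(two_tailed_p(2.4, 18), 4))  # 0.0274, reported as p = 0.027
print(round(two_tailed_p(1.4, 18), 4))  # 0.1785, reported as p = 0.179
```

In practice a statistics library would supply the t distribution; the hand-rolled integration is only there to keep the sketch dependency-free.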
For example, a significant difference between a group that received a drug and a control group might be described as follows:

Subjects in the drug group scored significantly higher (M = 23) than did subjects in the control group (M = 17), t(18) = 2.4, p = 0.027.

The statement "t(18) = 2.4" has to do with how the probability value (p) was calculated. A small minority of researchers might object to two aspects of this wording. First, some believe that the significance level rather than the probability value should be reported. The argument for reporting the probability value is presented in another section. Second, since the alternative hypothesis was stated as $\mu_1 \ne \mu_2$, some might argue that it can only be concluded that the population means differ and not that the population mean for the drug group is higher than the population mean for the control group. This argument is misguided. Intuitively, there are strong reasons for inferring that the direction of the difference in the population is the same as the direction of the difference in the sample. There is also a more formal argument.

A non-significant effect might be described as follows:

Although subjects in the drug group scored higher (M = 23) than did subjects in the control group (M = 20), the difference between means was not significant, t(18) = 1.4, p = 0.179.

It would not have been correct to say that there was no difference between the performance of the two groups. There was a difference; it was just not large enough to rule out chance as an explanation of the difference. It would also have been incorrect to imply that there is no difference in the population. Be sure not to accept the null hypothesis.

Why the Null Hypothesis is Not Accepted

A null hypothesis is not accepted just because it is not rejected. Data not sufficient to show convincingly that a difference between means is not zero do not prove that the difference is zero.
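The non-significant drug example above makes this concrete. The sketch below assumes that the reported t(18) = 1.4 is exact, so the standard error can be backed out as (23 - 20) / 1.4, and it integrates the t distribution numerically; the helper names are hypothetical. Under those assumptions, every hypothesized difference from -1 to 5 points gives p > 0.05, and the 95% confidence interval runs from about -1.5 to 7.5, so zero is just one of many values the data fail to reject.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, n=2000):
    """P(|T| >= |t|) under H0, using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / n
    s = t_pdf(0.0, df) + t_pdf(b, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, n))
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def t_critical(alpha, df):
    """Two-tailed critical value, found by bisection on the p value."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2


# Drug example from the text: M = 23 vs M = 20, t(18) = 1.4 (assumed exact)
diff, df = 23 - 20, 18
se = diff / 1.4

for null_diff in (-1, 0, 1, 3, 5):
    t = (diff - null_diff) / se
    print(null_diff, round(two_tailed_p(t, df), 3))

half_width = t_critical(0.05, df) * se
print(round(diff - half_width, 2), round(diff + half_width, 2))  # -1.5 7.5
```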
Non-significant data may even suggest that the null hypothesis is false without being strong enough to make a convincing case that it is false. For example, if the probability value were 0.15, then one would not be ready to present one's case that the null hypothesis is false to the (properly) skeptical scientific community. More convincing data would be needed to do that. However, there would be no basis to conclude that the null hypothesis is true. It may or may not be true; there just is not strong enough evidence to reject it.

Even when there is no evidence at all that the null hypothesis is false, it is not valid to conclude that the null hypothesis is true. If the null hypothesis is that $\mu_1 - \mu_2$ is zero, then the hypothesis is that the difference is exactly zero. No experiment can distinguish between the case of no difference between means and an extremely small difference between means. If data are consistent with the null hypothesis, they are also consistent with other similar hypotheses.

Significance Test

A significance test is performed to determine whether an observed value of a statistic differs enough from a hypothesized value of a parameter to draw the inference that the hypothesized value of the parameter is not the true value. The hypothesized value of the parameter is called the "null hypothesis." A significance test consists of calculating the probability of obtaining a statistic as different or more different from the null hypothesis (given that the null hypothesis is correct) than the statistic obtained in the sample. If this probability is sufficiently low, then the difference between the parameter and the statistic is said to be "statistically significant."

Just how low is sufficiently low? The choice is somewhat arbitrary, but by convention the 0.05 and 0.01 levels are most commonly used.

For instance, an experimenter may hypothesize that the size of a food reward does not affect the speed at which a rat runs down an alley.
One group of rats receives a large reward and another receives a small reward for running the alley. Suppose the mean running time for the large reward were 1.5 seconds and the mean running time for the small reward were 2.1 seconds.

Significance Level

In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis. The significance level is used in hypothesis testing as follows: First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared to the significance level. If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant.

Traditionally, experimenters have used either the 0.05 level (sometimes called the 5% level) or the 0.01 level (1% level), although the choice of levels is largely subjective. The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter alpha ($\alpha$) is sometimes used to indicate the significance level. See also: Type I error and significance test.

Why the Null Hypothesis is Not Accepted (continued)

Assume the experiment measured "well-being" on a 50-point scale (with higher scores representing more well-being) that has a standard deviation of 10. Further assume the 99% confidence interval computed from the experimental data was:

$-0.5 \le \mu_1 - \mu_2 \le 1$

This says that one can be 99% confident that the mean "true" drug treatment effect is somewhere between -0.5 and 1. If it were -0.5, then the drug would, on average, be slightly detrimental; if it were 1, then the drug would, on average, be slightly beneficial. But how much benefit is an average improvement of 1? Naturally, that question involves characteristics of the measurement scale. But since an improvement of 1 point is only 1/10 = 0.10 standard deviations, it can be presumed to be a small effect. The overlap between two distributions whose means differ by 0.10 standard deviations is shown below.

[Figure: red and blue distributions whose means differ by 0.10 standard deviations]

Although the blue distribution is slightly to the right of the red distribution, the overlap is almost complete.

The Precise Meaning of the Probability Value

There is often confusion about the precise meaning of the probability computed in a significance test. As stated in Step 4 of the steps in hypothesis testing, the null hypothesis ($H_0$) is assumed to be true.
The difference between the statistic computed in the sample and the parameter specified by $H_0$ is computed, and the probability of obtaining a difference this large or larger is calculated. This probability value is the probability of obtaining data as extreme as or more extreme than the current data (assuming $H_0$ is true). It is not the probability of the null hypothesis itself. Thus, if the probability value is 0.005, this does not mean that the probability that the null hypothesis is true is 0.005. It means that the probability of obtaining data as different or more different from the null hypothesis as those obtained in the experiment is 0.005.

The inferential step to conclude that the null hypothesis is false goes as follows: The data (or data more extreme) are very unlikely given that the null hypothesis is true. This means that either (1) a very unlikely event occurred or (2) the null hypothesis is false. The inference usually made is that the null hypothesis is false.

To illustrate that the probability value is not the probability of the hypothesis, consider a test of a person who claims to be able to predict whether a coin will come up heads or tails. One should take a rather skeptical attitude toward this claim and require strong evidence to believe in its validity. The null hypothesis is that the person can predict correctly half the time ($H_0: \pi = 0.5$, where $\pi$ is the probability of a correct prediction). In the test, a coin is flipped 20 times and the person is correct 11 times. If the person has no special ability ($H_0$ is true), then the probability of being correct 11 or more times out of 20 is 0.41. Would someone who was originally skeptical now believe that there is only a 0.41 chance that the null hypothesis is true? They almost certainly would not, since they probably originally thought $H_0$ had a very high probability of being true (perhaps as high as 0.9999).
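The 0.41 figure above is an exact binomial tail probability. A short sketch using the standard-library `math.comb` (Python 3.8 or later) computes the probability of 11 or more correct predictions out of 20 when each prediction is correct with probability 0.5; the helper name `upper_tail` is hypothetical. Note what the number is: a probability about the data assuming $H_0$, not a probability that $H_0$ is true.

```python
from math import comb


def upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), i.e. computed assuming H0."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))


prob = upper_tail(11, 20)
print(round(prob, 4))  # 0.4119
```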
The skeptic has no logical reason to decrease their belief in the validity of the null hypothesis, since the outcome was perfectly consistent with the null hypothesis.

The proper interpretation of the test is as follows: A person made a rather extraordinary claim and should be able to provide strong evidence in support of the claim if the claim is to be believed. The test provided data consistent with the null hypothesis that the person has no special ability, since a person with no special ability would predict as well or better more than 40% of the time. Therefore, there is no compelling reason to believe the extraordinary claim. However, the test does not prove that the person cannot predict better than chance; it simply fails to provide evidence that he or she can.

The probability that the null hypothesis is true is not determined by the statistical analysis conducted as part of hypothesis testing. Rather, the probability computed is the probability of obtaining data as different or more different from the null hypothesis (given that the null hypothesis is true) as the data actually obtained.

Statistical and Practical Significance

It is important not to confuse the confidence with which the null hypothesis can be rejected with the size of the effect. To make this point concrete, consider a researcher assigned the task of determining whether the video display used by travel agents for booking airline reservations should be in color or in black and white. Market research had shown that travel agencies were primarily concerned with the speed with which reservations can be made. Therefore, the question was whether color displays allow travel agents to book reservations faster.
Market research had also shown that, to justify the higher price of color displays, they must be faster by an average of at least 10 seconds per transaction. Fifty subjects were tested with color displays and 50 subjects were tested with black and white displays. Subjects were slightly faster at making reservations on a color display (M = 504.7 seconds) than on a black and white display (M = 508.2 seconds). Although the difference is small, it was statistically significant at the 0.05 significance level. Box plots of the data are shown below.

[Figure: box plots of reservation times for the color and black and white displays]

The 95% confidence interval on the difference between means is

$-5.8 \le \mu_{\text{color}} - \mu_{\text{black \& white}} \le -0.9$

and the 99% interval is

$-6.6 \le \mu_{\text{color}} - \mu_{\text{black \& white}} \le -0.1$

Therefore, despite the finding of a "more significant" difference between means, the experimenter can be even more certain that the color displays are only slightly better than the black and white displays. The second experiment shows conclusively that the difference is less than 10 seconds.

This example was used to illustrate the following points: (1) an effect that is statistically significant is not necessarily large enough to be of practical significance, and (2) the smaller of two effects can be "more significant" than the larger.

Be careful how you interpret findings reported in the media. If you read that a particular diet lowered cholesterol significantly, this does not necessarily mean that the diet lowered cholesterol enough to be of any health value. It means only that the effect on cholesterol in the population can be concluded to be greater than zero.

Type I and II errors

There are two kinds of errors that can be made in significance testing: (1) a true null hypothesis can be incorrectly rejected and (2) a false null hypothesis can fail to be rejected. The former error is called a Type I error and the latter error is called a Type II error.
These two types of errors are defined in the table.

                           True state of the null hypothesis
Statistical decision       $H_0$ true        $H_0$ false
Reject $H_0$               Type I error      Correct
Do not reject $H_0$        Correct           Type II error

The probability of a Type I error is designated by the Greek letter alpha ($\alpha$) and is called the Type I error rate; the probability of a Type II error (the Type II error rate) is designated by the Greek letter beta ($\beta$). A Type II error is only an error in the sense that an opportunity to reject the null hypothesis correctly was lost. It is not an error in the sense that an incorrect conclusion was drawn, since no conclusion is drawn when the null hypothesis is not rejected.

One- and Two-Tailed Tests

In the section on "Steps in Hypothesis Testing," the fourth step involves calculating the probability that a statistic would differ from the parameter specified in the null hypothesis by as much as or more than the statistic obtained in the experiment does. This statement implies that a difference in either direction would be counted. That is, if the null hypothesis were $H_0: \mu_1 - \mu_2 = 0$ and the value of the statistic $M_1 - M_2$ were +5, then the probability of $M_1 - M_2$ differing from zero by five or more (in either direction) would be computed. In other words, the probability value would be the probability that either $M_1 - M_2 \ge 5$ or $M_1 - M_2 \le -5$.

Assume that the figure shown below is the sampling distribution of $M_1 - M_2$.

[Figure: sampling distribution of $M_1 - M_2$, with the area at or above +5 and the area at or below -5 marked]

The figure shows that the probability of a value of +5 or more is 0.036 and that the probability of a value of -5 or less is 0.036. Therefore, the probability of a value either greater than or equal to +5 or less than or equal to -5 is 0.036 + 0.036 = 0.072.

Confidence Intervals & Hypothesis Testing

There is an extremely close relationship between confidence intervals and hypothesis testing.
When a 95% confidence interval is constructed, all values in the interval are considered plausible values for the parameter being estimated. Values outside the interval are rejected as relatively implausible. If the value of the parameter specified by the null hypothesis is contained in the 95% interval, then the null hypothesis cannot be rejected at the 0.05 level. If the value specified by the null hypothesis is not in the interval, then the null hypothesis can be rejected at the 0.05 level. If a 99% confidence interval is constructed, then values outside the interval are rejected at the 0.01 level.

Imagine a researcher wishing to test the null hypothesis that the mean time to respond to an auditory signal is the same as the mean time to respond to a visual signal. The null hypothesis therefore is $H_0: \mu_{\text{visual}} - \mu_{\text{auditory}} = 0$. Ten subjects were tested in the visual condition and their scores (in milliseconds) were: 355, 421, 299, 460, 600, 580, 474, 511, 550, and 586. Ten subjects were tested in the auditory condition and their scores were: 275, 320, 278, 360, 430, 520, 464, 311, 529, and 326. The 95% confidence interval on the difference between means is:

$9 \le \mu_{\text{visual}} - \mu_{\text{auditory}} \le 196$

Therefore, only values in the interval between 9 and 196 are retained as plausible values for the difference between population means. Since zero, the value specified by the null hypothesis, is not in the interval, the null hypothesis of no difference between auditory and visual presentation can be rejected at the 0.05 level. The probability value for this example is 0.034. Any time the parameter specified by a null hypothesis is not contained in the 95% confidence interval estimating that parameter, the null hypothesis can be rejected at the 0.05 level or less. Similarly, if the 99% interval does not contain the parameter, then the null hypothesis can be rejected at the 0.01 level.
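The numbers in this example can be reproduced with a short sketch. It assumes a pooled-variance two-sample t interval (the section does not say which formula was used, but this one reproduces the reported interval and probability value to rounding), and it finds the t critical value by bisection on a numerically integrated t distribution so that only the standard library is needed. The helper names are hypothetical. The final loop checks the rule stated above: each hypothesized difference is rejected at the 0.05 level exactly when it falls outside the 95% interval.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, n=2000):
    """P(|T| >= |t|) under H0, using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / n
    s = t_pdf(0.0, df) + t_pdf(b, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, n))
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def t_critical(alpha, df):
    """Two-tailed critical value, found by bisection on the p value."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if two_tailed_p(mid, df) > alpha else (lo, mid)
    return (lo + hi) / 2


visual = [355, 421, 299, 460, 600, 580, 474, 511, 550, 586]
auditory = [275, 320, 278, 360, 430, 520, 464, 311, 529, 326]

n1, n2 = len(visual), len(auditory)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(visual) + (n2 - 1) * variance(auditory)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean(visual) - mean(auditory)

half_width = t_critical(0.05, df) * se
ci = (diff - half_width, diff + half_width)
p = two_tailed_p(diff / se, df)
print(f"95% CI: {ci[0]:.0f} to {ci[1]:.0f}, p = {p:.3f}")  # 95% CI: 9 to 196, p = 0.034

# Rejected at 0.05 exactly when the hypothesized difference lies outside the CI
for d in (0, 50, 150, 250):
    rejected = two_tailed_p((diff - d) / se, df) <= 0.05
    outside = not (ci[0] <= d <= ci[1])
    print(d, rejected, outside)
```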
The null hypothesis is not rejected if the parameter value specified by the null hypothesis is in the interval, since the null hypothesis would still be plausible. However, since the null hypothesis would be only one of an infinite number of values in the confidence interval, accepting the null hypothesis is not justified. There are many arguments against accepting the null hypothesis when it is not rejected.

The null hypothesis is usually a hypothesis of no difference. Thus null hypotheses such as $\mu_1 - \mu_2 = 0$, in which the hypothesized value is zero, are most common. When the hypothesized value is zero, there is a simple relationship between hypothesis testing and confidence intervals: if the interval contains zero, then the null hypothesis cannot be rejected at the significance level corresponding to the interval (0.05 for a 95% interval, 0.01 for a 99% interval). If the interval does not contain zero, then the null hypothesis can be rejected. This is just a special case of the general rule stating that the null hypothesis can be rejected if the interval does not contain the hypothesized value of the parameter and cannot be rejected if the interval contains the hypothesized value.

This note was uploaded on 09/27/2009 for the course STATS 111 taught by Professor Al during the Spring '09 term at Tennessee Martin.
