The Bonferonni and Šidák Corrections for Multiple Comparisons

Hervé Abdi¹

¹ In: Neil Salkind (Ed.) (2007). Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083–0688, USA. E-mail: [email protected] http://www.utd.edu/~herve

1 Overview

The more tests we perform on a set of data, the more likely we are to reject the null hypothesis when it is true (i.e., to make a “Type I” error). This is a consequence of the logic of hypothesis testing: we reject the null hypothesis if we witness a rare event. But the larger the number of tests, the easier it is to find rare events, and therefore the easier it is to make the mistake of thinking that there is an effect when there is none. This problem is called the inflation of the alpha level. To be protected from it, one strategy is to correct the alpha level when performing multiple tests. Making the alpha level more stringent (i.e., smaller) will produce fewer errors, but it may also make it harder to detect real effects.

2 The different meanings of alpha

Maybe it is because computers make it easier to run statistical analyses that researchers perform more and more statistical tests on the same set of data. For example, brain imaging researchers will routinely run millions of tests to analyze an experiment. Running so many tests increases the risk of false alarms.

To illustrate, imagine the following “pseudo-experiment”: I toss 20 coins, and I try to force the coins to fall on heads. I know from the “binomial test” that the null hypothesis is rejected at the α = .05 level if the number of heads is greater than 14. I repeat this experiment 10 times. Suppose that one trial gives the “significant” result of 16 heads versus 4 tails. Did I influence the coins on that occasion?
Of course not, because the larger the number of experiments, the greater the probability of detecting a low-probability event (like 16 versus 4). In fact, waiting long enough is a sure way of detecting rare events!

2.1 Probability in the family

A family of tests is the technical term for a series of tests performed on a set of data. In this section we show how to compute the probability of rejecting the null hypothesis at least once in a family of tests when the null hypothesis is true.

For convenience, suppose that we set the significance level at α = .05. For each test (i.e., one trial in the example of the coins) the probability of making a Type I error is equal to α = .05. The events “making a Type I error” and “not making a Type I error” are complementary events (they cannot occur simultaneously, and one of them must occur). Therefore the probability of not making a Type I error on one trial is equal to 1 − α = 1 − .05 = .95 ....
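
The numbers in the coin example can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical (they do not come from the article), the binomial test is taken to be one-tailed with a fair coin, and the family-level computation assumes that the 10 trials are independent, so that the probabilities of "no Type I error" multiply.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_at_least_one_type_i(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error in a family of tests.

    Assumes the tests are independent: the probability of making no
    Type I error on any test is then (1 - alpha) ** n_tests.
    """
    return 1 - (1 - alpha) ** n_tests


n_coins = 20

# One-tailed binomial test with a fair coin: 15 or more heads is rare
# enough to reject at the .05 level, 14 or more heads is not.
p_15_or_more = binomial_upper_tail(n_coins, 15)
p_14_or_more = binomial_upper_tail(n_coins, 14)

# Probability of a result at least as extreme as 16 heads versus 4 tails.
p_16_or_more = binomial_upper_tail(n_coins, 16)

# Probability of NOT making a Type I error on one trial at alpha = .05.
p_no_error_one_trial = 1 - 0.05

# Probability of at least one "significant" trial among 10 independent
# trials when each test is run at the nominal alpha = .05.
family_nominal = prob_at_least_one_type_i(0.05, 10)

print(f"P(heads >= 15) = {p_15_or_more:.4f}")
print(f"P(heads >= 14) = {p_14_or_more:.4f}")
print(f"P(heads >= 16) = {p_16_or_more:.4f}")
print(f"P(no Type I error on one trial) = {p_no_error_one_trial:.2f}")
print(f"P(at least one Type I error in 10 trials) = {family_nominal:.4f}")
```

Under these assumptions, the cutoff "greater than 14 heads" is the smallest one whose tail probability stays below .05, and running 10 trials at α = .05 raises the chance of at least one false alarm to roughly .40.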