
Chapter 21
Testing Equality of Two Percentages
Chapter 19, "Hypothesis Testing: Does Chance Explain the Results?" introduced a conceptual
framework for statistical hypothesis testing. Chapter 20, "Does Treatment Have an Effect?,"
presented important statistical considerations for determining whether a treatment has an effect.
Treatment is meant loosely: it could be a drug, an advertising campaign, a car wax, a test
preparation course, a fertilizer, etc. The best way to determine whether a treatment has an effect
is to use the method of comparison in an experiment in which subjects are assigned at random to
the treatment group or the control group.

When the measurement of each subject can be represented by 0 or 1 (e.g., subject's condition
improves or not, subject buys something or not, subject clicks a link or not, subject passes an
exam or not), deciding whether the treatment has an effect is essentially testing the null
hypothesis that two percentages are equal, which is the problem this chapter addresses.

Different ways of drawing samples lead to different tests. In one sampling design (the
randomization model), the entire collection of subjects is allocated randomly between treatment
and control, which makes the samples dependent. Conditioning on the total number of ones in
the treatment and control groups leads to Fisher's exact test, which is based on the
hypergeometric distribution of the number of ones in the treatment group if the null hypothesis is
true. When the sample sizes are large, calculating the rejection region for Fisher's exact test is
cumbersome, but the normal approximation to the hypergeometric distribution gives an
approximate test: a test whose significance level is approximately what it claims to be.

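The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's own implementation: the function names, the argument names (N subjects in all, n assigned to treatment, G ones in total), the one-sided alternative, and the continuity correction are all assumptions made for the example, and the numbers in the usage lines are hypothetical.

```python
from math import comb, erf, sqrt

def hypergeom_pmf(k, N, G, n):
    """Chance of exactly k ones in the treatment group when n of the N
    subjects are treated and G of the N responses are ones (null true)."""
    # Outside this range the event is impossible (and comb would fail
    # on a negative argument), so return zero explicitly.
    if k < max(0, n - (N - G)) or k > min(n, G):
        return 0.0
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)

def fisher_upper_p(k_obs, N, G, n):
    """One-sided P-value: chance of k_obs or more ones in treatment."""
    return sum(hypergeom_pmf(k, N, G, n)
               for k in range(k_obs, min(n, G) + 1))

def normal_upper_p(k_obs, N, G, n):
    """Normal approximation to the same upper tail (assumed here to use
    a continuity correction of one half)."""
    mean = n * G / N
    # SE of a hypergeometric count, including the finite population
    # correction (N - n) / (N - 1).
    se = sqrt(n * (G / N) * (1 - G / N) * (N - n) / (N - 1))
    z = (k_obs - 0.5 - mean) / se
    return 0.5 * (1 - erf(z / sqrt(2)))

# Hypothetical data: 20 subjects, 10 treated, 10 ones overall,
# 8 of the ones in the treatment group.
exact = fisher_upper_p(8, N=20, G=10, n=10)
approx = normal_upper_p(8, N=20, G=10, n=10)
print(exact, approx)
```

With samples this small the exact sum is easy; the approximation matters when N is large enough that enumerating the tail becomes cumbersome.
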
In a second sampling design (the population model), the two samples are independent random
samples with replacement from two populations; conditioning on the total number of ones in the
two samples again leads to Fisher's exact test, which can be approximated as before.

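One way to see why conditioning gives the same test in the population model is the following sketch. The notation is introduced here for illustration and is not from the text: n_1 and n_2 are the two sample sizes, X_1 and X_2 are the numbers of ones in the two samples, p is the common fraction of ones in both populations under the null hypothesis, and g is the observed total number of ones.

```latex
% Under the null hypothesis, X_1 ~ Binomial(n_1, p) and X_2 ~ Binomial(n_2, p)
% are independent, so X_1 + X_2 ~ Binomial(n_1 + n_2, p).
\begin{align*}
P(X_1 = k \mid X_1 + X_2 = g)
  &= \frac{P(X_1 = k)\, P(X_2 = g - k)}{P(X_1 + X_2 = g)} \\
  &= \frac{\binom{n_1}{k} p^{k} (1-p)^{n_1 - k}\,
           \binom{n_2}{g-k} p^{g-k} (1-p)^{n_2 - g + k}}
          {\binom{n_1 + n_2}{g} p^{g} (1-p)^{n_1 + n_2 - g}} \\
  &= \frac{\binom{n_1}{k} \binom{n_2}{g-k}}{\binom{n_1 + n_2}{g}} .
\end{align*}
```

The powers of p and 1-p cancel, so the conditional distribution does not depend on the unknown p, and the final expression is a hypergeometric probability.
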
There is another approximate approach to testing the null hypothesis in the population model: If
the sample sizes are large (but the samples are drawn with replacement or are small compared to
the two population sizes), the normal approximation to the distribution of the difference between
the two sample percentages tends to be accurate. If the null hypothesis is true, the expected value
of the difference between the sample percentages is zero, and the SE of the difference in sample
percentages can be estimated by pooling the two samples. That allows one to transform the
difference of sample percentages approximately into standard units, and to base an hypothesis
test on the normal approximation to the probability distribution of the approximately
standardized difference. Surprisingly, the resulting approximate test is essentially the normal