Lecture 14: Statistics II
• Multiple hypothesis testing
• Wilcoxon rank sum test
• Permutation tests
• Introduction to DNA microarray technology
Hypergeometric Distribution: Example
§ You use DNA microarray technology to identify genes whose expression is
increased in two different cancer cell lines (prostate cancer and leukemia).
§ The microarray you use contains probes for 6064 human genes.
§ You find that 105 genes are upregulated in the prostate cancer cell line,
and 180 genes are upregulated in the leukemia cell line.
§ The two data sets share 64 genes in common.
§ Is the observed overlap (64 genes) significantly greater than the overlap
expected due to random chance?
§ Null hypothesis: Observed overlap ≤ expected (random) overlap
§ Alternative hypothesis: Observed overlap > expected (random) overlap
§ Decision rule: if p-value < α = 0.001, reject the null hypothesis
Hypergeometric Example: Comparing Microarray Data Sets
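[Figure: Venn diagram. Of the 6064 genes on the human microarray, 180 are upregulated in leukemia and 105 in prostate cancer. The two sets share 64 genes, leaving 116 leukemia-only and 41 prostate-only genes.]
N = 6064 (genes on the microarray)
k = 180 (genes upregulated in leukemia)
n = 105 (genes upregulated in prostate cancer)
m = 64 (genes in the overlap)
P = 7.6 × 10^-75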
\[
P = 1 - \sum_{x=0}^{m-1} \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}
  = 1 - \sum_{x=0}^{63} \frac{\binom{180}{x}\binom{5884}{105-x}}{\binom{6064}{105}}
  = 7.6 \times 10^{-75}
\]
p < 0.001
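The sum above can be checked numerically. The following is a minimal sketch, not the lecture's own code: the function name `overlap_p_value` is an assumption made for illustration, and it uses exact rational arithmetic from the Python standard library because subtracting a floating-point number very close to 1 from 1 would lose the tiny tail probability.

```python
from fractions import Fraction
from math import comb


def overlap_p_value(N: int, k: int, n: int, m: int) -> Fraction:
    """P(overlap >= m) when n genes are drawn at random from N, k of which are marked.

    Hypothetical helper name; implements P = 1 - sum_{x=0}^{m-1} C(k,x) C(N-k,n-x) / C(N,n).
    """
    total = comb(N, n)  # number of ways to choose n genes out of N
    # Numerators of the hypergeometric probabilities for overlaps x = 0 .. m-1.
    # The range is capped at n so that n - x never becomes negative.
    below = sum(comb(k, x) * comb(N - k, n - x) for x in range(min(m, n + 1)))
    # Exact fractions avoid the cancellation that 1 - (float near 1) would cause.
    return 1 - Fraction(below, total)


N, k, n, m = 6064, 180, 105, 64  # values from the slide
p = overlap_p_value(N, k, n, m)
expected = n * k / N  # mean overlap under the null hypothesis

print(f"expected overlap = {expected:.2f} genes")
print(f"p-value = {float(p):.2e}")  # the slide reports 7.6 x 10^-75
print("reject null" if p < Fraction(1, 1000) else "do not reject null")
```

The expected overlap under random sampling is only about 3 genes, which is why an observed overlap of 64 yields such a small p-value.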
Multiple Hypothesis Testing
§ For hypothesis testing, we compare the p-value calculated
from the statistical test to a cutoff α (the significance level, i.e. the type I error rate)
• α is typically 0.05 or 0.01
§ Decision Rule:
• If the p-value < α, then reject the null hypothesis
• If the p-value ≥ α, then do not reject the null hypothesis
§ If we test one hypothesis using α = 0.05, what is the probability of
mistakenly rejecting the null hypothesis?
§ If we test five hypotheses using α = 0.05, what is the probability of
mistakenly rejecting at least one null hypothesis?
Chance of at least one false positive = 1 − (0.95)^N = 1 − (0.95)^5 = 1 − 0.774 = 0.226
N = number of independent tests
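The arithmetic above generalizes to any α and any number of tests. Here is a small sketch, assuming the tests are independent and every null hypothesis is true; the function name `familywise_error_rate` is chosen for illustration only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests independent tests."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n_tests avoid one with probability (1 - alpha) ** n_tests.
    return 1 - (1 - alpha) ** n_tests


# How quickly the chance of a false positive grows with the number of tests.
for n_tests in (1, 5, 10, 100):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"N = {n_tests:3d}: chance of at least one false positive = {rate:.3f}")
```

With a single test the result is α itself. With five tests it matches the 0.226 computed above.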
Bonferroni Correction for Multiple Hypothesis Testing
§ To correct for the error due to multiple hypothesis testing we use the
Bonferroni Correction:
\[
\alpha' = \frac{\alpha}{k}
\]
• Where k = number of independent significance tests
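As a rough sketch of how the corrected cutoff might be applied: the values α = 0.05 and k = 10 below are illustrative assumptions, and the helper names `bonferroni_alpha` and `reject_null` are hypothetical.

```python
def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test cutoff alpha' = alpha / k for k independent significance tests."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def reject_null(p_value: float, alpha: float, k: int) -> bool:
    """Apply the decision rule with the Bonferroni-corrected cutoff."""
    return p_value < bonferroni_alpha(alpha, k)


alpha, k = 0.05, 10  # illustrative values, not taken from the lecture
alpha_prime = bonferroni_alpha(alpha, k)

# Chance of at least one false positive when every test uses alpha'
# (assuming independent tests); it stays below the original alpha.
fwer = 1 - (1 - alpha_prime) ** k

print(f"alpha' = {alpha_prime}")
print(f"family-wise error rate with alpha' = {fwer:.4f}")
print(reject_null(0.004, alpha, k))  # p-value below alpha'
print(reject_null(0.02, alpha, k))   # below alpha but not below alpha'
```

The last line shows the practical effect: a p-value of 0.02 would pass an uncorrected cutoff of 0.05 but fails the corrected one.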
§ For example, if we test whether the set of genes upregulated in prostate
cancer has a significant overlap with each of 10 other cancer data sets, then
k = 10.
• If
