Lecture 14: Statistics II
• Multiple hypothesis testing
• Wilcoxon rank sum test
• Permutation tests
• Introduction to DNA microarray technology
Hypergeometric Distribution: Example
§ You use DNA microarray technology to identify genes whose expression is
increased in two different cancer cell lines (prostate cancer and leukemia).
§ The microarray you use contains probes for 6064 human genes.
§ You find that 105 genes are upregulated in the prostate cancer cell line,
and 180 genes are upregulated in the leukemia cell line.
§ The two data sets share 64 genes in common.
§ Is the observed overlap (64 genes) significantly greater than the overlap
expected due to random chance?
§ Null hypothesis: Observed overlap ≤ expected (random) overlap
§ Alternative hypothesis: Observed overlap > expected (random) overlap
§ Decision rule: if p-value < α = 0.001, reject the null hypothesis
Hypergeometric Example: Comparing Microarray Data Sets
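[Figure: Venn diagram. Within the human microarray (6064 genes), the leukemia set (180 genes) and the prostate set (105 genes) share 64 genes; 116 genes are upregulated in leukemia only and 41 in prostate only.]
N = 6064 (genes on the microarray)
k = 180 (genes upregulated in leukemia)
n = 105 (genes upregulated in prostate cancer)
m = 64 (genes in the overlap)
$P = 7.6 \times 10^{-75}$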
$$P = 1 - \sum_{x=0}^{m-1} \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = 1 - \sum_{x=0}^{63} \frac{\binom{180}{x}\binom{5884}{105-x}}{\binom{6064}{105}} = 7.6 \times 10^{-75}$$
p < 0.001
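The tail probability in this example can be checked numerically. The following is a minimal Python sketch, not the lecture's own code; the function name `hypergeom_upper_tail` is an assumption introduced here. It sums the upper tail P(X ≥ m) directly using exact integer arithmetic, which is mathematically the same as 1 minus the sum from x = 0 to m − 1 but avoids losing a very small p-value to floating-point rounding.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(N, k, n, m):
    """Return P(X >= m) for a hypergeometric overlap X.

    N: genes on the array, k: genes in the first list,
    n: genes in the second list, m: observed overlap.
    """
    total = comb(N, n)  # all ways to choose the second list
    max_overlap = min(k, n)  # overlap cannot exceed either list
    favorable = sum(
        comb(k, x) * comb(N - k, n - x)
        for x in range(m, max_overlap + 1)
    )
    return Fraction(favorable, total)  # exact rational p-value


# Values from the microarray example.
p_value = hypergeom_upper_tail(N=6064, k=180, n=105, m=64)
alpha = 0.001  # cutoff from the decision rule
print(f"p-value = {float(p_value):.2e}")
if p_value < alpha:
    print("reject the null hypothesis")
else:
    print("do not reject the null hypothesis")
```

The exact fraction is converted to a float only for printing, so the comparison against the cutoff is not affected by rounding.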
Multiple Hypothesis Testing
§ For hypothesis testing, we compare the p-value calculated from the
statistical test to a cutoff α, the significance level (the accepted type I error rate)
• α is typically 0.05 or 0.01
§ Decision Rule:
• If the p-value < α, then reject the null hypothesis
• If the p-value ≥ α, then do not reject the null hypothesis
§ If we test one hypothesis using α = 0.05, what is the probability of
mistakenly rejecting the null hypothesis?
§ If we test five hypotheses using α = 0.05, what is the probability of
mistakenly rejecting at least one null hypothesis?
Chance of at least one false positive (assuming independent tests) = $1 - (0.95)^N = 1 - (0.95)^5 = 1 - 0.774 = 0.226$
N = number of tests
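To see how quickly this chance grows with the number of tests, the calculation can be sketched in Python. The function name `familywise_error_rate` is an assumption introduced for illustration, and the formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Same cutoff as the slide (alpha = 0.05), for a few test counts.
for n_tests in (1, 5, 10, 100):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"N = {n_tests:3d}: chance of a false positive = {rate:.3f}")
```

For N = 5 this reproduces the 0.226 worked out above.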
Bonferroni Correction for Multiple Hypothesis Testing
§ To correct for the error due to multiple hypothesis testing we use the
Bonferroni correction:
$$\alpha' = \frac{\alpha}{k}$$
• Where k = number of independent significance tests
§ For example, if we test whether the set of genes upregulated in prostate
cancer has a significant overlap with 10 other cancer data sets, then
k = 10.
• If
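The Bonferroni adjustment for the k = 10 example can be sketched in Python as follows. The overall cutoff α = 0.05 is an assumption chosen for illustration (the lecture names 0.05 and 0.01 as typical values), and both function names are hypothetical.

```python
def bonferroni_alpha(alpha, k):
    """Per-test cutoff alpha' = alpha / k for k significance tests."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha = 0.05  # assumed overall cutoff, not stated for this example
k = 10        # ten comparisons, as in the example
alpha_prime = bonferroni_alpha(alpha, k)
print(f"per-test cutoff alpha' = {alpha_prime:.4f}")
print(f"family-wise error at alpha' = {familywise_error_rate(alpha_prime, k):.4f}")
```

With the corrected cutoff, the chance of at least one false positive across all k tests stays below the original α.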
 Fall '11
 Staff
 Statistics, Null hypothesis, Statistical hypothesis testing, DNA microarray
