Gene Set Testing (Continued)
Peng Liu
4/22/2008
Procedures of SAFE
1. Calculate a “local” statistic for each gene to measure its significance.
   Such a statistic can be an ordinary t-statistic for comparing two treatments.
2. Calculate a “global” statistic for the gene set to be tested.
   The global statistic measures the difference between the local statistics in the interesting gene set and the local statistics in the complement of the gene set.
3. Use permutation to assess the significance of the global statistic for the gene set.
   Permute the treatment labels and calculate the global statistic for each permuted data set. Compare the observed global statistic with the global statistics from the permuted data to calculate the p-value.
4. Estimate the multiple testing error (FWER or FDR) using the p-values for the observed data and the permuted data.
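Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SAFE software itself: the function names (`local_stats`, `global_stat`, `safe_pvalue`), the genes-by-samples layout of `expr`, the 0/1 coding of `labels`, and the one-sided comparison are all hypothetical choices made for the example. Step 4 (FWER or FDR estimation) is not shown.

```python
import numpy as np
from scipy import stats


def local_stats(expr, labels):
    """Step 1: ordinary two-sample t-statistic for each gene.

    expr   : array of shape (genes, samples)   (assumed layout)
    labels : array of 0/1 treatment labels, one per sample
    """
    group_a = expr[:, labels == 0]
    group_b = expr[:, labels == 1]
    return stats.ttest_ind(group_a, group_b, axis=1).statistic


def global_stat(local, in_set):
    """Step 2: Wilcoxon rank sum of the local statistics in the gene set."""
    ranks = stats.rankdata(local)  # ranks over all genes, ties averaged
    return ranks[np.asarray(in_set, dtype=bool)].sum()


def safe_pvalue(expr, labels, in_set, n_perm=1000, seed=0):
    """Step 3: permutation p-value for the global statistic (one-sided)."""
    rng = np.random.default_rng(seed)
    observed = global_stat(local_stats(expr, labels), in_set)
    permuted = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(labels)  # permute the treatment labels
        permuted[i] = global_stat(local_stats(expr, shuffled), in_set)
    # Fraction of permuted statistics at least as large as the observed one;
    # the +1 terms keep the estimate strictly positive.
    p_value = (1 + np.sum(permuted >= observed)) / (1 + n_perm)
    return observed, p_value
```

Calling `safe_pvalue(expr, labels, in_set)` returns the observed rank sum and its permutation p-value. Permuting sample labels, rather than genes, keeps the correlation between genes intact in every permuted data set.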
Global statistics in SAFE
The global statistic assesses how the distribution of
local statistics within a category differs from that of local
statistics outside the category.
The SAFE procedure uses rank-invariant
global statistics, such as the Wilcoxon rank sum.
Wilcoxon rank sum
To calculate the Wilcoxon rank sum statistic, we first
rank all genes according to the local statistics.
Then Wilcoxon rank sum is calculated as:
$$Z_W = \frac{W - g(N+1)/2}{\sqrt{g(N-g)(N+1)/12}}$$
where W is the sum of ranks for genes in the interesting
gene set, N is the total number of genes and g is the
number of genes in the interesting gene set.
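As a small worked illustration, the standardized rank sum can be computed as sketched below. The function name `wilcoxon_z` and the toy local statistics are assumptions made for the example, and the variance term carries no correction for ties.

```python
import numpy as np
from scipy.stats import rankdata


def wilcoxon_z(local, in_set):
    """Standardized Wilcoxon rank sum Z_W for one gene set.

    local  : local statistic for every gene (length N)
    in_set : boolean mask marking the g genes in the gene set
    """
    local = np.asarray(local, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    N = local.size                       # total number of genes
    g = int(in_set.sum())                # number of genes in the set
    W = rankdata(local)[in_set].sum()    # sum of ranks inside the set
    mean_W = g * (N + 1) / 2             # expected rank sum
    var_W = g * (N - g) * (N + 1) / 12   # variance of the rank sum (no ties)
    return (W - mean_W) / np.sqrt(var_W)


# Toy example: six genes, the first and third belong to the gene set.
local = [2.1, -0.3, 1.7, 0.4, -1.2, 0.9]
in_set = [True, False, True, False, False, False]
z = wilcoxon_z(local, in_set)
```

Here the two set genes hold ranks 6 and 5, so W = 11, the expected rank sum is 7, and Z_W = 4 / sqrt(56/12), roughly 1.85.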
GSEA
GSEA is similar to SAFE in that it follows
the same procedure:
1. Calculate a statistic for each individual gene and rank the genes accordingly.
2. Calculate an enrichment score for the gene set.
3. Use permutation to estimate the significance.
4. Estimate the multiple testing error.

