Gene Set Testing (Continued)
Peng Liu
4/22/2008

Procedures of SAFE

1. Calculate a “local” statistic for each gene to measure its significance. This can be, for example, an ordinary t-statistic for comparing two treatments.
2. Calculate a “global” statistic for the gene set being tested. The global statistic measures the difference between the local statistics of the genes in the gene set of interest and the local statistics of the genes in its complement.
3. Use permutation to assess the significance of the global statistic for the gene set. Permute the treatment labels and recalculate the global statistic for the permuted data. Compare the observed global statistic with the global statistics from the permuted data to calculate the p-value.
4. Estimate the multiple testing error (FWER or FDR) using the p-values for the observed data and the permuted data.

Global statistics in SAFE

The global statistic assesses how the distribution of local statistics within a category differs from the distribution of local statistics outside the category. The SAFE procedure favors rank-invariant global statistics, such as the Wilcoxon rank sum.

Wilcoxon rank sum

To calculate the Wilcoxon rank sum statistic, we first rank all genes according to their local statistics. The Wilcoxon rank sum statistic is then calculated as:

$$ Z_W = \frac{W - g(N+1)/2}{\sqrt{g(N-g)(N+1)/12}} $$

where W is the sum of ranks for the genes in the gene set of interest, N is the total number of genes, and g is the number of genes in the gene set of interest.

GSEA

GSEA is similar to SAFE in that it follows the same procedure:

1. Calculate a statistic for each individual gene and rank the genes accordingly.
2. Calculate an enrichment score for the gene set.
3. Use permutation to estimate the significance.
4. Estimate the multiple testing error.
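Steps 1 to 3 of the SAFE procedure, with the Wilcoxon rank sum as the global statistic, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAFE software itself: the function names (`local_t_statistics`, `wilcoxon_z`, `safe_permutation_pvalue`) are hypothetical, the local statistic is taken to be a pooled-variance two-sample t-statistic, ties in the ranking are broken arbitrarily, and the p-value is two-sided with a +1 correction, which is one common convention rather than something the slides specify. Step 4 (FWER or FDR estimation across many gene sets) is left out.

```python
import numpy as np

def local_t_statistics(expr, labels):
    """Step 1: pooled-variance t-statistic per gene (assumed local statistic).

    expr is a genes x samples array; labels is a 0/1 array of treatments.
    """
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    pooled_var = ((na - 1) * a.var(axis=1, ddof=1)
                  + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=1) - a.mean(axis=1)) / np.sqrt(pooled_var * (1 / na + 1 / nb))

def wilcoxon_z(local_stats, in_set):
    """Step 2: standardized Wilcoxon rank sum Z_W for a boolean gene-set mask."""
    N = local_stats.size
    g = int(in_set.sum())
    # Ranks 1..N; ties are broken by position (an assumption, no midranks).
    ranks = np.argsort(np.argsort(local_stats)) + 1
    W = ranks[in_set].sum()
    return (W - g * (N + 1) / 2) / np.sqrt(g * (N - g) * (N + 1) / 12)

def safe_permutation_pvalue(expr, labels, in_set, n_perm=1000, seed=0):
    """Step 3: permute treatment labels and compare global statistics."""
    rng = np.random.default_rng(seed)
    observed = wilcoxon_z(local_t_statistics(expr, labels), in_set)
    permuted = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(labels)  # shuffled copy of the labels
        permuted[i] = wilcoxon_z(local_t_statistics(expr, shuffled), in_set)
    # Two-sided p-value with +1 correction (assumed convention).
    p_value = (1 + np.sum(np.abs(permuted) >= abs(observed))) / (n_perm + 1)
    return observed, p_value
```

Calling `safe_permutation_pvalue(expr, labels, in_set)` with a genes-by-samples matrix, a 0/1 treatment vector, and a boolean mask marking the gene set returns the observed Z_W and its permutation p-value. Because whole sample labels are permuted, the correlation among genes is preserved in each permuted data set.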

