BIOINFORMATICS ORIGINAL PAPER
Vol. 25 no. 2 2009, pages 211–217
doi:10.1093/bioinformatics/btn592

Gene expression

LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data

Maureen A. Sartor 1, George D. Leikauf 2 and Mario Medvedovic 3,4,*

1 Center for Computational Medicine and Biology, University of Michigan, Ann Arbor, MI, 2 Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, PA, 3 Department of Environmental Health and 4 Center for Environmental Genetics, University of Cincinnati, Cincinnati, OH, USA

Received on June 06, 2008; revised on October 13, 2008; accepted on November 11, 2008
Advance Access publication November 27, 2008
Associate Editor: Trey Ideker

ABSTRACT

Motivation: The elucidation of biological pathways enriched with differentially expressed genes has become an integral part of the analysis and interpretation of microarray data. Several statistical methods are commonly used in this context, but the question of the optimal approach has still not been resolved.

Results: We present a logistic regression-based method (LRpath) for identifying predefined sets of biologically related genes enriched with (or depleted of) differentially expressed transcripts in microarray experiments. We functionally relate the odds of gene set membership with the significance of differential expression, and calculate adjusted P-values as a measure of statistical significance. The new approach is compared with Fisher’s exact test and other relevant methods in a simulation study and in the analysis of two breast cancer datasets. Overall results were concordant between the simulation study and the experimental data analysis, and provide useful information to investigators seeking to choose the appropriate method. LRpath displayed robust behavior and improved statistical power compared with tested alternatives. It is applicable in experiments involving two or more sample types, and accepts significance statistics of the investigator’s choice as input.

Availability: An R function implementing LRpath can be downloaded from http://eh3.uc.edu/lrpath.

Contact: mario.medvedovic@uc.edu

Supplementary information: Supplementary data are available at Bioinformatics online and at http://eh3.uc.edu/lrpath.

1 INTRODUCTION

The identification of predefined sets of biologically related genes (gene sets) enriched with differentially expressed genes (DEGs) (Tavazoie et al., 1999) has become a routine part of the analysis and interpretation of microarray data (Curtis et al., 2005). Sets of genes associated with the same Gene Ontology (GO) term (Ashburner et al., 2000; Harris et al., 2004) or the same KEGG pathway (Kanehisa et al., 2006) are two commonly used collections of such predefined groups. The most commonly used approach to identifying enriched sets of genes is based on counting the number of genes in such a set that are also differentially expressed. The statistical significance of such

*To whom correspondence should be addressed.
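The two approaches contrasted in the text can be illustrated with a short sketch. It is not the authors' R function: the predictor (-log10 of each gene's P-value), the Wald test on the slope, the 0.05 cutoff, the simulated data and all function names are assumptions made for illustration. The first function implements the counting approach as a one-sided Fisher's exact test; the second regresses gene set membership on the significance statistic, so no cutoff for calling a gene differentially expressed is needed.

```python
import math
import numpy as np

def fisher_enrichment_p(n_genes, n_set, n_deg, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) under the
    hypergeometric null of no association between membership and DEG status."""
    total = math.comb(n_genes, n_deg)
    upper = min(n_set, n_deg)
    tail = sum(math.comb(n_set, k) * math.comb(n_genes - n_set, n_deg - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.
    Returns (b0, b1, standard error of b1)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta[0], beta[1], math.sqrt(cov[1, 1])

def wald_p(estimate, std_err):
    """Two-sided Wald P-value for a coefficient (normal approximation)."""
    return math.erfc(abs(estimate / std_err) / math.sqrt(2.0))

# Simulated example (hypothetical data): 2000 genes, the first 60 form the set,
# and set members tend to have smaller P-values than the remaining genes.
rng = np.random.default_rng(0)
n_genes, n_set = 2000, 60
in_set = np.zeros(n_genes)
in_set[:n_set] = 1.0
u = 1.0 - rng.uniform(size=n_genes)          # values in (0, 1], avoids log(0)
pvals = np.where(in_set == 1.0, u ** 3, u)

# Logistic regression approach: membership regressed on -log10(P).
x = -np.log10(pvals)
b0, b1, se_b1 = fit_logistic(x, in_set)
p_lr = wald_p(b1, se_b1)

# Counting approach: requires an explicit cutoff (assumed here: P < 0.05).
is_deg = pvals < 0.05
n_deg = int(is_deg.sum())
n_overlap = int((is_deg & (in_set == 1.0)).sum())
p_fisher = fisher_enrichment_p(n_genes, n_set, n_deg, n_overlap)

print(f"slope = {b1:.3f}, logistic regression P = {p_lr:.3g}")
print(f"overlap = {n_overlap}/{n_set}, Fisher P = {p_fisher:.3g}")
```

A positive slope indicates that the odds of belonging to the set rise with the significance of differential expression (enrichment); a negative slope would indicate depletion. In practice the P-values from many gene sets would also be adjusted for multiple testing, which this sketch omits.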

This note was uploaded on 04/06/2010 for the course COMPUTER S COSC1520 taught by Professor Paul during the Spring '09 term at York University.
