MOPAT - 44884497 Nucleic Acids Research, 2008, Vol. 36, No....

4488–4497 Nucleic Acids Research, 2008, Vol. 36, No. 13
Published online 7 July 2008
doi:10.1093/nar/gkn407

MOPAT: a graph-based method to predict recurrent cis-regulatory modules from known motifs

Jianfei Hu 1,2, Haiyan Hu 2,3 and Xiaoman Li 1,2,*

1 Division of Biostatistics, 2 School of Informatics and 3 Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, 410 West 10th Street, Indianapolis, IN 46202, USA

Received January 26, 2008; Revised June 1, 2008; Accepted June 10, 2008

ABSTRACT

The identification of cis-regulatory modules (CRMs) can greatly advance our understanding of eukaryotic regulatory mechanism. Current methods to predict CRMs from known motifs either depend on multiple alignments or can only deal with a small number of known motifs provided by users. These methods are problematic when binding sites are not well aligned in multiple alignments or when the number of input known motifs is large. We thus developed a new CRM identification method MOPAT (motif pair tree), which identifies CRMs through the identification of motif modules, groups of motifs co-occurring in multiple CRMs. It can identify ‘orthologous’ CRMs without multiple alignments. It can also find CRMs given a large number of known motifs. We have applied this method to mouse developmental genes, and have evaluated the predicted CRMs and motif modules by microarray expression data and known interacting motif pairs. We show that the expression profiles of the genes containing CRMs of the same motif module correlate significantly better than those of a random set of genes do. We also show that the known interacting motif pairs are significantly included in our predictions. Compared with several current methods, our method shows better performance in identifying meaningful CRMs.

INTRODUCTION

Identifying cis-regulatory modules (CRMs) is an important problem in this postgenomic era.
CRMs are short DNA regions of a few hundred base pairs that contain multiple transcription factor-binding sites (TFBSs). It is estimated that there are five to ten times as many CRMs in a genome as there are genes (1). In higher eukaryotes, CRMs rather than individual TFBSs often determine the spatial and temporal expression patterns of neighboring genes. Therefore, identification of CRMs is important not only for understanding gene transcriptional regulation but also for annotating higher eukaryotic genomes. However, identifying CRMs in higher eukaryotes is challenging, for two reasons. First, the regions in which the CRMs of a gene may reside can span thousands or even hundreds of thousands of base pairs. Second, TFBSs are generally 6–14 bp long, and there is some degeneracy at almost every position of the TFBSs of a transcription factor (TF).
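
To see why these two facts together make the problem hard, the following minimal sketch estimates how often a short degenerate motif matches by chance in a long region. It is not part of MOPAT: the function names and the example consensus are hypothetical, the degeneracy is written in standard IUPAC nucleotide codes, and the estimate assumes independent, uniformly distributed bases, which real genomic sequence does not satisfy.

```python
# Illustrative sketch only; names and the example consensus are hypothetical.
# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(consensus):
    """Chance that a random site matches the consensus.

    Assumes each base is drawn independently and uniformly from A, C, G, T.
    """
    p = 1.0
    for symbol in consensus:
        p *= len(IUPAC[symbol]) / 4.0
    return p


def expected_chance_hits(consensus, region_length, both_strands=True):
    """Expected number of chance matches in a region of the given length."""
    positions = max(region_length - len(consensus) + 1, 0)
    strands = 2 if both_strands else 1
    return positions * strands * match_probability(consensus)


def count_hits(consensus, sequence):
    """Count forward-strand matches of the degenerate consensus."""
    k = len(consensus)
    return sum(
        all(sequence[i + j] in IUPAC[consensus[j]] for j in range(k))
        for i in range(len(sequence) - k + 1)
    )
```

Under these assumptions, a hypothetical 6-bp consensus with two twofold-degenerate positions, such as `WGATAR`, is expected to match by chance roughly 195 times when both strands of a region of about 100 kb are scanned, so a single motif match carries little information on its own.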