Vol. 23 no. 15 2007, pages 2013–2014
Tree Gibbs Sampler: identifying conserved motifs without
aligning orthologous sequences
, Haiyan Hu
and Xiaoman Shawn Li
Division of Biostatistics and
Center for Computational Biology and Bioinformatics, School of Medicine,
Indiana University, 410 West 10th Street, Indianapolis, IN 46202, USA
Received on February 3, 2007; revised on April 17, 2007; accepted on May 18, 2007
Advance Access publication May 31, 2007
Associate Editor: Alfonso Valencia
Tree Gibbs Sampler is a software tool that identifies motifs by
simultaneously exploiting two properties of motifs: overrepresentation
and evolutionary conservation. It does not depend on
pre-aligned orthologous sequences, which makes it
useful for extracting regulatory elements from multiple genomes
of both closely related and distant species.
The Tree Gibbs Sampler software is freely downloadable at
https://compbio.iupui.edu/xiaomanli/LiSoftware/retrieve.
A transcription factor can bind to short DNA segments in the
regulatory regions of many different genes to control their
expression. The common pattern of these short DNA segments
bound by a transcription factor is called a motif. Recently,
many computational methods have been developed to identify
motifs by finding overrepresented and conserved DNA
segments (putative motif instances) in the regulatory regions
of a set of candidate genes in multiple related species (Liu, 2004;
Moses, 2004; Prakash, 2004, 2005; Sinha, 2004; Wang, 2003).
Most of these methods first align orthologous sequences and
then identify motifs from the aligned orthologous sequences,
often without taking species divergence time into account.
However, motif instances are not always aligned with their
counterpart motif instances in multiple alignments of
orthologous sequences (Li, 2005). Moreover, without taking
divergence time into account, one often cannot distinguish
segments that are conserved because the species diverged only recently
from segments that are conserved because they are functional. Here we
present Tree Gibbs Sampler (TGS), a software tool that identifies
motifs from unaligned orthologous sequences while explicitly taking
divergence time into account.
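
To illustrate why divergence time matters, the following minimal sketch computes how often a neutrally evolving segment would remain identical in two species purely by chance. It assumes the Jukes-Cantor (JC69) substitution model and independent sites; both are assumptions made here for illustration and are not stated to be the model used by TGS. The function names and the motif width of 8 are hypothetical.

```python
import math


def jc69_identity(d):
    """Probability that one neutral site is identical in two sequences.

    d is the expected number of substitutions per site separating the two
    sequences, under the Jukes-Cantor (JC69) model (an assumption of this
    sketch, not necessarily the model used by TGS).
    """
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def neutral_match_probability(d, width):
    """Probability that an ungapped segment of `width` sites is fully
    identical by chance alone, assuming sites evolve independently."""
    return jc69_identity(d) ** width


if __name__ == "__main__":
    width = 8  # hypothetical motif width
    for d in (0.05, 0.5, 2.0):  # hypothetical divergences, short to long
        p = neutral_match_probability(d, width)
        print(f"d = {d:4.2f}  P(identical by chance) = {p:.6f}")
```

Under these assumptions, a perfectly conserved 8-mer is unsurprising between very closely related species, since more than half of neutral 8-mers would be identical at d = 0.05, but it is strong evidence of functional constraint between distant species. A method that ignores divergence time treats both cases alike.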