Preprint
Alignment of RNA Base Pairing Probability Matrices
Ivo L. Hofacker, Stephan H.F. Bernhart and Peter F. Stadler
Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien,
Währingerstraße 17, Vienna, A-1090, Austria and Bioinformatik, Institut für Informatik,
Universität Leipzig, Kreuzstrasse 7b, Leipzig, D-04103, Germany
ABSTRACT
Motivation: Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
Results: We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
Availability: The programs pmcomp and pmmulti described in this contribution are implemented in Perl, and are available on request from the authors. A web server is available at http://rna.tbi.univie.ac.at/cgibin/pmcgi.pl
Contact: Ivo L. Hofacker, Tel: ++43 1 4277 52738, Fax: ++43 1 4277 52793, [email protected]
INTRODUCTION
Many functional classes of RNA molecules, including tRNA, rRNA, RNase P RNA, and SRP RNA, exhibit a highly conserved secondary structure but little sequence homology. Reliable alignments thus have to take structural information into account.
Sankoff’s algorithm (Sankoff, 1985), which solves the structure prediction and alignment problems simultaneously, is computationally very expensive: O(n^6) in CPU time and O(n^4) in memory for a pair of sequences of length n. A further complication is that it requires the implementation of the full loop-based RNA energy model (Mathews et al., 1999). Currently available software packages such as foldalign (Gorodkin et al., 1997) and dynalign (Mathews & Turner, 2002) therefore implement only restricted versions.
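To give a feel for why these bounds are prohibitive, the following minimal sketch evaluates the stated O(n^6) time and O(n^4) memory scaling for a few sequence lengths. It is only an illustration: constant factors are ignored (an assumption), and the function name `sankoff_cost` is hypothetical and not part of any of the programs mentioned here.

```python
from typing import Tuple


def sankoff_cost(n: int) -> Tuple[int, int]:
    """Return (time_steps, memory_cells) for two sequences of length n.

    Uses the asymptotic scaling quoted in the text, O(n^6) time and
    O(n^4) memory, with all constant factors set to 1 (an assumption
    made purely for illustration).
    """
    return n ** 6, n ** 4


if __name__ == "__main__":
    # Print the growth of both quantities as the sequence length doubles.
    for n in (50, 100, 200, 400):
        steps, cells = sankoff_cost(n)
        print(f"n={n:4d}  time ~ {steps:.1e} steps  memory ~ {cells:.1e} cells")
```

Under this scaling, doubling the sequence length multiplies the running time by 2^6 = 64 and the memory requirement by 2^4 = 16, which is why practical implementations resort to restricted versions of the algorithm.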
In this contribution we describe a different approach. Instead of attempting to solve the alignment and the structure prediction problem simultaneously, we start from base pairing probability matrices predicted by means of McCaskill’s algorithm (McCaskill, 1990), as implemented in the RNAfold program of the Vienna RNA Package (Hofacker et al., 1994; Hofacker, 2003). The problem then becomes the alignment of the base pairing probability matrices. This appears to be an even harder threading