1 Multiple String Alignment

Efficient methods for multiple sequence alignment with guaranteed error bounds

Dan Gusfield 1
Computer Science Division
University of California, Davis
July, 1991

Abstract

Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known that compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the value of the optimal alignment.

2 Introduction

Multiple string (sequence) alignment is a difficult problem of great value in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids); and inferring the evolutionary history of a set of taxa from their associated biological sequences.
In the first case, a conserved pattern may be so dissimilar or dispersed in the strings that it cannot be detected by statistical tests when just two strings of the set are aligned, but the pattern becomes clear and compelling when many strings are simultaneously aligned. Scores of papers have been written on methods for multiple string alignment, and hundreds of papers have used various multiple alignment methods to find patterns or build evolutionary trees from biological sequence data. The following few papers illustrate this broad literature: [6, ?, 2, 4, ?, 11, 15, ?, 1, 10, 3]. Many of the suggested methods build a multiple alignment by attempting to optimize some explicitly or implicitly stated measure of goodness of the alignment. However, no single measure or objective function has yet been proposed that is widely agreed upon (unlike the case of aligning just two strings), and some proposed methods build alignments...

1 Research partially supported by grant DE-FG03-90ER60999 from the Department of Energy, and grant CCR-8803704 from the National Science Foundation.
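
As a rough illustration of the abstract's claim that the guaranteed factor stays below two and equals 1.33 for three strings, the sketch below assumes the bound has the form 2 - 2/k for k strings. That form is an assumption made here, since this excerpt does not state the formula; it is used only because it is below 2 for every k and gives 4/3 at k = 3. The helper name `assumed_error_bound` is hypothetical.

```python
from fractions import Fraction


def assumed_error_bound(k: int) -> Fraction:
    """Return 2 - 2/k, an ASSUMED form of the guaranteed ratio for k strings.

    The excerpt only says the factor is below 2 and is 1.33 for three strings;
    the closed form used here is an assumption consistent with both statements.
    """
    if k < 2:
        raise ValueError("an alignment needs at least two strings")
    return 2 - Fraction(2, k)


if __name__ == "__main__":
    # Tabulate the assumed bound for a few set sizes.
    for k in (2, 3, 4, 10, 100):
        bound = assumed_error_bound(k)
        print(f"k = {k:3d}  bound = {bound}  (approx. {float(bound):.3f})")
```

Under this assumption the bound is tight for two strings, is 4/3 for three strings, and approaches but never reaches 2 as the number of strings grows.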
This note was uploaded on 05/20/2011 for the course CAP 5515 taught by Professor Ungor during the Spring '08 term at University of Florida.