MSA_constrained - Efficient Constrained Multiple Sequence...

Efficient Constrained Multiple Sequence Alignment with Performance Guarantee

Francis Y.L. Chin, N.L. Ho, T.W. Lam, Prudence W.H. Wong, M.Y. Chan
Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong.
{chin,nlho,twlam,whwong,mychan}@csis.hku.hk

Abstract

The Constrained Multiple Sequence Alignment problem is to align a set of sequences subject to a given constrained sequence, which arises from some knowledge of the structure of the sequences. This paper presents new algorithms for this problem, which are more efficient in terms of time and space (memory) than the previous algorithms [14], and with a worst-case guarantee on the quality of the alignment. Saving the space requirement by a quadratic factor is particularly significant as the previous O(n^4)-space algorithm has limited application due to its huge memory requirement. Experiments on real data sets confirm that our new algorithms show improvements in both alignment quality and resource requirements.

1. Introduction

Multiple sequence alignment (MSA) is one of the problems in computational biology that have been studied extensively [2, 5, 8, 6, 10, 12, 13]. Roughly speaking, given a set of k ≥ 2 sequences, the MSA problem is to align similar subsequences in the same region. From the computational point of view, the optimal alignment of two sequences can be found in O(n^2) time, where n is the length of the longer sequence. Yet, for three or more sequences, it has been proved that finding the optimal alignment is NP-hard, i.e., intractable¹ [2, 16]. In the literature, there are a number of MSA algorithms that attempt to approximate the optimal alignments, some of them can provide a worst-case approximation ratio [1, 4, 11], while some others work well in practice [9, 15]. Notice that with all these algorithms, users (biologists) can only control the alignment results by adjusting parameters like the scoring function and gap penalty. In other words, users could not incorporate their knowledge of the functionalities or structures of the input sequences, which is indeed very useful for accurate and biologically meaningful alignment. This naturally triggers the studies of sequence alignment that allows users to provide additional constraints.

Tang et al. [14] were the first to investigate the MSA problem with an additional input of a constrained sequence, which imposes a structure on the alignment by requiring every character in the constrained sequence to appear in an entire column in the alignment of the multiple sequences.

This research was supported in part by Hong Kong RGC Grant HKU7019/00E.
This research was supported in part by Hong Kong RGC Grant 10204076.
¹ There are several possible ways to define the optimal alignment. In this paper we adopt the widely-used Sum-of-Pair (SP) score, which asks for an alignment that minimizes the sum of the alignment cost of all pairs of sequences.
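To make the notions above concrete, the following Python sketch evaluates a given alignment rather than computing one. It is an illustration only, not the algorithm of this paper or of Tang et al. [14]. The unit-cost scheme (0 for identical symbols, 1 for a mismatch or a character against a gap), the gap symbol `-`, and all function names are assumptions introduced here. It shows the Sum-of-Pair cost of an alignment, a check that each character of a constrained sequence occupies an entire column in order, and the standard quadratic-time dynamic program for the optimal cost of aligning two sequences.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol; the paper does not fix one in this excerpt


def pair_cost(a, b):
    """Assumed unit-cost scheme: 0 if the symbols are identical, 1 otherwise."""
    return 0 if a == b else 1


def sp_score(alignment):
    """Sum-of-Pair cost: column-wise cost summed over every pair of rows."""
    return sum(
        pair_cost(x, y)
        for row1, row2 in combinations(alignment, 2)
        for x, y in zip(row1, row2)
    )


def satisfies_constraint(alignment, constraint):
    """Check that the characters of `constraint` appear, in order,
    each filling an entire column of the alignment."""
    pos = 0
    for column in zip(*alignment):
        if pos < len(constraint) and all(c == constraint[pos] for c in column):
            pos += 1
    return pos == len(constraint)


def pairwise_cost(s, t):
    """Optimal cost of aligning two sequences, by the classic O(n^2)
    dynamic program; only one previous row is kept in memory."""
    prev = list(range(len(t) + 1))  # aligning "" with t[:j] costs j gaps
    for i, a in enumerate(s, 1):
        cur = [i]  # aligning s[:i] with "" costs i gaps
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j - 1] + pair_cost(a, b),  # a aligned with b
                prev[j] + 1,                    # a aligned with a gap
                cur[j - 1] + 1,                 # b aligned with a gap
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rows = ["AC-GT", "A-TGT", "ACTG-"]
    print(sp_score(rows))
    print(satisfies_constraint(rows, "AG"))
    print(satisfies_constraint(rows, "AT"))
    print(pairwise_cost("ACGT", "ATGT"))
```

In the sample alignment, the first and fourth columns consist entirely of `A` and `G`, so the constrained sequence `AG` is satisfied, whereas `AT` is not because no column after the first consists entirely of `T`.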