LCS.efficient(1) - An Efficient Algorithm for the LCS...

An Efficient Algorithm for the LCS Problem
Longest Common Subsequence Problem

The longest common subsequence problem, also called the LCS problem, is a special case of the similarity problem.

Definition: Given a string S of length n, a subsequence is a string $S(i_1) S(i_2) \ldots S(i_k)$ such that $1 \le i_1 < i_2 < i_3 < \cdots < i_k \le n$ for some $k \le n$.

A substring is a run of characters of S that are located contiguously, whereas the characters of a subsequence are not necessarily contiguous but must appear in the same left-to-right order as in S. Thus every substring is a subsequence, but the converse is not true.
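The difference between the two notions can be checked mechanically. The following is a minimal sketch; the function names `is_subsequence` and `is_substring` are illustrative and do not come from the slides, and the test string is S1 from the example used later in these notes.

```python
def is_subsequence(t: str, s: str) -> bool:
    """Return True if t can be obtained from s by deleting zero or more characters."""
    it = iter(s)
    # `ch in it` consumes the iterator up to and including the first match,
    # so the characters of t must be found in left-to-right order.
    return all(ch in it for ch in t)


def is_substring(t: str, s: str) -> bool:
    """Return True if t occurs in s as one contiguous block."""
    return t in s


s = "AATGGCCATA"
print(is_substring("GGCC", s), is_subsequence("GGCC", s))  # True True
print(is_substring("ATCA", s), is_subsequence("ATCA", s))  # False True
```

The second call shows a subsequence that is not a substring: the characters of ATCA appear in order in S1, but not next to each other.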
Longest Common Subsequence

Definition: The longest common subsequence, or LCS, of two strings S1 and S2 is the longest subsequence common to both strings.

    S1: A  -- A  T  -- G  G  C  C  -- A  T  A     n=10
    S2: A  T  A  T  A  A  T  T  C  T  A  T  --    m=12

The LCS given by this alignment is AATCAT. The length of the LCS is 6. The solution is not unique for all pairs of strings. Consider the pair (ATTA, ATAT). The solutions are ATT and ATA. In general, for an arbitrary pair of strings, there may exist many solutions.
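To see the non-uniqueness concretely, all distinct longest common subsequences of a small pair can be enumerated. This is a sketch for short strings only, since the number of solutions can grow quickly; the function name `all_lcs` is an assumption made for illustration, not something defined in the slides.

```python
from functools import lru_cache


def all_lcs(a: str, b: str) -> set[str]:
    """Return every distinct longest common subsequence of a and b."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> frozenset[str]:
        # Set of all LCS strings of the suffixes a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return frozenset({""})
        if a[i] == b[j]:
            # A shared first character starts every LCS of these suffixes.
            return frozenset(a[i] + rest for rest in solve(i + 1, j + 1))
        # Otherwise drop one leading character and keep only the longest results.
        candidates = solve(i + 1, j) | solve(i, j + 1)
        best = max(len(c) for c in candidates)
        return frozenset(c for c in candidates if len(c) == best)

    return set(solve(0, 0))


print(sorted(all_lcs("ATTA", "ATAT")))  # ['ATA', 'ATT']
```

For the pair (ATTA, ATAT) this returns exactly the two solutions named above.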
LCS Problem

The LCS can be found with a dynamic programming formulation. Because it uses the general dynamic programming algorithm, its complexity is O(nm). The longest common substring problem, on the other hand, has an O(n+m) solution. Subsequences are much more complex than substrings. Can we do better for the LCS problem? We will see.
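For reference, the O(nm) dynamic programming approach can be sketched as follows. This is the standard textbook table-filling method with a traceback, written here under the assumption that it matches the "general dynamic programming algorithm" the slide refers to; the function name `lcs` is illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Trace back from the bottom-right cell to recover one solution.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("AATGGCCATA", "ATATAATTCTAT")))  # 6
```

The traceback returns only one of possibly many solutions; which one depends on how ties between the two neighboring cells are broken.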
LCS for S1 and S2

    S1: A  -- A  T  -- G  G  C  C  -- A  T  A     n=10
    S2: A  T  A  T  A  A  T  T  C  T  A  T  --    m=12

The optimal alignment is shown above. Note that the alignment shows three insert (dark), one delete (green), and three substitution or replacement operations (blue), which gives an edit distance of 7. However, the 3 replacement operations can be realized by 3 insert and 3 delete operations, because a replacement is equivalent to first deleting the character and then inserting a character in its place, like:

    S1: G  -- G  -- C  --
    S2: -- A  -- T  -- T
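The operation counts for this alignment can be tallied column by column. The sketch below assumes each gap is written as a single `-` character (the slide prints it as `--`); the helper name `count_operations` and the dictionary keys are illustrative choices.

```python
GAP = "-"  # assumption: one character per gap, unlike the "--" printed on the slide


def count_operations(row1: str, row2: str) -> dict[str, int]:
    """Count matches, inserts, deletes and replacements in a two-row alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    counts = {"match": 0, "insert": 0, "delete": 0, "replace": 0}
    for x, y in zip(row1, row2):
        if x == GAP and y == GAP:
            raise ValueError("a column cannot contain two gaps")
        if x == GAP:
            counts["insert"] += 1   # character present only in the second string
        elif y == GAP:
            counts["delete"] += 1   # character present only in the first string
        elif x == y:
            counts["match"] += 1
        else:
            counts["replace"] += 1
    return counts


ops = count_operations("A-AT-GGCC-ATA", "ATATAATTCTAT-")
print(ops)  # {'match': 6, 'insert': 3, 'delete': 1, 'replace': 3}

# Unit cost for every operation.
print(ops["insert"] + ops["delete"] + ops["replace"])      # 7
# Each replacement realized as one delete plus one insert.
print(ops["insert"] + ops["delete"] + 2 * ops["replace"])  # 10
```

Splitting each replacement into a delete and an insert turns the 3 inserts, 1 delete, and 3 replacements into 6 inserts and 4 deletes.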
Distance and LCS are related: if we give a cost of 2 to the replace operation and a cost of 1 to both the insert and delete operations, the minimum edit distance D can be computed in terms of the length L of the LCS as:

$$D = n + m - 2L$$

For the above example, n = 10, m = 12, L = 6. So D = 10 (6 insert and 4 delete).
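The relation D = n + m - 2L can be checked numerically by computing both sides independently. The sketch below uses the standard edit distance recurrence with the costs stated above (insert 1, delete 1, replace 2) and a separate LCS length computation; the function and parameter names are illustrative assumptions.

```python
def weighted_edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, rep: int = 2) -> int:
    """Minimum cost to turn a into b with the given per-operation costs."""
    m = len(b)
    prev = [j * ins for j in range(m + 1)]      # cost of building b[:j] from ""
    for i in range(1, len(a) + 1):
        cur = [i * dele] + [0] * m              # cost of deleting all of a[:i]
        for j in range(1, m + 1):
            diagonal = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else rep)
            cur[j] = min(diagonal, prev[j] + dele, cur[j - 1] + ins)
        prev = cur
    return prev[m]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using two rolling rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1, s2 = "AATGGCCATA", "ATATAATTCTAT"
d = weighted_edit_distance(s1, s2)
l = lcs_length(s1, s2)
print(d, l, len(s1) + len(s2) - 2 * l)  # 10 6 10
```

With these costs a replacement is never cheaper than a delete followed by an insert, which is why the distance reduces to the n - L deletions and m - L insertions around an LCS.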

This note was uploaded on 06/12/2011 for the course CAP 5510 taught by Professor Staff during the Spring '08 term at University of Central Florida.
