# bio2 - Extending LCS for Biology 15-853:Algorithms in the...

## 15-853: Algorithms in the Real World

Computational Biology II

- Sequence Alignment
- Database searches

## Extending LCS for Biology

The LCS/edit distance problem is not a "practical" model for comparing DNA or proteins. Why? It is a good example of the simple model failing:

- Some amino acids are "closer" to each other than others (e.g. more likely to mutate into one another, or closer in structural form).
- Some amino acids carry more "information" than others and should contribute more.
- The cost of a deletion (insertion) of length n should not be counted as n times the cost of a deletion (insertion) of length 1.
- Biologists often care about finding "local" alignments instead of a global alignment.

## What we will talk about today

Extensions

- Sequence Alignment: a generalization of LCS to account for the closeness of different elements.
- Gap Models: more sophisticated models for accounting for the cost of adjacent insertions or deletions.
- Local Alignment: finding parts of one sequence in parts of another sequence.

Applications

- FASTA and BLAST: the most common sequence matching tools used in molecular biology.

## Sequence Alignment

A generalization of LCS / edit distance.

- Extension: A' is an extension of A if it is A with spaces _ added.
- Alignment: an alignment of A and B is a pair of extensions A' and B' such that |A'| = |B'|.

Example:

    A  = a b a c d a
    B  = a a d c d d c

    A' = _ a b a c d a _
    B' = a a d _ c d d c

## The Score (Weight)

- $\Sigma^+$ = the alphabet including a "space" character.
- Scoring function: $\sigma(x, y)$, with $x, y \in \Sigma^+$.
- Alignment score:

$$W(A', B') = \sum_{i=1}^{|A'|} \sigma(A'_i, B'_i)$$

- Optimal alignment: an alignment (A', B') of (A, B) such that W(A', B') is maximized. We will denote this optimized score as W(A, B).

This is the same as |LCS| when:

$$\sigma(x, y) = \begin{cases} 1 & \text{if } x = y \neq \_ \\ 0 & \text{otherwise} \end{cases}$$

## Example

    A = a b a c d a c
    B = c a d c d d c

Scoring function:

| σ(x,y) | a | b | c | d | _ |
|---|---|---|---|---|---|
| a | 2 | 0 | 0 | 0 | -1 |
| b | 0 | 2 | 1 | 0 | -1 |
| c | 0 | 1 | 2 | 0 | -1 |
| d | 0 | 0 | 0 | 2 | -1 |
| _ | -1 | -1 | -1 | -1 | -1 |

Alignment 1:

    _ a b a c d a c
      |     | |   |
    c a d _ c d d c

Alignment 2:

    a b a _ c d a c
        |   | |   |
    _ c a d c d d c

Which is the better alignment? Alignment 1 scores 6 and Alignment 2 scores 7.

## Scores vs. Distances

Maximizing vs. minimizing.

- Scores: can be positive, zero, or negative. We try to maximize scores.
- Distances: must be non-negative, and typically we assume they obey the triangle inequality (i.e. they are a metric). We try to minimize distances.

Scores are more flexible, but distances have better mathematical properties. The local alignment method we will use requires scores.
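
The two alignments on the Example slide can be checked mechanically. The following is a minimal Python sketch, not code from the lecture: the names `SIGMA`, `alignment_score` and `optimal_score` are made up for illustration. `alignment_score` sums σ over aligned columns exactly as in the definition of W(A', B'). `optimal_score` assumes the standard three-case dynamic program for global alignment (match/substitute, delete, insert), which the preview pages do not show.

```python
# Scoring table sigma(x, y) from the Example slide; "_" is the space character.
SYMBOLS = "abcd_"
TABLE = [
    [ 2,  0,  0,  0, -1],  # a
    [ 0,  2,  1,  0, -1],  # b
    [ 0,  1,  2,  0, -1],  # c
    [ 0,  0,  0,  2, -1],  # d
    [-1, -1, -1, -1, -1],  # _
]
SIGMA = {(x, y): TABLE[i][j]
         for i, x in enumerate(SYMBOLS)
         for j, y in enumerate(SYMBOLS)}


def alignment_score(a_ext, b_ext, sigma=SIGMA):
    """W(A', B'): sum of sigma over the columns of a given alignment."""
    if len(a_ext) != len(b_ext):
        raise ValueError("extensions must have equal length")
    return sum(sigma[x, y] for x, y in zip(a_ext, b_ext))


def optimal_score(a, b, sigma=SIGMA, space="_"):
    """W(A, B) by dynamic programming (assumed recurrence, not from the slides).

    w[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    w = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # a[:i] aligned against spaces only
        w[i][0] = w[i - 1][0] + sigma[a[i - 1], space]
    for j in range(1, m + 1):          # b[:j] aligned against spaces only
        w[0][j] = w[0][j - 1] + sigma[space, b[j - 1]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w[i][j] = max(
                w[i - 1][j - 1] + sigma[a[i - 1], b[j - 1]],  # align two symbols
                w[i - 1][j] + sigma[a[i - 1], space],         # space in B'
                w[i][j - 1] + sigma[space, b[j - 1]],         # space in A'
            )
    return w[n][m]


print(alignment_score("_abacdac", "cad_cddc"))  # Alignment 1 -> 6
print(alignment_score("aba_cdac", "_cadcddc"))  # Alignment 2 -> 7
```

Calling `optimal_score("abacdac", "cadcddc")` returns the best score over all alignments of the two example strings, so it is at least the 7 achieved by Alignment 2. Passing a σ that is 1 for equal non-space symbols and 0 otherwise makes the same function return the LCS length.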