15853: Algorithms in the Real World
Computational Biology II
– Sequence Alignment
– Database searches

Extending LCS for Biology
The LCS/Edit distance problem is not a “practical” model for comparing DNA or proteins. Why? It is a good example of the simple model failing.
– Some amino acids are “closer” to each other than others (e.g. more likely to mutate into one another, or closer in structural form).
– Some amino acids carry more “information” than others and should contribute more.
– The cost of a deletion (insertion) of length n should not be counted as n times the cost of a deletion (insertion) of length 1.
– Biologists often care about finding “local” alignments instead of a global alignment.

What we will talk about today
Extensions
• Sequence Alignment: a generalization of LCS to account for the closeness of different elements
• Gap Models: more sophisticated models for accounting for the cost of adjacent insertions or deletions
• Local Alignment: finding parts of one sequence in parts of another sequence
Applications
• FASTA and BLAST: the most common sequence matching tools used in Molecular Biology

Sequence Alignment
A generalization of LCS / Edit Distance
Extension: A’ is an extension of A if it is A with spaces _ added.
Alignment: An alignment of A and B is a pair of extensions A’ and B’ such that |A’| = |B’|.
Example:
A  = a b a c d a
B  = a a d c d d c
A’ = _ a b a c d a _
B’ = a a d _ c d d c

The Score (Weight)
Σ+ = alphabet including a “space” character
Scoring function: σ(x,y), x,y ∈ Σ+
Alignment score:
\[ W(A', B') = \sum_{i=1}^{|A'|} \sigma(A'_i, B'_i) \]
Optimal alignment: an alignment (A’, B’) of (A, B) such that W(A’,B’) is maximized. We will denote this optimized score as W(A,B).
Same as LCS when:
\[ \sigma(x, y) = \begin{cases} 1 & \text{if } x = y \neq \_ \\ 0 & \text{otherwise} \end{cases} \]

Example
A = a b a c d a c
B = c a d c d d c
Alignment 1
_ a b a c d a c
c a d _ c d d c
Alignment 2
a b a _ c d a c
_ c a d c d d c
[Scoring matrix σ(x,y) with rows and columns a, b, c, d, _; the entries are garbled in this preview.]
Which is the better alignment? (Scores shown on the slide: 6 and 7.)

Scores vs. Distances
Maximizing vs. minimizing.
Scores:
– Can be positive, zero, or negative. We try to maximize scores.
Distances:
– Must be nonnegative, and typically we assume they obey the triangle inequality (i.e. they are a metric). We try to minimize distances.
Scores are more flexible, but distances have better mathematical properties. The local alignment method we will use requires scores.

σ(x,y) for Protein Matching
How is the function/matrix derived? ...
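The alignment score W(A’,B’) and the optimized score W(A,B) defined on the Score (Weight) slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names GAP, lcs_sigma, alignment_score and optimal_score are hypothetical and not from the slides, and the dynamic program is the standard global-alignment recurrence, which the preview does not itself show. The default scoring function is the LCS special case given above, not the slide's example matrix.

```python
GAP = "_"  # the "space" character; name chosen for this sketch


def lcs_sigma(x, y):
    """LCS special case of sigma: 1 if x == y and neither is a space, else 0."""
    return 1 if x == y and x != GAP else 0


def alignment_score(a_ext, b_ext, sigma=lcs_sigma):
    """W(A', B'): sum of sigma over the columns of two equal-length extensions."""
    if len(a_ext) != len(b_ext):
        raise ValueError("an alignment requires |A'| == |B'|")
    return sum(sigma(x, y) for x, y in zip(a_ext, b_ext))


def optimal_score(a, b, sigma=lcs_sigma):
    """W(A, B): best alignment score, via the standard global-alignment
    dynamic program (an assumption; the slides do not show the recurrence)."""
    n, m = len(a), len(b)
    # w[i][j] = best score for aligning a[:i] with b[:j]
    w = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w[i][0] = w[i - 1][0] + sigma(a[i - 1], GAP)
    for j in range(1, m + 1):
        w[0][j] = w[0][j - 1] + sigma(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w[i][j] = max(
                w[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),  # align a_i with b_j
                w[i - 1][j] + sigma(a[i - 1], GAP),           # a_i against a space
                w[i][j - 1] + sigma(GAP, b[j - 1]),           # b_j against a space
            )
    return w[n][m]


if __name__ == "__main__":
    # The two alignments from the Example slide, scored with the LCS sigma.
    print(alignment_score("_abacdac", "cad_cddc"))
    print(alignment_score("aba_cdac", "_cadcddc"))
    print(optimal_score("abacdac", "cadcddc"))
```

Passing a different sigma (for instance, a lookup into a scoring matrix) changes the scores without touching the rest of the code, which is the point of separating the scoring function from the alignment.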
This note was uploaded on 11/09/2008 for the course COMPUTER S 15853 taught by Professor Guy Blelloch during the Fall '07 term at Carnegie Mellon.