15-853: Algorithms in the Real World
Computational Biology II
– Sequence Alignment
– Database searches


Page 2: Extending LCS for Biology

The LCS/edit distance problem is not a “practical” model for comparing DNA or proteins. Why? It is a good example of a simple model failing.

Page 3: Extending LCS for Biology

The LCS/edit distance problem is not a “practical” model for comparing DNA or proteins:
– Some amino acids are “closer” to each other than to others (e.g. more likely to mutate into one another, or closer in structural form).
– Some amino acids carry more “information” than others and should contribute more.
– The cost of a deletion (insertion) of length n should not be counted as n times the cost of a deletion (insertion) of length 1.
– Biologists often care about finding “local” alignments instead of a global alignment.

Page 4: What we will talk about today

Extensions
• Sequence Alignment: a generalization of LCS to account for the closeness of different elements
• Gap Models: more sophisticated models for the cost of adjacent insertions or deletions
• Local Alignment: finding parts of one sequence in parts of another sequence

Applications
• FASTA and BLAST: the most common sequence matching tools used in molecular biology

Page 5: Sequence Alignment

A generalization of LCS / edit distance.

Extension: A' is an extension of A if it is A with spaces _ added.
Alignment: an alignment of A and B is a pair of extensions A' and B' such that |A'| = |B'|.

Example:
  A  = a b a c d a
  B  = a a d c d d c
  A' = _ a b a c d a _
  B' = a a d _ c d d c

Page 6: The Score (Weight)

Σ⁺ = alphabet including a “space” character
Scoring function: σ(x, y), x, y ∈ Σ⁺
Alignment score:

\[ W(A', B') = \sum_{i=1}^{|A'|} \sigma(A'_i, B'_i) \]

Optimal alignment: an alignment (A', B') of (A, B) such that W(A', B') is maximized. We will denote this optimized score as W(A, B).

Same as |LCS| when:

\[ \sigma(x, y) = \begin{cases} 1 & \text{if } x = y \neq \_ \\ 0 & \text{otherwise} \end{cases} \]

Page 7: Example

  A = a b a c d a c
  B = c a d c d d c

Alignment 1 (score 6)
  _ a b a c d a c
    |     | |   |
  c a d _ c d d c

Alignment 2 (score 7)
  a b a _ c d a c
      |   | |   |
  _ c a d c d d c

  σ(x,y)   a   b   c   d   _
    a      2   0   0   0  -1
    b      0   2   1   0  -1
    c      0   1   2   0  -1
    d      0   0   0   2  -1
    _     -1  -1  -1  -1  -1

Which is the better alignment? Alignment 2, since it scores 7 against 6 for Alignment 1. The two differ only in their first four columns, where Alignment 2 pairs b with c (score 1) and Alignment 1 pairs b with d (score 0).

Page 8: Scores vs. Distances

Maximizing vs. minimizing.

Scores:
– Can be positive, zero, or negative. We try to maximize scores.

Distances:
– Must be non-negative, and typically we assume they obey the triangle inequality (i.e. they are a metric). We try to minimize distances.

Scores are more flexible, but distances have better mathematical properties. The local alignment method we will use requires scores.

Page 9: σ(x, y) for Protein Matching

How is the function/matrix derived? ...
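The alignment score W(A', B') from Page 6 and the example from Page 7 can be checked with a short Python sketch. The names `alignment_score`, `lcs_sigma`, `table_sigma`, and `optimal_score` are illustrative and do not come from the slides. Entries left blank in the Page 7 table are assumed to be 0, which is consistent with the scores 6 and 7 given there. The `optimal_score` function uses the standard global-alignment dynamic program, which this excerpt does not show; it assumes σ(_, _) ≤ 0, so that a column pairing two spaces never helps.

```python
GAP = "_"

def lcs_sigma(x, y):
    """Scoring function from Page 6 under which W(A, B) equals |LCS(A, B)|."""
    return 1 if x == y and x != GAP else 0

# Nonzero, non-gap entries of the Page 7 table (assumption: blank cells are 0).
_TABLE = {
    ("a", "a"): 2, ("b", "b"): 2, ("c", "c"): 2, ("d", "d"): 2,
    ("b", "c"): 1, ("c", "b"): 1,
}

def table_sigma(x, y):
    """Scoring function from the Page 7 table; any pairing with a space costs -1."""
    if x == GAP or y == GAP:
        return -1
    return _TABLE.get((x, y), 0)

def alignment_score(a_ext, b_ext, sigma):
    """W(A', B'): sum of sigma over the columns of a given alignment."""
    if len(a_ext) != len(b_ext):
        raise ValueError("extensions must have equal length")
    return sum(sigma(x, y) for x, y in zip(a_ext, b_ext))

def optimal_score(a, b, sigma):
    """W(A, B): best score over all alignments (standard DP, assumes sigma(_, _) <= 0)."""
    n, m = len(a), len(b)
    w = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # prefix of a aligned against spaces only
        w[i][0] = w[i - 1][0] + sigma(a[i - 1], GAP)
    for j in range(1, m + 1):          # prefix of b aligned against spaces only
        w[0][j] = w[0][j - 1] + sigma(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w[i][j] = max(
                w[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),  # pair a_i with b_j
                w[i - 1][j] + sigma(a[i - 1], GAP),           # a_i against a space
                w[i][j - 1] + sigma(GAP, b[j - 1]),           # b_j against a space
            )
    return w[n][m]

if __name__ == "__main__":
    print(alignment_score("_abacdac", "cad_cddc", table_sigma))  # Alignment 1
    print(alignment_score("aba_cdac", "_cadcddc", table_sigma))  # Alignment 2
    print(optimal_score("abacdac", "cadcddc", lcs_sigma))        # |LCS|
```

Passing the scoring function in as a parameter keeps the LCS special case and the table-based score on the same code path, mirroring the way the slides treat LCS as one choice of σ.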
This note was uploaded on 11/09/2008 for the course COMPUTER S 15853 taught by Professor Guy Blelloch during the Fall '07 term at Carnegie Mellon.
