15-853: Algorithms in the Real World
Computational Biology II
– Sequence Alignment
– Database searches

Extending LCS for Biology
The LCS/edit distance problem is not a “practical” model for comparing DNA or proteins. Why? It is a good example of the simple model failing.
– Some amino acids are “closer” to each other than others (e.g. more likely to mutate into one another, or closer in structural form).
– Some amino acids carry more “information” than others and should contribute more.
– The cost of a deletion (insertion) of length n should not be counted as n times the cost of a deletion (insertion) of length 1.
– Biologists often care about finding “local” alignments instead of a global alignment.

What we will talk about today
Extensions
• Sequence Alignment: a generalization of LCS to account for the closeness of different elements
• Gap Models: more sophisticated models for accounting for the cost of adjacent insertions or deletions
• Local Alignment: finding parts of one sequence in parts of another sequence
Applications
• FASTA and BLAST: the most common sequence matching tools used in molecular biology

Sequence Alignment
A generalization of LCS / Edit Distance
Extension: A’ is an extension of A if it is A with spaces _ added.
Alignment: An alignment of A and B is a pair of extensions A’ and B’ such that |A’| = |B’|.
Example:
  A  = a b a c d a
  B  = a a d c d d c
  A’ = _ a b a c d a _
  B’ = a a d _ c d d c

The Score (Weight)
Σ+ = alphabet including a “space” character
Scoring function: σ(x,y), x,y ∈ Σ+
Alignment score:
$$W(A', B') = \sum_{i=1}^{|A'|} \sigma(A'_i, B'_i)$$
Optimal alignment: An alignment (A’, B’) of (A, B) such that W(A’,B’) is maximized. We will denote this optimized score as W(A,B).
Same as |LCS| when:
$$\sigma(x, y) = \begin{cases} 1 & \text{if } x = y \neq \_ \\ 0 & \text{otherwise} \end{cases}$$

Example
  A = a b a c d a c
  B = c a d c d d c

  σ(x,y) |  a   b   c   d   _
  -------+--------------------
     a   |  2   0   0   0  -1
     b   |  0   2   1   0  -1
     c   |  0   1   2   0  -1
     d   |  0   0   0   2  -1
     _   | -1  -1  -1  -1  -1

Alignment 1
  _ a b a c d a c
    |     | |   |
  c a d _ c d d c

Alignment 2
  a b a _ c d a c
      |   | |   |
  _ c a d c d d c

Which is the better alignment? Alignment 1 scores 6; Alignment 2 scores 7.

Scores vs. Distances
Maximizing vs. minimizing.
Scores:
– Can be positive, zero, or negative. We try to maximize scores.
Distances:
– Must be non-negative, and typically we assume they obey the triangle inequality (i.e. they are a metric). We try to minimize distances.
Scores are more flexible, but distances have better mathematical properties. The local alignment method we will use requires scores.
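
The alignment score W(A’,B’) defined on the score slide can be checked by hand, or with a few lines of code. The following is a minimal Python sketch, not part of the original slides: the function names (`sigma_example`, `sigma_lcs`, `is_extension`, `alignment_score`) are made up for illustration, and `sigma_example` simply hard-codes the example σ(x,y) table. It only evaluates a given alignment; it does not search for the optimal one.

```python
GAP = "_"  # the "space" character added to the alphabet


def sigma_example(x, y):
    """Example scoring table from the slide (hard-coded here as an assumption)."""
    if x == GAP or y == GAP:
        return -1          # any column containing a space costs 1
    if x == y:
        return 2           # exact match
    if {x, y} == {"b", "c"}:
        return 1           # b and c are treated as "close"
    return 0               # any other mismatch


def sigma_lcs(x, y):
    """Scoring function under which the optimal score equals |LCS|."""
    return 1 if x == y and x != GAP else 0


def is_extension(ext, s):
    """True if ext is s with spaces added (and nothing else changed)."""
    return ext.replace(GAP, "") == s


def alignment_score(a_ext, b_ext, sigma):
    """W(A', B'): sum of sigma over the columns of the alignment."""
    if len(a_ext) != len(b_ext):
        raise ValueError("an alignment needs |A'| = |B'|")
    return sum(sigma(x, y) for x, y in zip(a_ext, b_ext))


if __name__ == "__main__":
    A, B = "abacdac", "cadcddc"
    alignments = [("_abacdac", "cad_cddc"), ("aba_cdac", "_cadcddc")]
    for a_ext, b_ext in alignments:
        assert is_extension(a_ext, A) and is_extension(b_ext, B)
        print(a_ext, b_ext, alignment_score(a_ext, b_ext, sigma_example))
```

Run as a script, this prints 6 for Alignment 1 and 7 for Alignment 2, so Alignment 2 is the better of the two under this σ. Passing `sigma_lcs` instead counts only the matched columns, which is 4 for both alignments.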