15853
Page 1
15853: Algorithms in the Real World
Computational Biology II
– Sequence Alignment
– Database searches
Extending LCS for Biology
The LCS/Edit distance problem is not a “practical”
model for comparing DNA or proteins.
Why?
Good example of the simple model failing.
– Some amino acids are “closer” to each other than others (e.g. more likely to mutate into one another, or closer in structural form).
– Some amino acids have more “information” than others and should contribute more.
– The cost of a deletion (insertion) of length n should not be counted as n times the cost of a deletion (insertion) of length 1.
– Biologists often care about finding “local” alignments instead of a global alignment.
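The point about gap length can be made concrete with a small sketch. It assumes an affine gap cost, which is one common way to make a long gap cheaper than n separate gaps; the slide itself does not fix a model, and the names `gap_open` and `gap_extend` and their default values are hypothetical.

```python
def linear_gap_cost(n, per_symbol=5):
    """Simple model: a gap of length n costs n times a gap of length 1."""
    return per_symbol * n


def affine_gap_cost(n, gap_open=4, gap_extend=1):
    """Assumed affine model: pay once to open a gap, then a smaller
    amount for every symbol in it. The parameter values are illustrative."""
    if n == 0:
        return 0
    return gap_open + gap_extend * n


for n in (1, 2, 5, 10):
    print(n, linear_gap_cost(n), affine_gap_cost(n))
```

Under these illustrative parameters both models charge the same for a gap of length 1, but the affine cost grows much more slowly as the gap gets longer.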
What we will talk about today
Extensions
• Sequence Alignment: a generalization of LCS to account for the closeness of different elements
• Gap Models: more sophisticated models for accounting for the cost of adjacent insertions or deletions
• Local Alignment: finding parts of one sequence in parts of another sequence
Applications
• FASTA and BLAST: the most common sequence matching tools used in molecular biology
Sequence Alignment
A generalization of LCS / Edit Distance
Extension: A’ is an extension of A if it is A with spaces _ added.
Alignment: An alignment of A and B is a pair of extensions A’ and B’ such that |A’| = |B’|.
Example:
A  = a b a c d a
B  = a a d c d d c
A’ = _ a b a c d a _
B’ = a a d _ c d d c
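The two definitions can be checked mechanically. The following is a minimal sketch, assuming sequences are Python strings and `_` is the space character; the helper names `is_extension` and `is_alignment` are made up for illustration.

```python
GAP = "_"


def is_extension(ext, seq):
    """True if deleting every space from ext gives back seq."""
    return [c for c in ext if c != GAP] == list(seq)


def is_alignment(a_ext, b_ext, a, b):
    """True if (a_ext, b_ext) are extensions of (a, b) of equal length."""
    return (
        len(a_ext) == len(b_ext)
        and is_extension(a_ext, a)
        and is_extension(b_ext, b)
    )


A = "abacda"
B = "aadcddc"
A_ext = "_abacda_"
B_ext = "aad_cddc"
print(is_alignment(A_ext, B_ext, A, B))  # True
```

The slide's example passes because both extensions have length 8 and reduce to A and B once the spaces are removed.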
The Score (Weight)
Σ⁺ = alphabet including a “space” character
Scoring function: σ(x, y), x, y ∈ Σ⁺
Alignment score:
$$W(A', B') = \sum_{i=1}^{|A'|} \sigma(A'_i, B'_i)$$
Optimal alignment: An alignment (A’, B’) of (A, B) such that W(A’, B’) is maximized. We will denote this optimized score as W(A, B).
Same as LCS when:
$$\sigma(x, y) = \begin{cases} 1 & \text{if } x = y \neq \_ \\ 0 & \text{otherwise} \end{cases}$$
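To make the definitions concrete, here is a minimal sketch that scores a given alignment and also computes the optimal score W(A, B). The optimal score uses the standard dynamic program for global alignment, which is an assumption on my part and not something stated on this slide; the function names are illustrative.

```python
GAP = "_"


def lcs_sigma(x, y):
    """Scoring function that reduces alignment to LCS."""
    return 1 if x == y and x != GAP else 0


def alignment_score(a_ext, b_ext, sigma):
    """W(A', B'): sum of sigma over the aligned columns."""
    if len(a_ext) != len(b_ext):
        raise ValueError("extensions must have equal length")
    return sum(sigma(x, y) for x, y in zip(a_ext, b_ext))


def optimal_score(a, b, sigma):
    """W(A, B): maximum alignment score, by dynamic programming.

    w[i][j] holds the best score for aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    w = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w[i][0] = w[i - 1][0] + sigma(a[i - 1], GAP)
    for j in range(1, m + 1):
        w[0][j] = w[0][j - 1] + sigma(GAP, b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w[i][j] = max(
                w[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),  # align a_i with b_j
                w[i - 1][j] + sigma(a[i - 1], GAP),           # a_i against a space
                w[i][j - 1] + sigma(GAP, b[j - 1]),           # b_j against a space
            )
    return w[n][m]


print(alignment_score("_abacda_", "aad_cddc", lcs_sigma))  # 3
print(optimal_score("abacda", "aadcddc", lcs_sigma))       # 4
```

With `lcs_sigma`, the optimal score equals the length of a longest common subsequence (here `a a c d`, length 4), while the particular alignment from the earlier example only reaches 3.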
Example
A = a b a c d a c
B = c a d c d d c
Alignment 1
_ a b a c d a c
c a d _ c d d c
Alignment 2
a b a _ c d a c
_ c a d c d d c
σ(x,y)    a    b    c    d    _
a         2    0    0    0   -1
b         0    2    1    0   -1
c         0    1    2    0   -1
d         0    0    0    2   -1
_        -1   -1   -1   -1   -1
Which is the better alignment?
Alignment 1 scores 6 and Alignment 2 scores 7, so Alignment 2 is the better one.
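The two scores can be reproduced with a short sketch. It assumes every entry involving the space character is -1, which is the value consistent with the stated scores of 6 and 7; the table layout and the names `SIGMA` and `alignment_score` are illustrative.

```python
SYMBOLS = "abcd_"

# Rows and columns follow the order a, b, c, d, _.
# Assumption: every entry involving the space "_" is -1.
ROWS = [
    [2, 0, 0, 0, -1],
    [0, 2, 1, 0, -1],
    [0, 1, 2, 0, -1],
    [0, 0, 0, 2, -1],
    [-1, -1, -1, -1, -1],
]
SIGMA = {
    (x, y): ROWS[i][j]
    for i, x in enumerate(SYMBOLS)
    for j, y in enumerate(SYMBOLS)
}


def alignment_score(a_ext, b_ext):
    """Sum sigma over the aligned columns of two equal-length extensions."""
    if len(a_ext) != len(b_ext):
        raise ValueError("extensions must have equal length")
    return sum(SIGMA[x, y] for x, y in zip(a_ext, b_ext))


score1 = alignment_score("_abacdac", "cad_cddc")
score2 = alignment_score("aba_cdac", "_cadcddc")
print(score1, score2)  # 6 7
```

Alignment 2 wins because it pairs b with c (score 1) where Alignment 1 pairs b with d (score 0); both alignments pay for two spaces.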
Scores vs. Distances
Maximizing vs. Minimizing.
Scores:
– Can be positive, zero, or negative. We try to maximize scores.
Distances:
– Must be nonnegative, and typically we assume they obey the triangle inequality (i.e. they are a metric). We try to minimize distances.
Scores are more flexible, but distances have better mathematical properties. The local alignment method we will use requires scores.
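As an illustration of the distance side, the sketch below uses unit-cost edit distance (an assumed choice of distance, not one fixed by the slide) and checks nonnegativity and the triangle inequality by brute force on a handful of strings. The word list is arbitrary.

```python
from itertools import product


def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = cur
    return prev[-1]


words = ["abacdac", "cadcddc", "abacda", "aadcddc", ""]

nonnegative = all(edit_distance(x, y) >= 0 for x, y in product(words, repeat=2))
triangle_ok = all(
    edit_distance(x, z) <= edit_distance(x, y) + edit_distance(y, z)
    for x, y, z in product(words, repeat=3)
)
print(nonnegative, triangle_ok)  # True True
```

A score such as the σ table with negative entries for spaces gives no such guarantees, which is the flexibility the slide refers to.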