Page 1
Pairwise Sequence Alignment using Dynamic Programming

What is sequence alignment?
Given two sequences of letters and a scoring scheme for evaluating matching letters, find the optimal pairing of letters from one sequence to letters of the other sequence.

Align:
THIS IS A RATHER LONGER SENTENCE THAN THE NEXT.
THIS IS A SHORT SENTENCE.

THIS IS A RATHER LONGER SENTENCE THAN THE NEXT.
THIS IS A ######SHORT## SENTENCE##############.
OR
THIS IS A SHORT#########SENTENCE##############.

Page 2
Aligning biological sequences

DNA (4 letter alphabet + gap)
TTGACAC
TTTACAC

Proteins (20 letter alphabet + gap)
RKVA--GMAKPNM
RKIAVAAASKPAV

Statement of Problem
Given
• 2 sequences
• scoring system for evaluating match (or mismatch) of two characters
• penalty function for gaps in sequences
Produce
• Optimal pairing of sequences that retains the order of characters in each sequence, perhaps introducing gaps, such that the total score is optimal.
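To make the problem statement concrete, here is a minimal sketch of how a scoring system and a gap penalty turn one candidate alignment into a total score. The function name `score_alignment` and the numeric values (match +1, mismatch -1, gap -2 per gap character) are assumptions chosen for illustration; the slides do not specify a scoring scheme.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2, gap_char="-"):
    """Sum the column scores of two equal-length, already-aligned rows.

    The scores are illustrative assumptions, not values from the slides.
    The gap penalty here is linear: every gap character costs `gap`.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == gap_char and b == gap_char:
            raise ValueError("a column cannot pair a gap with a gap")
        if a == gap_char or b == gap_char:
            total += gap        # one character aligned against a gap
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # two different characters
    return total


# The two example alignments from the slide.
dna_score = score_alignment("TTGACAC", "TTTACAC")
protein_score = score_alignment("RKVA--GMAKPNM", "RKIAVAAASKPAV")
print(dna_score, protein_score)
```

Under these assumed values the DNA pair has 6 matches and 1 mismatch, and the protein pair has 5 matches, 6 mismatches and 2 gap characters. The alignment problem is then to find, among all gapped pairings that keep the character order, the one with the best such total.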
Page 3
Why align sequences?
• Lots of sequences with unknown structure and function.
• A few sequences with known structure and function.
• If they align, they are similar, maybe due to common descent.
• If they are similar, then they might have the same structure or function.
• If one of them has known structure/function, then alignment to the other yields insight about how the structure or function works.

Multiple alignment
Pairwise alignment (two at a time) is much easier than multiple alignment (N at a time).
This is a rather longer sentence than the next.
This is a short sentence.
This is the next sentence.
Rather long is the next concept.
Rather longer than what is the next concept.

Page 4
Drawing alignments

Exact Matches OK, Inexact Costly, Gaps cheap.
This is a rather longer sentence than the next.
This is a ############# sentence##############.

Exact Matches OK, Inexact Costly, Gaps cheap.
This is a *rather longer*sentence than the next.
This is a s###h####o###rtsentence##############.

Exact Matches OK, Inexact Moderate, Gaps cheap.
This is a rather longer sentence than ########the next#########.
This is a ##short###### sentence###############################.

Exact Matches cheap, Inexact cheap, Gaps expensive.
This is a rather longer sentence than the next.
This is a short sentence.######################

Multiple Alignment (NP-hard)
This is a rather longer sentence than ########the next#########.
This is a short######## sentence###############################.
This is ######################################the next sentence.
##########Rather long is ########the############# next concept#.
##########Rather longer #########than what is the next concept#.
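The effect of the cost settings above can be explored with a small dynamic programming sketch. The code below uses the standard Needleman-Wunsch recurrence for global alignment with a linear gap penalty; this is an assumption about which algorithm to illustrate with, since this part of the slides does not spell one out. The name `global_align` and the default costs are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1, gap_char="#"):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). All costs are
    illustrative assumptions, not values taken from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # pair two characters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append(gap_char)
            i -= 1
        else:
            row_a.append(gap_char)
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


long_s = "This is a rather longer sentence than the next."
short_s = "This is a short sentence."
# Try cheap gaps versus expensive gaps and compare the drawn alignments.
for gap_cost in (-1, -10):
    best, top, bottom = global_align(long_s, short_s, gap=gap_cost)
    print(gap_cost, best)
    print(top)
    print(bottom)
```

Changing `match`, `mismatch` and `gap` plays the same role as the "cheap", "moderate" and "costly" labels on the slide: the recurrence stays the same, and only the scoring decides which drawing is optimal. When several alignments tie, the traceback returns just one of them.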
Page 5
There used to be dot matrices.
• Put one sequence along the top row of a matrix.
• Put the other sequence along the left column of the matrix.
• Plot a dot every time there is a match between an element of the row sequence and an element of the column sequence.
• Diagonal lines indicate areas of match.
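The construction just described can be sketched in a few lines. The helper names `dot_matrix` and `render`, and the choice of `*` for a dot and `.` for an empty cell, are assumptions made for this illustration.

```python
def dot_matrix(top, left):
    """Return a grid where grid[r][c] is True when left[r] == top[c]."""
    return [[row_ch == col_ch for col_ch in top] for row_ch in left]


def render(top, left, dot="*", blank="."):
    """Draw the dot matrix as text: `top` across the top, `left` down the side."""
    grid = dot_matrix(top, left)
    lines = ["  " + top]
    for row_ch, row in zip(left, grid):
        cells = "".join(dot if hit else blank for hit in row)
        lines.append(row_ch + " " + cells)
    return "\n".join(lines)


# The DNA pair used earlier in the slides.
print(render("TTGACAC", "TTTACAC"))
```

For this pair the main diagonal is filled except at the third position, where G meets T. The off-diagonal dots are chance matches between repeated letters, which is part of why reading such a plot takes visual judgment.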

Page 6
Problems with dot matrices
• Rely on visual analysis
• Difficult to find optimal alignments
• Need scoring schemes more sophisticated than “identical match”
• Difficult to estimate significance of alignments

Gaps
The thing that makes alignment hard is the possibility that gaps are introduced in one sequence (corresponding to a shortening of the protein chain, for example).
