Genome Alignment

Alignment: Take a set of sequences and find where they match. Arrange the sequences in a matrix whose columns contain homologous (corresponding) characters from each sequence.
Types of Alignments: Global (include the entire length of all sequences in the alignment) and Local (identify and align subsets of longer sequences).

Alignment Methods: Needleman-Wunsch (global) and Smith-Waterman (local) use dynamic programming. They are guaranteed to find an optimal alignment under a given scoring function, but their running time and memory grow with the product of the sequence lengths, which makes them too computationally intensive for genome alignment, especially of multiple genomes.
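To put a rough number on the cost, the sketch below counts the cells a pairwise dynamic programming matrix would need. The function names, the 4 bytes per cell, and the 5 Mbp genome length are illustrative assumptions, not figures from the slides.

```python
def dp_cells(len1, len2):
    """Number of cells in a full DP matrix, including the extra
    initialization row and column."""
    return (len1 + 1) * (len2 + 1)


def dp_matrix_bytes(len1, len2, bytes_per_cell=4):
    """Memory for storing the whole matrix; bytes_per_cell is an
    assumed size for one integer score."""
    return dp_cells(len1, len2) * bytes_per_cell


# The short example sequences used later (11 and 7 residues) are trivial.
small = dp_cells(11, 7)

# Two hypothetical genomes of 5,000,000 bp each (assumed size).
genome_cells = dp_cells(5_000_000, 5_000_000)
genome_terabytes = dp_matrix_bytes(5_000_000, 5_000_000) / 1e12

print(small, genome_cells, round(genome_terabytes))
```

Under these assumptions a single pair of genomes already needs on the order of 10^13 cells, and the count multiplies again for every additional genome.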
Dynamic Programming: One possible simple scoring scheme: $S_{i,j} = 1$ if the residue at position $i$ of sequence #1 is the same as the residue at position $j$ of sequence #2 (match score); otherwise $S_{i,j} = 0$ (mismatch score). $w = 0$ (gap penalty).
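A minimal sketch of this scoring scheme, assuming the two example sequences from the traceback slide. The names `score_table` and `GAP_PENALTY` are hypothetical, and the table is 0-indexed in code while the slide counts positions from 1.

```python
GAP_PENALTY = 0  # w in the slide


def score_table(seq1, seq2, match=1, mismatch=0):
    """Return S as a list of rows: S[i][j] is the match score if
    seq1[i] == seq2[j], otherwise the mismatch score (0-based indices)."""
    return [[match if a == b else mismatch for b in seq2] for a in seq1]


S = score_table("GAATTCAGTTA", "GGATCGA")

# Position 1 of both sequences is G, so the slide's S(1,1) is a match.
print(S[0][0])
# Position 2 is A in sequence #1 and G in sequence #2, so S(2,2) is a mismatch.
print(S[1][1])
```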

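Dynamic Programming: Three steps: 1) Initialize the first row and column of the matrix (all zeros when $w = 0$). 2) Fill the matrix using $M_{i,j} = \max[\, M_{i-1,j-1} + S_{i,j},\; M_{i,j-1} + w,\; M_{i-1,j} + w \,]$, where the three terms correspond to a match/mismatch on the diagonal, a gap in sequence #1, and a gap in sequence #2.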
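The initialization and fill steps can be sketched as follows. This is one straightforward reading of the recurrence, not necessarily the layout used in the original slides; `fill_matrix` and its parameter names are hypothetical, and row and column 0 are assumed to hold the scores for aligning a prefix against nothing.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=0, gap=0):
    """Build the global-alignment score matrix M, where M[i][j] is the
    best score for aligning seq1[:i] with seq2[:j]."""
    m, n = len(seq1), len(seq2)
    M = [[0] * (n + 1) for _ in range(m + 1)]

    # Step 1: initialize the first column and row with cumulative gap scores.
    for i in range(1, m + 1):
        M[i][0] = i * gap
    for j in range(1, n + 1):
        M[0][j] = j * gap

    # Step 2: fill every remaining cell from its three neighbours.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[i][j] = max(
                M[i - 1][j - 1] + s,  # match/mismatch on the diagonal
                M[i][j - 1] + gap,    # gap in sequence #1
                M[i - 1][j] + gap,    # gap in sequence #2
            )
    return M


M = fill_matrix("GAATTCAGTTA", "GGATCGA")
print(M[-1][-1])  # score of the best global alignment
```

The bottom-right cell holds the optimal score for the full sequences; the traceback step starts from there.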
Dynamic Programming: 3) Traceback
G A A T T C A G T T A
G G A - T C - G - - A
Score = 1+0+1+0+1+1+0+1+0+0+1 = 6
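A minimal sketch of the traceback, assuming the simple scoring scheme of match = 1, mismatch = 0, gap = 0. The function name and the tie-breaking order (diagonal first, then a gap in sequence #2, then a gap in sequence #1) are assumptions; several alignments share the optimal score, so the one printed here may differ from the slide's while scoring the same.

```python
def global_align(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the score matrix, then trace back from the bottom-right
    corner. Returns (score, aligned_seq1, aligned_seq2)."""
    m, n = len(seq1), len(seq2)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = i * gap
    for j in range(1, n + 1):
        M[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,
                          M[i][j - 1] + gap,
                          M[i - 1][j] + gap)

    # Traceback: at each cell, step to a neighbour that explains its score.
    out1, out2 = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if M[i][j] == M[i - 1][j - 1] + s:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and M[i][j] == M[i - 1][j] + gap:
            out1.append(seq1[i - 1])  # gap in sequence #2
            out2.append("-")
            i -= 1
        else:
            out1.append("-")          # gap in sequence #1
            out2.append(seq2[j - 1])
            j -= 1
    return M[m][n], "".join(reversed(out1)), "".join(reversed(out2))


score, row1, row2 = global_align("GAATTCAGTTA", "GGATCGA")
print(row1)
print(row2)
print("Score =", score)
```

Counting the identical columns of the two returned rows reproduces the score, which is the same column-by-column sum shown on the slide.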

