Lecture 3: Multiple Sequence Alignment
Eric C. Rouchka, D.Sc.
eric.rouchka@uofl.edu
http://kbrin.a-bldg.louisville.edu/~rouchka/CECS694/

Amino Acid Sequence Alignment
- No exact match/mismatch scores
- Match state score calculated by table lookup
- Lookup table is a mutation matrix
PAM250 Lookup
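The table lookup can be sketched as a dictionary keyed by residue pairs. This is a minimal sketch: `TOY_MATRIX`, `score`, and `ungapped_score` are hypothetical names, and the numbers are placeholders chosen for illustration, not entries copied from a published PAM250 table.

```python
# Placeholder substitution scores for three residues (A, S, V).
# Illustrative values only; a real run would load a full 20x20 mutation matrix.
TOY_MATRIX = {
    ("A", "A"): 3, ("A", "S"): 1, ("A", "V"): 0,
    ("S", "S"): 3, ("S", "V"): -1,
    ("V", "V"): 5,
}


def score(a, b, matrix=TOY_MATRIX):
    """Return the substitution score s(a, b); the table is symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(x, y, matrix=TOY_MATRIX):
    """Sum the lookup scores position by position for equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(score(a, b, matrix) for a, b in zip(x, y))


if __name__ == "__main__":
    print(score("S", "A"))
    print(ungapped_score("AVS", "SVS"))
```

Storing only one triangle of the matrix and checking both key orders keeps the table small while still treating s(a, b) and s(b, a) as equal.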

Affine Gap Penalties
- Gap open penalty
- Gap extension penalty
- Maximum score matrix determined by the maximum of three matrices:
  - Match matrix (residues aligned to each other)
  - Insertion matrix (gap in sequence A)
  - Deletion matrix (gap in sequence B)
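Dynamic Programming with Affine Gap

$$
\begin{aligned}
M_{i,j} &= \max \begin{cases}
M_{i-1,j-1} + s(x_i, y_j) \\
I_{i-1,j-1} + s(x_i, y_j) \\
D_{i-1,j-1} + s(x_i, y_j)
\end{cases} \\
I_{i,j} &= \max \begin{cases}
M_{i-1,j} - g & \text{opening new gap; } g = \text{gap open penalty} \\
I_{i-1,j} - r & \text{extending existing gap; } r = \text{gap extend penalty}
\end{cases} \\
D_{i,j} &= \max \begin{cases}
M_{i,j-1} - g & \text{opening new gap} \\
D_{i,j-1} - r & \text{extending existing gap}
\end{cases} \\
V_{i,j} &= \max \{\, M_{i,j},\ I_{i,j},\ D_{i,j} \,\}
\end{aligned}
$$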
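The three-matrix recurrence can be sketched in Python as follows. This is a score-only sketch with no traceback. The function name `affine_global_score`, the global-alignment boundary conditions (a leading gap of length k costs g + (k - 1) r), and the match/mismatch scoring in the usage example are assumptions, since the slide does not specify them.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, s, g, r):
    """Best global alignment score using the M / I / D recurrences.

    s(a, b) is a substitution score function, g the gap open penalty,
    r the gap extend penalty (both given as positive numbers).
    """
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Assumed boundary conditions: leading gaps are charged like any other gap.
    M[0][0] = 0
    for i in range(1, n + 1):
        I[i][0] = -g - (i - 1) * r
    for j in range(1, m + 1):
        D[0][j] = -g - (j - 1) * r

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = s(x[i - 1], y[j - 1])
            # Match state: x_i aligned with y_j, coming from any state.
            M[i][j] = max(M[i - 1][j - 1], I[i - 1][j - 1], D[i - 1][j - 1]) + sub
            # Gap states: open a new gap from M, or extend the existing one.
            I[i][j] = max(M[i - 1][j] - g, I[i - 1][j] - r)
            D[i][j] = max(M[i][j - 1] - g, D[i][j - 1] - r)

    # V at the final cell is the maximum over the three matrices.
    return max(M[n][m], I[n][m], D[n][m])


if __name__ == "__main__":
    simple = lambda a, b: 1 if a == b else -1  # assumed DNA-style scoring
    print(affine_global_score("AAAA", "AA", simple, g=2, r=1))
```

With g = 2 and r = 1, a single gap of length two costs 3, which is cheaper than two separate gaps at 4. That difference is what the affine scheme is meant to capture.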

Programming Project #1
- Don't worry about affine gaps; they will become part of Programming Project 2
- Make sure you can align both DNA and amino acid sequences
Multiple Sequence Alignment
- Similar genes conserved across organisms
- Same or similar function

Multiple Sequence Alignment
- Simultaneous alignment of similar genes yields:
  - regions subject to mutation
  - regions of conservation
  - mutations or rearrangements causing change in conformation or function
Multiple Sequence Alignment
- New sequence can be aligned with known sequences
- Yields insight into structure and function
- Multiple alignment can detect important features or motifs

Multiple Sequence Alignment
- GOAL: Take 3 or more sequences and align them so that the greatest number of matching characters fall in the same column
- Difficulty: each additional sequence increases the number of possible combinations of matches, mismatches, and gaps
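One simple way to make the goal concrete is to count the columns of a finished alignment in which every sequence carries the same character. This is a minimal sketch: `conserved_columns` is a hypothetical helper, the toy DNA rows are made up for illustration, and fully conserved columns are only one of several possible measures of alignment quality.

```python
def conserved_columns(rows):
    """Count columns where all rows share one character and none is a gap."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned rows must all have the same length")
    return sum(
        1
        for column in zip(*rows)
        if "-" not in column and len(set(column)) == 1
    )


if __name__ == "__main__":
    alignment = ["ACGT", "A-GT", "ACGA"]  # made-up example rows
    print(conserved_columns(alignment))
```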
Example Multiple Alignment
Example alignment of 8 IG sequences.

Approaches to Multiple Alignment
- Dynamic Programming
- Progressive Alignment
- Iterative Alignment
- Statistical Modeling
Dynamic Programming Approach
- Dynamic programming with two sequences
  - Relatively easy to code
  - Guaranteed to obtain optimal alignment
- Can this be extended to multiple sequences?

Dynamic Programming With 3 Sequences
- Consider the amino acid sequences VSNS, SNA, AS
- Put one sequence per axis (x, y, z)
- Three-dimensional structure results
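The three-axis layout can be sketched as a lattice in which each cell (i, j, k) is reached from up to seven predecessors, one for every non-empty choice of which sequences advance. For VSNS, SNA, and AS the lattice has (4+1)(3+1)(2+1) = 60 cells. The sketch below assumes sum-of-pairs column scoring with match = 1, mismatch = -1, and a linear gap cost of -1; the lecture fixes neither this scoring scheme nor the names `sp_column` and `align3_score`.

```python
from itertools import product

GAP = "-"


def sp_column(a, b, c, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one column (assumed scoring; gap-gap pairs score 0)."""
    total = 0
    for p, q in ((a, b), (a, c), (b, c)):
        if p == GAP and q == GAP:
            continue
        if p == GAP or q == GAP:
            total += gap
        elif p == q:
            total += match
        else:
            total += mismatch
    return total


def align3_score(x, y, z):
    """Best sum-of-pairs score for a global alignment of three sequences."""
    # The seven moves: every way to advance at least one of the three axes.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    V = {(0, 0, 0): 0}
    axes = (range(len(x) + 1), range(len(y) + 1), range(len(z) + 1))
    for i, j, k in product(*axes):
        if (i, j, k) == (0, 0, 0):
            continue
        candidates = []
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            # A sequence that does not advance contributes a gap to the column.
            a = x[i - 1] if di else GAP
            b = y[j - 1] if dj else GAP
            c = z[k - 1] if dk else GAP
            candidates.append(V[(pi, pj, pk)] + sp_column(a, b, c))
        V[(i, j, k)] = max(candidates)
    return V[(len(x), len(y), len(z))]


if __name__ == "__main__":
    print(align3_score("VSNS", "SNA", "AS"))
```

For N sequences the number of moves per cell grows as 2^N - 1 and the number of cells as the product of the sequence lengths, which is why this direct extension quickly becomes impractical.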
