Bio3 - Multiple Alignment
15-853: Algorithms in the Real World. Computational Biology III: Multiple Sequence Alignment, Sequencing the Genome

15-853 Page 1
15-853: Algorithms in the Real World
Computational Biology III
– Multiple Sequence Alignment
– Sequencing the Genome

15-853 Page 2: Multiple Alignment
A C T _ G T A
A C A C G T T
A G T G _ T A
C C _ G C T A
Goal: match the “maximum” number of aligned pairs of symbols.
Applications:
– Assembling multiple noisy reads of fragments of sequences
– Finding a canonical sequence among members of a family and studying how the members differ
The problem is NP-hard.

15-853 Page 3: Example Output
Output from typical multiple alignment software: DNAMAN (using ClustalW).

15-853 Page 4: Scoring Multiple Alignments
1. Distance from consensus S_c:
   $D = \sum_{S_i \in S} D(S_c, S_i)$
2. Pairwise distances:
   $D = \sum_{S_i \in S} \sum_{S_j \in S / S_i} D(S_i, S_j)$
3. Evolutionary tree alignment, for a tree over S_1, ..., S_5 that joins S_1 with S_2, S_4 with S_5, S_{12} with S_3, and S_{123} with S_{45}:
   $D = D(S_1, S_2) + D(S_4, S_5) + D(S_{12}, S_3) + D(S_{123}, S_{45})$
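The first two scoring schemes on Page 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not define D, so the sketch assumes D counts the columns in which two aligned rows differ, and it assumes the consensus is the most common symbol in each column. The function names are hypothetical, and the four rows are one assumed 4-by-7 reading of the Page 2 example.

```python
from collections import Counter
from itertools import permutations


def column_distance(x, y):
    """Assumed D: number of columns where two aligned rows differ."""
    if len(x) != len(y):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def consensus(rows):
    """Assumed consensus: most common symbol per column.

    Ties go to the symbol seen first in the column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def consensus_score(rows):
    """Scheme 1: sum of distances from each row to the consensus."""
    s_c = consensus(rows)
    return sum(column_distance(s_c, row) for row in rows)


def pairwise_score(rows):
    """Scheme 2: the double sum over S_i and S_j != S_i.

    Written literally over ordered pairs, so each unordered pair
    is counted twice.
    """
    return sum(column_distance(x, y) for x, y in permutations(rows, 2))


# Assumed reading of the Page 2 alignment as four rows of seven symbols.
rows = ["ACT_GTA", "ACACGTT", "AGTG_TA", "CC_GCTA"]
print(consensus(rows))        # ACTGGTA
print(consensus_score(rows))  # 9
print(pairwise_score(rows))   # 48
```

Halving `pairwise_score` gives the sum over unordered pairs, which ranks alignments identically.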
15-853 Page 5: Approaches
Dynamic programming: optimal, but takes time that is exponential in p, the number of sequences.
Center Star Method: approximation.
Clustering Methods: also called iterative pairwise alignment. Typically an approximation. Many variants, many software packages.

15-853 Page 6: Using Dynamic Programming
For p sequences of length n we can fill in a p-dimensional array in $n^p$ time and space.
For example, for p = 3:
$D_{i,j,k} = \min \{\, D_{i-1,j-1,k-1} + d(a_i, b_j, c_k),\; D_{i-1,j-1,k} + d(a_i, b_j, \_),\; D_{i-1,j,k} + d(a_i, \_, \_),\; \ldots \,\}$
(7 cases: one for each non-empty subset of the three sequences that contributes a symbol to the column)
where
$d(a, b, c) = d(a, b) + d(b, c) + d(a, c)$
assuming the pairwise distance metric.
Takes time exponential in p. Perhaps OK for p = 3.

15-853 Page 7: Example

15-853 Page 8: Optimization
As in the case of pairwise alignment, we can view the array as a graph and find shortest paths. Used in a program called MSA, which can align 6 strings consisting of 200 bp each in a “practical” amount of time.
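The three-sequence recurrence from Page 6 can be written out as a short Python sketch. It is a minimal illustration, not the MSA program: the pairwise column cost `d2` (0 for equal symbols, including two gaps, and 1 otherwise) is an assumption, as are all function names. The seven cases are generated as the non-zero 0/1 step vectors over (i, j, k).

```python
from itertools import product

GAP = "_"


def d2(x, y):
    """Assumed pairwise column cost: 0 if equal (two gaps included), else 1."""
    return 0 if x == y else 1


def d3(a, b, c):
    """Sum-of-pairs column cost: d(a,b,c) = d(a,b) + d(b,c) + d(a,c)."""
    return d2(a, b) + d2(b, c) + d2(a, c)


def align3_cost(A, B, C):
    """Minimum sum-of-pairs cost of aligning three strings.

    Fills a 3-dimensional table, so time and space grow as the
    product of the three lengths.
    """
    n1, n2, n3 = len(A), len(B), len(C)
    inf = float("inf")
    D = [[[inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0
    # The 7 cases: every way to advance at least one of the three indices.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    # A sequence that does not advance contributes a gap.
                    a = A[pi] if di else GAP
                    b = B[pj] if dj else GAP
                    c = C[pk] if dk else GAP
                    cand = D[pi][pj][pk] + d3(a, b, c)
                    if cand < D[i][j][k]:
                        D[i][j][k] = cand
    return D[n1][n2][n3]


print(align3_cost("AC", "AC", "AG"))  # 2
```

Recovering the alignment itself would require storing back-pointers alongside the costs, which is omitted here to keep the sketch short.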
15-853 Page 9: Using Clustering
1. Compute D(S_i, S_j) for all pairs
2. Bottom-up cluster
   I. All sequences start as their own cluster
   II. Repeat
      a) Find the two “closest” clusters and join them into one
      b) Find the best alignment of the two clusters being joined
[Figure: cluster tree over S_1, S_2, S_3, S_4, S_5]

15-853 Page 10: Distances between Clusters
Could use the difference between consensus sequences.
A popular technique is the “Unweighted Pair-Group Method using arithmetic Averages” (UPGMA). It takes the average of all pairwise distances between the two clusters. Implemented in Clustal and Pileup.
[Figure: the aligned sequences actg_a, attg_a, actgga, _accca, aaccga grouped into clusters, with the inter-cluster distance labeled “D?”]

15-853 Page 11: Summary of Matching
Types of matching:
– Global: align two sequences A and B
– Local: align A with any part of B
– Multiple: align k sequences (NP-complete)
Cost models:
– LCS and MED
– Scoring matrices: BLOSUM, PAM
– Gap cost: affine, general
Methods:
– Dynamic programming: many optimizations
– “Fingerprinting”: hashing of small seqs. (approx.)
– Clustering: for multiple alignment (approx.)

15-853 Page 12: Sequencing the Genome
One of the great achievements of the 21st century.
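Returning to the clustering slides (Pages 9 and 10), the bottom-up loop with the UPGMA cluster distance can be sketched as follows. This is a hedged illustration rather than how Clustal or Pileup are implemented: the function names are hypothetical, the distance between two sequences is assumed to be the number of differing columns in already-aligned rows, and step II(b), re-aligning the two joined clusters, is deliberately left out. The split of the five Page 10 sequences into a cluster of three and a cluster of two is also an assumption.

```python
from itertools import combinations


def mismatch(x, y):
    """Assumed sequence distance: differing columns of two aligned rows."""
    return sum(1 for a, b in zip(x, y) if a != b)


def upgma_distance(c1, c2, dist=mismatch):
    """UPGMA: average of all pairwise distances between two clusters."""
    return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


def bottom_up_clusters(seqs, dist=mismatch):
    """Repeatedly join the two closest clusters; return the merge order.

    Each merge is recorded as a pair of tuples (left cluster, right cluster).
    Re-alignment of the joined clusters is not performed in this sketch.
    """
    clusters = [(s,) for s in seqs]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of cluster indices with the smallest UPGMA distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: upgma_distance(clusters[p[0]], clusters[p[1]], dist),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return merges


seqs = ["actg_a", "attg_a", "actgga", "_accca", "aaccga"]
for left, right in bottom_up_clusters(seqs):
    print(left, "+", right)

# Distance between the assumed two clusters from Page 10.
d = upgma_distance(seqs[:3], seqs[3:])
print(round(d, 2))  # 4.33
```

Under these assumptions the first three sequences end up in one cluster and the last two in another before the final join, and the average distance between those two clusters is 26/6.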
15-853 Page 13: Tools of the Trade
Cutting: Arber, Nathans, and Smith, Nobel Prize in Medicine (1978) for “the discovery of restriction enzymes and their application to problems of molecular genetics”.

This note was uploaded on 01/26/2010 for the course COMPUTER S 15-853 taught by Professor Guy Blelloch during the Fall '09 term at Carnegie Mellon.
