423f11-lec7-assembly

423f11-lec7-assembly - Genome Assembly Paradigms CMSC 423...

Genome Assembly Paradigms
CMSC 423
Carl Kingsford

Shortest Common Superstring
Def. Given strings s_1, ..., s_n, find the shortest string T such that each s_i is a substring of T.
NP-hard (contrast with the case where the s_i are required to be subsequences of T).
Approximation algorithms exist with factors 4, 3, 2.89, 2.75, 2.67, 2.596, 2.5, ...
Basic greedy method (4-approximation): find the pair of strings that overlap the best, merge them, and repeat.
Given match, mismatch, and gap costs, how can we compute the score of the best overlap?
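The greedy method above can be sketched in Python as follows. This is a minimal sketch that assumes error-free strings, so "overlap" here means an exact suffix-prefix match rather than the scored overlap asked about in the last line; the names `overlap_len` and `greedy_scs` are illustrative and not from the lecture.

```python
from itertools import permutations


def overlap_len(a: str, b: str) -> int:
    """Length of the longest suffix of a that exactly equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_scs(strings: list[str]) -> str:
    """Greedy superstring: repeatedly merge the pair with the largest exact overlap."""
    # Remove duplicates and strings contained in another string; they add nothing.
    strs = list(dict.fromkeys(strings))
    strs = [s for s in strs if not any(s != t and s in t for t in strs)]
    while len(strs) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Try every ordered pair (i, j): suffix of strs[i] against prefix of strs[j].
        for i, j in permutations(range(len(strs)), 2):
            k = overlap_len(strs[i], strs[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        merged = strs[best_i] + strs[best_j][best_k:]
        rest = [s for idx, s in enumerate(strs) if idx not in (best_i, best_j)]
        # Drop any remaining string that the merge happened to swallow.
        strs = [s for s in rest if s not in merged] + [merged]
    return strs[0] if strs else ""


print(greedy_scs(["ATTAG", "TAGCC", "GCCTA"]))  # ATTAGCCTA
```

Each round rescans all ordered pairs, which is fine for illustration but quadratic per merge; the result is always a superstring, though not necessarily the shortest one.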
Overlap Alignment
[Figure: dynamic-programming matrix for two example strings, x along the columns and y along the rows; one border is labeled with multiples of the gap penalty g (1g, 2g, ...).]
Score of an optimal alignment between a suffix of y and a prefix of x.
Initialize the first column to 0s.
The answer is the maximum score in the top row (traceback starts from there and continues until it falls off the left side).
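A sketch of the overlap-alignment recurrence described above. It assumes y indexes the rows and x the columns, so row m (all of y used) plays the role of the slide's top row; the scores (match +1, mismatch -1, gap -2) are illustrative defaults, not values from the slide.

```python
def overlap_alignment(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Best-scoring alignment of a suffix of y with a prefix of x.

    Returns (score, i_start, j_end): y[i_start:] aligns with x[:j_end].
    """
    m, n = len(y), len(x)
    # D[i][j]: best score aligning some suffix of y[:i] with all of x[:j].
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        D[0][j] = j * gap  # a prefix of x cannot be skipped for free
    # D[i][0] stays 0 for every i: any prefix of y may be skipped for free.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if y[i - 1] == x[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    # The suffix of y must run to the end of y, so the answer lies in row m.
    j_end = max(range(n + 1), key=lambda j: D[m][j])
    score = D[m][j_end]
    # Trace back until we fall off the left side (j == 0).
    i, j = m, j_end
    while j > 0:
        s = match if (i > 0 and y[i - 1] == x[j - 1]) else mismatch
        if i > 0 and D[i][j] == D[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score, i, j_end


print(overlap_alignment("TAGCC", "ATTAG"))  # (3, 2, 3): "TAG" overlaps "TAG"
```

Running this for both orders of a read pair gives the overlap score the greedy method needs, at O(|x|·|y|) time per pair.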

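K-mer Hashing
[Figure: a table indexed by k-mer (AAAA, AAAT, AAAG, AAAC, AATA, AATT, AATG, AATC, AAGA, AAGT, ...), each entry listing the reads that contain that k-mer (e.g., r1, r2, r10, r11; r2, r3).]
Only compute overlap alignments between reads that share a k-mer.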
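The k-mer table above can be sketched as a dictionary from k-mer to the set of read names containing it; any two reads listed under the same k-mer become a candidate pair for overlap alignment. This is an illustrative sketch: the function names and the example reads are hypothetical, and reverse complements (reads from the opposite strand) are ignored.

```python
from collections import defaultdict
from itertools import combinations


def build_kmer_index(reads: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of read names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in reads.items():
        for p in range(len(seq) - k + 1):
            index[seq[p:p + k]].add(name)
    return index


def candidate_pairs(reads: dict[str, str], k: int) -> set[tuple[str, str]]:
    """Unordered pairs of reads sharing at least one k-mer."""
    pairs: set[tuple[str, str]] = set()
    for names in build_kmer_index(reads, k).values():
        for a, b in combinations(sorted(names), 2):
            pairs.add((a, b))
    return pairs


reads = {"r1": "ATTAGCC", "r2": "AGCCTTA", "r3": "GGGGGGG"}
print(candidate_pairs(reads, 4))  # {('r1', 'r2')}
```

The pairs are unordered, so each candidate still needs an overlap alignment in both orientations. A k-mer that occurs in a repeat can appear in many reads and generate many pairs; one might cap how many reads a single k-mer is allowed to pair up, though that threshold is a design choice not covered here.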

The problem with Shortest Common Superstring (SCS): repeats
[Figure: Truth = AAAAAAAAAAAAAAAAAAA, sampled by many reads AAAAA; the SCS of those reads is just AAAAA, so the repeat collapses.]
More complex example with the repeat unit ACCGCCT (ACCGCCT ACCGCCT ACCGCCT): 2 or 3 copies?
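The collapse can be checked directly on tiny inputs. This sketch computes an exact SCS by brute force, trying every order of the (substring-free) reads and merging consecutive reads at their maximal exact overlap. That is exponential and only usable on toy examples. It assumes error-free reads covering every position (`sample_reads` is a hypothetical helper), and the CATCAT... genome is an illustrative example not taken from the slide.

```python
from itertools import permutations


def overlap_len(a: str, b: str) -> int:
    """Length of the longest suffix of a that exactly equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def sample_reads(genome: str, k: int) -> list[str]:
    """Every length-k substring of genome: error-free reads at perfect coverage."""
    return [genome[p:p + k] for p in range(len(genome) - k + 1)]


def brute_force_scs(reads: list[str]) -> str:
    """Exact SCS by trying every read order (exponential; tiny inputs only)."""
    strs = list(dict.fromkeys(reads))
    strs = [s for s in strs if not any(s != t and s in t for t in strs)]
    if not strs:
        return ""
    best = None
    for order in permutations(strs):
        sup = order[0]
        # Append each read minus its overlap with the read before it.
        for prev, cur in zip(order, order[1:]):
            sup += cur[overlap_len(prev, cur):]
        if best is None or len(sup) < len(best):
            best = sup
    return best


truth = "A" * 19
print(len(truth), brute_force_scs(sample_reads(truth, 5)))  # 19 AAAAA
print(brute_force_scs(sample_reads("CATCATCATCAT", 4)))      # CATCAT
```

In both cases the shortest superstring is much shorter than the true sequence. Minimizing length rewards collapsing repeat copies, so SCS cannot tell how many copies there really were.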
Overlap Graph
[Figure: seven reads (1-7) and the corresponding overlap graph on nodes 1-7.]
Overlap graph: nodes = reads, edges = overlaps.
Given the overlap graph, how can we find a good candidate assembly?
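A minimal sketch of building an overlap graph. Nodes are read indices (0-based here, unlike the slide's 1-7), and a directed edge u -> v carries the length of an exact suffix-prefix overlap of at least `min_len` (an assumed threshold). `spell_path` turns any chosen path into a candidate sequence. It does not decide which path to pick; that is the open question on the slide.

```python
def overlap_len(a: str, b: str, min_len: int = 1) -> int:
    """Longest suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads: list[str], min_len: int) -> dict[int, dict[int, int]]:
    """Adjacency map: graph[u][v] = overlap length of read u's suffix with read v's prefix."""
    graph: dict[int, dict[int, int]] = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_len)
                if k:
                    graph[i][j] = k
    return graph


def spell_path(reads: list[str], graph: dict[int, dict[int, int]], path: list[int]) -> str:
    """Concatenate the reads along a path, skipping each edge's overlap."""
    seq = reads[path[0]]
    for u, v in zip(path, path[1:]):
        seq += reads[v][graph[u][v]:]
    return seq


reads = ["ATTAGC", "TAGCCT", "CCTAGG"]
graph = build_overlap_graph(reads, min_len=3)
print(graph)                               # {0: {1: 4}, 1: {2: 3}, 2: {}}
print(spell_path(reads, graph, [0, 1, 2]))  # ATTAGCCTAGG
```

Building the graph compares all ordered pairs of reads. In practice the k-mer index from the earlier slide would restrict which pairs get compared.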
Background image of page 8
Overlap Graph 1 2 3 4 5 6 7 Overlap graph: Nodes = reads Edges = overlaps 1 2 3 4 5 6 7 1 2 3 4 5 6 7 7 Given overlap graph, how can we fnd a good candidate assembly?
Background image of page 9

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Image of page 10
This is the end of the preview. Sign up to access the rest of the document.

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
