assembly - Sequence Assembly BMI/CS 576...

Sequence Assembly
BMI/CS 576
www.biostat.wisc.edu/bmi576/
Mark Craven
craven@biostat.wisc.edu
Fall 2011

The sequencing problem

We want to determine the identity of the base pairs that make up:
– a single large molecule of DNA
– the genome of a single cell/individual organism
– the genome of a species

But we can’t (currently) “read” off the sequence of an entire molecule all at once
The strategy: substrings

We do have the ability to read or detect short pieces (substrings) of DNA
– Sanger sequencing: 500-800 bp/read
– Latest technologies:
  • 454 Genome Sequencer FLX: 250-600 bp/read
  • Illumina Genome Analyzer: 35-150 bp/read

Shotgun sequencing

– Multiple copies of sample DNA
– Randomly fragment DNA
– Sequence sample of fragments
– Assemble reads (fragment assembly)
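
The shotgun steps above can be imitated on a toy string. The following is a minimal sketch, not a model of any real sequencing instrument: the function name `shotgun_reads`, the fixed read length, the uniform choice of start positions, and the error-free reads are all assumptions made for illustration.

```python
import random


def shotgun_reads(genome, num_reads, read_len, seed=0):
    """Sample fixed-length, error-free reads uniformly at random from a genome string.

    Hypothetical helper: real reads vary in length and contain errors.
    """
    last_start = len(genome) - read_len
    if last_start < 0:
        raise ValueError("read_len exceeds genome length")
    rng = random.Random(seed)  # seeded so the sample is reproducible
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, last_start)  # randint is inclusive on both ends
        reads.append(genome[start:start + read_len])
    return reads


genome = "TCGACGCGTA"  # toy "genome" for illustration only
reads = shotgun_reads(genome, num_reads=20, read_len=3)
print(reads)
```

Every sampled read is a substring of the original string, which is exactly the property the assembly step relies on.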
The fragment assembly problem

Given: A set of reads (strings) {s_1, s_2, …, s_n}
Do: Determine a large string s that “best explains” the reads

What do we mean by “best explains”?
What assumptions might we require?
Shortest superstring problem

Objective: Find a string s such that
– all reads s_1, s_2, …, s_n are substrings of s
– s is as short as possible

Assumptions:
– Reads are 100% accurate
– Identical reads must come from the same location on the genome
– “best” = “simplest”

Shortest superstring example

Given the reads: {ACG, CGA, CGC, CGT, GAC, GCG, GTA, TCG}
What is the shortest superstring you can come up with?

TCGACGCGTA (length 10)
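
The example answer can be checked mechanically. The sketch below uses two hypothetical helpers, `is_superstring` and `length_lower_bound`, that are not part of the slides. The lower bound assumes all reads are distinct and share one length k: a string of length L has only L - k + 1 windows of length k, so n distinct reads need L ≥ n + k - 1.

```python
def is_superstring(candidate, reads):
    """Return True if every read occurs as a substring of candidate."""
    return all(read in candidate for read in reads)


def length_lower_bound(reads):
    """Lower bound on superstring length, assuming distinct reads of equal length k."""
    distinct = set(reads)
    lengths = {len(r) for r in distinct}
    if len(lengths) != 1:
        raise ValueError("bound assumes all reads have the same length")
    k = lengths.pop()
    return len(distinct) + k - 1


reads = ["ACG", "CGA", "CGC", "CGT", "GAC", "GCG", "GTA", "TCG"]
candidate = "TCGACGCGTA"

print(is_superstring(candidate, reads))  # True
print(len(candidate))                    # 10
print(length_lower_bound(reads))         # 10
```

Since the candidate contains all eight reads and its length meets the lower bound, no shorter superstring exists for this read set.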
Algorithms for shortest superstring problem

This problem turns out to be NP-complete

Simple greedy strategy:

    while # strings > 1 do
        merge two strings with maximum overlap
    loop

Conjectured to give string with length ≤ 2 × minimum length

Other approaches are based on graph theory…

Graph basics

– a graph (G) consists of vertices (V) and edges (E): G = (V, E)
– edges can either be directed (directed graphs) or undirected (undirected graphs)

[Figure: example graphs on vertices 1, 2, 3, 4]
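
The greedy strategy above can be sketched in a few lines. This is one possible reading of the pseudocode, not the course's reference implementation: the names `overlap` and `greedy_superstring` are hypothetical, ties between equally good overlaps are broken by scan order (an arbitrary choice), and reads contained in other reads are dropped first as a simplifying assumption.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Repeatedly merge the pair of strings with maximum overlap."""
    # Remove duplicates, then reads that are substrings of another read.
    strings = list(dict.fromkeys(reads))
    strings = [s for s in strings
               if not any(s != t and s in t for t in strings)]
    while len(strings) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:  # first maximum wins; tie-breaking is arbitrary
                        best_k, best_i, best_j = k, i, j
        # Merge: keep strings[best_i], append the non-overlapping tail of strings[best_j].
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (best_i, best_j)]
        strings.append(merged)
    return strings[0] if strings else ""


reads = ["ACG", "CGA", "CGC", "CGT", "GAC", "GCG", "GTA", "TCG"]
result = greedy_superstring(reads)
print(result, len(result))
```

The result always contains every read, but its length depends on how ties are broken, so it may be longer than the optimal 10 found for this read set.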

This note was uploaded on 12/15/2011 for the course BMI 576 taught by Professor Staff during the Fall '11 term at Wisc Green Bay.
