Ch4Pei

Fragment assembly of DNA

A typical approach to sequencing long DNA molecules is to sample and then sequence fragments from them.
Fragment assembly of DNA

Outline: Biological background, Models, Algorithms, Heuristics

® Pei-Jie Wu
Biological background -- Problem as puzzle

We do not know which letter from the set {A, C, G, T} is written on each card, but we do know that cards in the same position on opposite strands form a complementary pair. Our goal is to obtain the letters using certain hints, which are (approximate) substrings of the rows.
Biological background

Target: the long sequence to reconstruct.
Fragment vs. subsequence: a fragment is a contiguous piece (substring) of the target, whereas a subsequence need not be contiguous.
Shotgun method: based on fragment overlap.
Fragment assembly: a collection of fragments to put together.
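Since the shotgun method is based on fragment overlap, a small sketch may help make "overlap" concrete. The function below is only an illustration for the error-free case: it looks for the longest suffix of one fragment that exactly equals a prefix of another. The name `overlap_length`, the `min_length` parameter, and the example fragments are assumptions made for this sketch and do not come from the slides.

```python
def overlap_length(a, b, min_length=1):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_length` characters exists.
    Exact matching only: sequencing errors are deliberately ignored here.
    """
    # Try the longest candidate first so the first hit is the maximum.
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


# Hypothetical fragments, chosen only for illustration.
print(overlap_length("ACCGT", "CGTGC"))  # 3, the shared piece is "CGT"
print(overlap_length("CGTGC", "TTAC"))   # 0, no suffix/prefix match
```

Overlap is directional: the overlap of `a` followed by `b` generally differs from that of `b` followed by `a`, so an assembler would typically compute both.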
Biological background -- The ideal case

Case: p. 106
Align the input set, ignoring spaces at the extremities.
Overlaps: the end part of one fragment is similar to the beginning of another.
Consensus sequence: based on majority vote.
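The majority-vote step can be sketched as follows, assuming the fragments have already been laid out in aligned rows. The layout, the `-` padding character, and the function name `consensus` are assumptions for illustration; the sketch also simplifies by treating every `-` as padding at the extremities, so it does not handle gaps inside a fragment.

```python
from collections import Counter


def consensus(layout, pad="-"):
    """Majority-vote consensus over the columns of an aligned layout.

    `layout` is a list of equal-purpose rows in which `pad` marks positions
    a fragment does not cover. Columns covered by no fragment are skipped.
    Ties are broken by whichever base appears first in the column.
    """
    width = max(len(row) for row in layout)
    result = []
    for col in range(width):
        column = [row[col] for row in layout
                  if col < len(row) and row[col] != pad]
        if column:
            result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


# Hypothetical error-free layout of four fragments.
layout = [
    "--ACCGT--",
    "----CGTGC",
    "TTAC-----",
    "-TACCGT--",
]
print(consensus(layout))  # TTACCGTGC
```

In this ideal case every column is unanimous, so the vote is trivial; the vote only starts to matter once fragments disagree.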
Biological background -- Complications

The main factors that add to the complexity of the problem are errors, unknown orientation, repeated regions, and lack of coverage.
Biological background -- Complications: Errors

Dealing with errors usually means that programs need algorithms that require more time and space. The simplest errors are called base call errors and comprise base substitutions, insertions, and deletions in the fragments. Base call errors occur in practice at rates varying from 1 to 5 errors every 100 characters. See Figures 4.2, 4.3, and 4.4.
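One way to see why errors cost more time and space is to compare exact substring search with approximate matching. The sketch below uses a standard dynamic-programming variant of edit distance in which the match may start and end anywhere in the target; it is a common textbook technique, not necessarily the one the slides go on to present. The function name and the example strings are assumptions for illustration.

```python
def approx_substring_distance(fragment, target):
    """Smallest edit distance between `fragment` and any substring of `target`.

    Substitutions, insertions, and deletions each cost 1. Runs in
    O(len(fragment) * len(target)) time, versus roughly linear time for
    exact matching.
    """
    # Row 0 is all zeros: the match may begin at any position of the target.
    prev = [0] * (len(target) + 1)
    for i, f in enumerate(fragment, 1):
        cur = [i]  # matching i fragment characters against nothing costs i
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j - 1] + (f != t),  # match or substitution
                           prev[j] + 1,             # extra base in fragment
                           cur[j - 1] + 1))         # base missing from fragment
        prev = cur
    # The match may also end anywhere, so take the best cell of the last row.
    return min(prev)


target = "TTACCGTGC"  # hypothetical target
print(approx_substring_distance("ACCGT", target))   # 0, exact substring
print(approx_substring_distance("ACGGT", target))   # 1, one substitution
print(approx_substring_distance("ACCAGT", target))  # 1, one insertion
print(approx_substring_distance("ACGT", target))    # 1, one deletion
```

With error rates of 1 to 5 per 100 characters, a fragment of length n might be accepted when this distance is at most about 0.05 n; that threshold is an assumption here, not a figure from the slides.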
Biological background -- Complications: Errors

Two other types of errors are chimeras and contamination.

Chimeras arise when two regular fragments from distinct parts of the target molecule join end-to-end to form a fragment that is not a contiguous part of the target (Figure 4.5). Solution: chimeras must be recognized as such and removed from the fragment set in a preprocessing stage.

Contamination comes from host or vector DNA. Solution: most vectors are well known, so we can screen the data before starting assembly.
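Screening against a known vector could look roughly like the sketch below, which flags any fragment that shares an exact k-mer (a substring of length k) with the vector sequence. This is a deliberately naive illustration: the function names, the value of `k`, and all sequences are made up, and a realistic screen would also tolerate errors and check the reverse complement.

```python
def kmers(seq, k):
    """Set of all substrings of length `k` in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_contaminants(fragments, vector, k=8):
    """Split fragments into (clean, flagged) by shared k-mers with `vector`.

    A fragment is flagged if it shares at least one exact k-mer with the
    vector. Fragments shorter than `k` have no k-mers and are never flagged.
    """
    vector_kmers = kmers(vector, k)
    clean, flagged = [], []
    for frag in fragments:
        if kmers(frag, k) & vector_kmers:
            flagged.append(frag)
        else:
            clean.append(frag)
    return clean, flagged


# Hypothetical vector sequence and fragments.
vector = "GATTACAGATTACA"
fragments = ["ACGTACGTAC", "TTGATTACAGG"]
clean, flagged = screen_contaminants(fragments, vector, k=6)
print(clean)    # ['ACGTACGTAC']
print(flagged)  # ['TTGATTACAGG']
```

A smaller `k` catches more contamination but also flags more fragments by chance, so the choice is a trade-off.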
Biological background -- Complications: Unknown orientation

We generally do not know to which strand a particular fragment belongs. The input fragments are therefore treated as approximate substrings of the sought consensus, either as given or in reverse complement.
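The reverse complement mentioned above can be computed by complementing each base (A with T, C with G) and reversing the result. The sketch below shows this, together with a small helper that tests a fragment in both orientations; the helper uses exact matching for brevity, and both function names and the example sequences are assumptions for illustration.

```python
# Translation table for the Watson-Crick base pairs A-T and C-G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over {A, C, G, T}."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation_in(fragment, target):
    """Report how `fragment` occurs in `target`, using exact matching.

    Returns "forward", "reverse", or None. A real assembler would use
    approximate matching instead of the `in` operator.
    """
    if fragment in target:
        return "forward"
    if reverse_complement(fragment) in target:
        return "reverse"
    return None


target = "TTACCGTGC"  # hypothetical consensus
print(reverse_complement("ACCGT"))     # ACGGT
print(orientation_in("ACCGT", target))  # forward
print(orientation_in("ACGGT", target))  # reverse
```

Because orientation is unknown, each fragment effectively contributes two candidate strings, which doubles the number of pairings an assembler has to consider.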

This note was uploaded on 02/10/2012 for the course CSE 5615 taught by Professor Mitra during the Fall '11 term at FIT.
