Lecture_12_genome_sequencing



•  ...lies genes duplicate & then diverge
•  Segmental duplications ~ very long, very similar copies

Fragment Assembly
•  Computational Challenge: assemble individual short fragments (reads) into a single genomic sequence ("superstring")
•  Until the late 1990s, the shotgun fragment assembly of the human genome was viewed as an intractable problem

Shortest Superstring Problem
•  Problem: Given a set of strings, find a shortest string that contains all of them
•  Input: Strings s1, s2, ..., sn
•  Output: A string s that contains all strings s1, s2, ..., sn as substrings, such that the length of s is minimized
•  Complexity: NP-complete
•  Note: this formulation does not take into account sequencing errors

Algorithms for Fragment Assembly

Whole Genome Shotgun Sequencing
Genome -> Genome amplified and sliced into smaller fragments (>=600bp) -> Build consensus sequence from overlap

Overlap-Layout-Consensus
Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA
•  Overlap: find potentially overlapping reads
•  Layout: merge reads into contigs and contigs into supercontigs
•  Consensus: derive the DNA sequence and correct read errors
..ACGATTACAATAGGTT..

Overlap
•  The sequence of each read is compared to that of every other read, in both the forward and reverse complement orientations.
•  As such, the overlap computation step is very time intensive, especially if the set of reads is very large.
•  For example, the whole genome shotgun assembly of Drosophila had about 3 x 10^6 reads of 500 bases, requiring roughly 10^13 comparisons (Deonier et al., 2010).
•  Even on today's...
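
The overlap step and the shortest superstring formulation above can be illustrated with a small sketch. This is not the method used by any of the assemblers listed; it is a greedy heuristic (repeatedly merge the pair of reads with the largest suffix-prefix overlap) that returns a superstring which is not guaranteed to be the shortest. The function names `overlap` and `greedy_superstring` are hypothetical, and the sketch assumes error-free reads in the forward orientation only (no reverse complements). The all-pairs loop is also where the cost comes from: comparing every read against every other read for 3 x 10^6 reads gives (3 x 10^6)^2 = 9 x 10^12 pairs, i.e. roughly 10^13.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy heuristic for the shortest superstring problem.

    Assumptions (not from the lecture): reads are error-free and all in
    the forward orientation. The result contains every read as a
    substring, but it is not guaranteed to be of minimum length.
    """
    # Drop exact duplicates and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        # Overlap step: compare every ordered pair of reads.
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(reads)), 2):
            k = overlap(reads[i], reads[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j

        # Layout step: merge the best-overlapping pair into one string.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)

    return reads[0] if reads else ""
```

For instance, `greedy_superstring(["ACGATTAC", "TTACAATA", "CAATAGGTT"])` reassembles the three made-up reads into `ACGATTACAATAGGTT`, the example sequence shown on the Overlap-Layout-Consensus slide.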

This note was uploaded on 02/10/2014 for the course CS 425 taught by Professor Asa Ben-Hur during the Fall '13 term at Colorado State.
