Lecture_12_genome_sequencing



... computers, running that many comparisons is impractical, so seeded algorithms are used.

Overlapping Reads
•  Sort all k-mers in the reads.
•  Find pairs of reads sharing a k-mer.
•  Extend to a full alignment; throw away if not >95% similar.

TACA TAGATTACACAGATTAC T GA
||   ||||||||||||||||| | ||
TAGT TAGATTACACAGATTAC TAGA

Finding Overlapping Reads
Create local multiple alignments from the overlapping reads.

TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA

Finding Overlapping Reads
Correct errors using multiple alignment.
•  Find locations where there is a deviation in which 1% of the data diverge from the rest.
•  Make those positions agree with the rest.

TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA

Build the Overlap Graph
•  Overlap graph: the nodes represent actual reads, and the edges represent overlaps between these reads.
•  Thus, genome assembly becomes equivalent to finding a path through the graph that visits each node exactly once (i.e., a Hamiltonian path).

An overlap graph. Nodes are complete reads and edges connect reads that overlap. Note that in an actual graph, reads and overlaps would be much larger.

Layout
•  Finding a Hamiltonian path through the overlap graph is not a trivial task.
•  To decrease the size of the graph, the OLC assembly graph is simplified in the layout stage, where segments of the graph are compressed into contigs.
•  Thus, we have to find a way to reduce the complexity of the graph.

Graph Reduction
•  A contig is a subgraph, or group of nodes, with many connections among its members, since they all overlap with each other and refer to the same sequence (A and B).
•  Once a subgraph has been identified, its nodes and edges are compressed into one node, or contig, thereby simplifying the graph (C).

Separating Contigs
•  Th...
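The overlap and graph-building steps described in the slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function names (`kmer_index`, `candidate_pairs`, `best_overlap`, `overlap_graph`, `hamiltonian_path`, `merge_path`) and the parameters `k`, `min_len`, and `min_identity` are hypothetical, overlaps are checked without gaps rather than by a full alignment, and the Hamiltonian path is found by brute force, which is only feasible for a handful of reads.

```python
from collections import defaultdict
from itertools import combinations, permutations


def kmer_index(reads, k):
    """Map each k-mer to the set of read indices that contain it."""
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].add(i)
    return index


def candidate_pairs(reads, k):
    """Seeding step: pairs of read indices that share at least one k-mer."""
    pairs = set()
    for ids in kmer_index(reads, k).values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def best_overlap(a, b, min_len, min_identity):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Assumption: ungapped comparison (mismatches allowed, no indels).
    An overlap is accepted only if its identity is strictly greater
    than `min_identity`. Returns 0 if no overlap qualifies.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        matches = sum(x == y for x, y in zip(a[-length:], b[:length]))
        if matches / length > min_identity:
            return length
    return 0


def overlap_graph(reads, k=4, min_len=5, min_identity=0.95):
    """Directed overlap graph: graph[i][j] is the overlap length from read i to read j."""
    graph = defaultdict(dict)
    for i, j in candidate_pairs(reads, k):
        for src, dst in ((i, j), (j, i)):
            length = best_overlap(reads[src], reads[dst], min_len, min_identity)
            if length:
                graph[src][dst] = length
    return {src: dict(dsts) for src, dsts in graph.items()}


def hamiltonian_path(graph, n):
    """Brute-force search for an order of all n reads that follows graph edges."""
    for order in permutations(range(n)):
        if all(b in graph.get(a, {}) for a, b in zip(order, order[1:])):
            return list(order)
    return None


def merge_path(reads, graph, path):
    """Spell out the sequence along a path by appending each read's non-overlapping tail."""
    contig = reads[path[0]]
    for a, b in zip(path, path[1:]):
        contig += reads[b][graph[a][b]:]
    return contig


# Three toy reads taken from the slide's example sequence.
reads = ["TAGATTACAC", "TTACACAGAT", "CAGATTACTG"]
graph = overlap_graph(reads)
path = hamiltonian_path(graph, len(reads))
print(graph, path, merge_path(reads, graph, path))
```

Only read pairs that share a k-mer are ever compared, which is the point of seeding: it avoids running every read against every other read. The brute-force path search grows factorially with the number of reads, which is why the layout stage simplifies the graph instead of solving the Hamiltonian path problem directly.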

This note was uploaded on 02/10/2014 for the course CS 425 taught by Professor Asa Ben-Hur during the Fall '13 term at Colorado State.
