Lect11 Protein sequencing and Mass Spectrometry

An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

Fa 06 CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry
Whole genome shotgun
  Input: shotgun sequence fragments (reads) and mate pairs
  Output: a single sequence created by consensus of overlapping reads
  First-generation assemblers (Phrap, CAP, ...) did not use mate pairs.
  Second generation: CA, Arachne, Euler
  We will discuss Arachne, a freely available second-generation sequence assembler.
Arachne (also Celera Assembler)
  Overlap
  Problem 1: Large all-against-all computation
  Solution 1: Fast overlap computation using k-mer hashing
  Layout
  Problem 2: Small contigs with 10X coverage
  Solution 2: Use mate pairs to build super-contigs
  Problem 3: Repetitive structure of the genome
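The k-mer hashing idea can be illustrated with a minimal sketch. It is not Arachne's actual implementation: the function name `candidate_overlaps` and the parameters `k` and `min_shared` are hypothetical, and reverse complements are ignored for brevity. Reads are indexed by their k-mers, and only pairs that share enough k-mers are passed on to a full alignment, which avoids comparing every read against every other read.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(reads, k=4, min_shared=2):
    """Return {(i, j): count} for read pairs sharing >= min_shared distinct k-mers."""
    index = defaultdict(set)  # k-mer -> ids of reads that contain it
    for rid, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add(rid)

    shared = defaultdict(int)  # (i, j) -> number of distinct shared k-mers
    for rids in index.values():
        for i, j in combinations(sorted(rids), 2):
            shared[(i, j)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = ["ACGTACGGTC", "ACGGTCTTAG", "GGGGGCCCCC"]
print(candidate_overlaps(reads))
```

Here reads 0 and 1 share the k-mers ACGG, CGGT and GGTC, so only that pair is reported as a candidate; read 2 is never aligned against the others.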
Problem 3: Repeats
Repeats & Chimerisms
  40-50% of the human genome is made up of repetitive elements.
  Repeats can cause great problems in the assembly!
  Chimerism causes a clone to be from two different parts of the genome.
  This can again give a completely wrong assembly.
Repeats
  How can you detect if your fragment overlap is due to a repeat?
Repeat detection
  Lander-Waterman strikes again!
  The expected number of clones in a repeat-containing island is MUCH larger than in a non-repeat-containing island (contig).
  Thus, every contig can be marked as unique or non-unique.
  In the first step, throw away the non-unique islands.
  [Figure label: Repeat]
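To get a feel for how large the difference is, the sketch below uses the Lander-Waterman result that the expected number of clones in an apparent island is e^(c·σ), where c = N·L/G is the coverage and σ = 1 − T/L for a minimum detectable overlap T. The `copies` parameter is an assumption added for illustration: it models a collapsed repeat with that many genomic copies as simply seeing that multiple of the coverage.

```python
import math


def expected_clones_per_island(num_clones, clone_length, genome_length,
                               min_overlap, copies=1):
    """Lander-Waterman expected clones per island, exp(c * sigma).

    copies > 1 is an illustrative assumption: a collapsed repeat is treated
    as a region sampled at copies-fold coverage.
    """
    coverage = copies * num_clones * clone_length / genome_length
    sigma = 1 - min_overlap / clone_length
    return math.exp(coverage * sigma)


unique = expected_clones_per_island(10_000, 500, 1_000_000, 50)
repeat = expected_clones_per_island(10_000, 500, 1_000_000, 50, copies=2)
print(unique, repeat)
```

Because the dependence on coverage is exponential, even a two-copy repeat is expected to yield islands with far more clones than a unique region at the same sequencing depth.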
Detecting Repeat Contigs 1: Read Density
  Compute the log-odds ratio of two hypotheses:
  H1: The contig is from a unique region of the genome.
  H2: The contig is from a region that is repeated at least twice.
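One way to make the log-odds ratio concrete is to assume that read starts are Poisson distributed: a unique contig of length L expects λ = N·L/G reads, and a two-copy repeat expects 2λ. Under that assumption the ratio simplifies to ln[P(k | λ) / P(k | 2λ)] = λ − k·ln 2. The function name and the exactly-two-copies alternative are choices made for this sketch, not something the slide specifies.

```python
import math


def unique_log_odds(reads_in_contig, contig_length, total_reads, genome_length):
    """Log-odds of H1 (unique) versus H2 (two-copy repeat) under a Poisson model.

    Positive values favour a unique contig, negative values favour a repeat.
    """
    lam = total_reads * contig_length / genome_length  # expected reads if unique
    return lam - reads_in_contig * math.log(2)


# A 5 kb contig, 10,000 reads over a 1 Mb genome: 50 reads expected if unique.
print(unique_log_odds(50, 5_000, 10_000, 1_000_000))   # positive: looks unique
print(unique_log_odds(100, 5_000, 10_000, 1_000_000))  # negative: looks repeated
```

A contig would then be marked unique only when the score exceeds some threshold; the choice of threshold is left open here.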
Detecting Chimeric reads
  Chimeric reads: reads that contain sequence from two genomic locations.
  Good overlaps: G(a,b) if a and b overlap with a high score.
  Transitive overlap: T(a,c) if G(a,b) and G(b,c).
  Find a point x across which only transitive overlaps occur; x is a point of chimerism.
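The sketch below shows one possible reading of this test, under the assumption that "only transitive overlaps occur across x" means no single good overlap with another read directly spans x. Each good overlap is represented as a half-open interval on the candidate read; the function name, the interval representation and the `margin` parameter are all hypothetical.

```python
def chimeric_points(read_length, overlaps, margin=5):
    """Return positions x on a read that no good overlap spans.

    overlaps: list of (start, end) intervals on the read, one per good
    overlap with another read. An overlap spans x only if it extends at
    least `margin` bases on both sides of x.
    """
    points = []
    for x in range(margin, read_length - margin + 1):
        spanned = any(s <= x - margin and e >= x + margin for s, e in overlaps)
        if not spanned:
            points.append(x)
    return points


# Two groups of overlaps that meet at position 50 but never cross it.
print(chimeric_points(100, [(0, 50), (50, 100)]))
# Adding one read that bridges the junction removes the suspicion.
print(chimeric_points(100, [(0, 50), (50, 100), (30, 70)]))
```

In the first call the positions around 50 are flagged because the left and right neighbours are connected only through the read itself; in the second call the bridging overlap makes the list empty.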
Contig assembly
  Reads are merged into contigs up to repeat boundaries.
  If (a,b) and (a,c) overlap, then (b,c) should overlap as well.
  Also, shift(a,c) = shift(a,b) + shift(b,c).
  Most of the contigs are unique pieces of the genome and end at some repeat boundary.
  Some contigs might be entirely within repeats. These must be detected.
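The shift condition is easy to check mechanically. In this minimal sketch, `shift[(x, y)]` is assumed to hold the offset of read y's start relative to read x's start, and `tolerance` is a hypothetical allowance for small alignment errors; neither name comes from the slides.

```python
def consistent_triple(shift, a, b, c, tolerance=0):
    """Check that shift(a,c) == shift(a,b) + shift(b,c) within a tolerance."""
    expected = shift[(a, b)] + shift[(b, c)]
    return abs(shift[(a, c)] - expected) <= tolerance


shift = {("a", "b"): 100, ("b", "c"): 150, ("a", "c"): 250}
print(consistent_triple(shift, "a", "b", "c"))

# An inconsistent triple suggests that one of the overlaps is repeat-induced.
shift[("a", "c")] = 400
print(consistent_triple(shift, "a", "b", "c"))
```

A triple that fails the check is a hint that merging should stop there, which matches the idea that contigs end at repeat boundaries.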
Creating Super Contigs
Supercontig assembly
  Supercontigs are built incrementally.
  Initially, each contig is a supercontig.
  In each round, a pair of supercontigs is merged, until no more merges can be performed.
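The incremental merging can be sketched as a greedy loop over a union-find structure. This is a simplification made for illustration: the merge criterion used here (at least `min_links` mate pairs connecting two supercontigs, best-supported pair first) is an assumption, and a real assembler would also check mate-pair distance and orientation. All names are hypothetical.

```python
from collections import Counter


def build_supercontigs(contigs, mate_links, min_links=2):
    """Greedily merge supercontigs joined by >= min_links mate pairs."""
    parent = {c: c for c in contigs}  # initially each contig is a supercontig

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path compression
            c = parent[c]
        return c

    merged = True
    while merged:  # one merge per round, until none is possible
        merged = False
        support = Counter()
        for c1, c2 in mate_links:
            r1, r2 = find(c1), find(c2)
            if r1 != r2:
                support[tuple(sorted((r1, r2)))] += 1
        if support:
            (r1, r2), n = support.most_common(1)[0]
            if n >= min_links:
                parent[r2] = r1
                merged = True

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


links = [("c1", "c2"), ("c1", "c2"), ("c2", "c3"), ("c3", "c2"), ("c3", "c4")]
print(build_supercontigs(["c1", "c2", "c3", "c4"], links))
```

In this example c1, c2 and c3 end up in one supercontig, while c4 stays on its own because a single mate pair is not considered enough evidence.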