Lect11 Protein sequencing and Mass Spectrometry

# An Introduction to Bioinformatics Algorithms (Computational Molecular Biology)

Fa 06 CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry

Whole genome shotgun. Input: shotgun sequence fragments (reads) and mate pairs. Output: a single sequence created by the consensus of overlapping reads. The first generation of assemblers (Phrap, CAP, ...) did not use mate pairs; the second generation includes CA, Arachne, and Euler. We will discuss Arachne, a freely available second-generation sequence assembler.

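
To make "consensus of overlapping reads" concrete, here is a minimal Python sketch. It assumes the layout is already known (each read's start offset in the final sequence) and simply takes a per-column majority vote. The function name `consensus` and the use of `N` for uncovered columns are illustrative choices, not part of Arachne.

```python
from collections import Counter


def consensus(layout):
    """Majority-vote consensus of reads placed at known offsets.

    layout: list of (offset, read) pairs, where offset is the read's
    start position in the assembled sequence (assumed to come from
    the overlap and layout steps).
    """
    length = max(offset + len(read) for offset, read in layout)
    columns = [Counter() for _ in range(length)]
    for offset, read in layout:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # Most frequent base per column; 'N' where no read covers the column.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# The third read carries a sequencing error (T instead of A at position 4);
# the majority vote corrects it.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (3, "TTCGGA"), (5, "CGGATT")]
print(consensus(reads))  # ACGTACGGATT
```

Real consensus callers also weight bases by quality and handle indels; a plain vote is only the simplest instance of the idea.
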
Arachne (also the Celera Assembler). In the overlap phase, Problem 1 is the large all-against-all comparison of reads; Solution 1 is fast overlap computation using k-mer hashing. In the layout phase, Problem 2 is that the contigs are small even at 10X coverage; Solution 2 is to use mate pairs to build supercontigs. Problem 3 is the repetitive structure of the genome.
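
A minimal sketch of k-mer hashing for the overlap phase: every k-mer of every read goes into a hash table, and only read pairs that share a k-mer become candidates for a full alignment, so most of the all-against-all pairs are never aligned. The function name and the optional `max_occ` cutoff (skip k-mers seen in too many reads, a common way to keep repeats from flooding the table) are assumptions of this sketch, not Arachne's actual parameters.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(reads, k, max_occ=None):
    """Return index pairs (i, j), i < j, of reads sharing at least one k-mer.

    max_occ (hypothetical): if set, k-mers found in more than max_occ
    reads are skipped, since such k-mers usually come from repeats.
    """
    index = defaultdict(set)  # k-mer -> ids of reads containing it
    for rid, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(rid)
    pairs = set()
    for ids in index.values():
        if max_occ is not None and len(ids) > max_occ:
            continue
        # Only reads in the same bucket are compared, not all n^2 pairs.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC"]
print(candidate_overlaps(reads, 4))  # {(0, 1)}
```

Each candidate pair would then go to a banded alignment (not shown) to confirm the overlap and compute its offset.
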

Problem 3: Repeats

Repeats and chimerism. Between 40% and 50% of the human genome is made up of repetitive elements, and repeats can cause serious problems in assembly. Chimerism causes a single clone to contain DNA from two different parts of the genome; this, too, can give a completely wrong assembly.

Repeats. How can you detect whether a fragment overlap is due to a repeat?

Repeat detection. Lander-Waterman strikes again! Reads from every copy of a repeat pile up in the same island, so the expected number of clones in a repeat-containing island is much larger than in an island (contig) without repeats. Thus every contig can be marked as unique or non-unique, and in the first step the non-unique islands are thrown away.
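
A rough sketch of the Lander-Waterman reasoning, assuming read start positions are uniform along a genome of length G sampled by N reads: a unique island of length L should hold about N * L / G reads, and an island into which c repeat copies have collapsed about c times that. The cutoff `ratio` used to call a contig non-unique is a hypothetical parameter chosen for illustration.

```python
def expected_reads(contig_len, num_reads, genome_len, copies=1):
    """Expected reads in an island, assuming uniform read start positions.

    A unique region of length L gets about N * L / G reads; a repeat whose
    `copies` copies collapse into one island gets `copies` times as many.
    """
    return copies * num_reads * contig_len / genome_len


def mark_contigs(contigs, num_reads, genome_len, ratio=1.5):
    """Label (length, observed_reads) contigs as 'unique' or 'non-unique'.

    ratio (hypothetical cutoff): a contig holding more than ratio times
    the reads expected for a unique region is called non-unique.
    """
    labels = []
    for length, observed in contigs:
        expected = expected_reads(length, num_reads, genome_len)
        labels.append("non-unique" if observed > ratio * expected else "unique")
    return labels


# 20,000 reads over a 1 Mb genome: a unique 10 kb contig expects 200 reads.
print(mark_contigs([(10_000, 210), (10_000, 410)], 20_000, 1_000_000))
# ['unique', 'non-unique']
```

Discarding the contigs labelled `non-unique` corresponds to the first step described above.
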

Detecting repeat contigs 1: read density. Compute the log-odds ratio of two hypotheses. H1: the contig is from a unique region of the genome. H2: the contig is from a region that is repeated at least twice.

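
Under two extra assumptions, that read starts are Poisson distributed and that H2 means exactly two collapsed copies (the alternative hardest to tell apart from H1), the log-odds has a closed form: the k! terms cancel, leaving n * rho / G - k * ln 2, where rho is the contig length, k its read count, n the total number of reads and G the genome length. This is essentially the A-statistic described for the Celera Assembler; the sketch below just evaluates it.

```python
import math


def log_odds_unique(contig_len, reads_in_contig, num_reads, genome_len):
    """ln(P(k | H1) / P(k | H2)) for a contig holding k reads.

    Assumes Poisson-distributed read starts. H1 (unique): rate
    lam1 = n * rho / G. H2, simplified to exactly two collapsed copies:
    rate lam2 = 2 * lam1. The k! terms cancel, leaving
    (lam2 - lam1) + k * ln(lam1 / lam2) = n * rho / G - k * ln 2.
    Positive values favour H1 (unique); negative values favour H2.
    """
    lam1 = num_reads * contig_len / genome_len
    lam2 = 2 * lam1
    k = reads_in_contig
    return (lam2 - lam1) + k * math.log(lam1 / lam2)


print(round(log_odds_unique(10_000, 200, 20_000, 1_000_000), 2))  # 61.37
print(round(log_odds_unique(10_000, 400, 20_000, 1_000_000), 2))  # -77.26
```

A contig with about the expected 200 reads scores clearly positive (unique), while one with twice that scores clearly negative (repeat).
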
Detecting chimeric reads. Chimeric reads are reads that contain sequence from two genomic locations. Good overlaps: G(a,b) holds if reads a and b overlap with a high score. Transitive overlaps: T(a,c) holds if G(a,b) and G(b,c) for some read b. Find a point x in a read across which only transitive overlaps occur; x is a point of chimerism.
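
A simplified sketch of the chimera test for a single read a. It assumes the overlap step has already produced, as intervals on a, the regions covered by good overlaps G(a,b) and by transitive overlaps T(a,c), and it reports positions x spanned by some transitive overlap but by no good overlap. The `margin` requirement (an overlap must extend a few bases past x on both sides) is a hypothetical parameter, not a value from the lecture.

```python
def chimeric_points(read_len, good, transitive, margin=10):
    """Positions x in read a spanned only by transitive overlaps.

    good, transitive: half-open (start, end) intervals on read a covered
    by reads b with G(a, b) and by reads c with T(a, c), respectively.
    An interval spans x if it covers at least `margin` bases on each
    side of x (margin is a hypothetical parameter).
    """
    def spans(interval, x):
        start, end = interval
        return start <= x - margin and end > x + margin

    points = []
    for x in range(margin, read_len - margin):
        good_span = any(spans(iv, x) for iv in good)
        trans_span = any(spans(iv, x) for iv in transitive)
        if trans_span and not good_span:
            points.append(x)
    return points


# No good overlap crosses the middle of this 100-base read, but a transitive one does.
print(chimeric_points(100, good=[(0, 55), (45, 100)], transitive=[(0, 100)]))
# [45, 46, 47, 48, 49, 50, 51, 52, 53, 54]
```

If the two good overlaps had extended further into the middle of the read, every position would be spanned by a good overlap and the list would be empty.
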

Contig assembly. Reads are merged into contigs up to repeat boundaries. If (a,b) and (a,c) overlap, then (b,c) should overlap as well, and the offsets must agree: shift(a,c) = shift(a,b) + shift(b,c). Most contigs are unique pieces of the genome and end at some repeat boundary. Some contigs might lie entirely within repeats; these must be detected.

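
A sketch of the consistency check for a triple of reads, using the convention (an assumption here) that shift(x,y) is the offset of read y's start relative to read x's start. Rearranging the identity gives the implied offset shift(b,c) = shift(a,c) - shift(a,b). The sketch only demands a (b,c) overlap when that implied placement actually puts b and c on top of each other, since b and c can also hang off opposite ends of a. The tolerance `tol` for small indel differences is hypothetical.

```python
def check_triple(shift, lengths, a, b, c, tol=0):
    """Check that the overlaps among reads a, b, c are mutually consistent.

    shift[(x, y)]: offset of read y's start relative to read x's start,
    as reported by the overlap step; lengths[x]: length of read x.
    Assumes (a, b) and (a, c) overlap. tol is a hypothetical allowance.
    Returns False when the triple looks glued together by a repeat.
    """
    implied = shift[(a, c)] - shift[(a, b)]  # where c should start relative to b
    should_overlap = -lengths[c] < implied < lengths[b]
    if not should_overlap:
        # b and c hang off opposite ends of a, so no (b, c) overlap is expected.
        return (b, c) not in shift
    if (b, c) not in shift:
        return False  # the layout stacks b on c, yet no overlap was found
    return abs(shift[(b, c)] - implied) <= tol


lengths = {"a": 100, "b": 100, "c": 100}
shift = {("a", "b"): 30, ("a", "c"): 50, ("b", "c"): 20}
print(check_triple(shift, lengths, "a", "b", "c"))  # True
shift[("b", "c")] = 60  # offsets no longer add up
print(check_triple(shift, lengths, "a", "b", "c"))  # False
```
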
Creating Super Contigs

Supercontig assembly. Supercontigs are built incrementally. Initially, each contig is its own supercontig. In each round, a pair of supercontigs is merged, until no more merges can be performed.

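
A minimal sketch of incremental supercontig building, assuming the only evidence is a list of mate-pair links between contigs. Each round merges the pair of current supercontigs joined by the most links, and the process stops when no pair has at least `min_links` links. The threshold is hypothetical, and a real assembler would also check mate orientation and gap-size consistency, which this sketch omits.

```python
from collections import Counter


def build_supercontigs(num_contigs, links, min_links=2):
    """Greedily merge contigs into supercontigs using mate-pair links.

    links: list of (contig_i, contig_j) pairs, one per mate pair whose
    two reads landed in different contigs. min_links is hypothetical.
    """
    parent = list(range(num_contigs))  # union-find: each contig starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    while True:
        # Count links between distinct current supercontigs.
        counts = Counter()
        for i, j in links:
            ri, rj = find(i), find(j)
            if ri != rj:
                counts[(min(ri, rj), max(ri, rj))] += 1
        if not counts:
            break
        (ri, rj), n = counts.most_common(1)[0]
        if n < min_links:
            break  # no remaining pair is supported strongly enough
        parent[rj] = ri  # merge the best-supported pair this round

    groups = {}
    for c in range(num_contigs):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


links = [(0, 1)] * 3 + [(1, 2)] * 2 + [(3, 4)]
print(build_supercontigs(5, links))  # [[0, 1, 2], [3], [4]]
```

Contigs 3 and 4 stay separate because a single mate pair falls below the assumed threshold.
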

## This note was uploaded on 02/14/2008 for the course CSE 182 taught by Professor Bafna during the Fall '06 term at UCSD.
