791_cb_lecture2

7.91 / 7.36 / BE.490
Lecture #2
Feb. 26, 2004
DNA Sequence Comparison & Alignment
Chris Burge

Review of Lecture 1: “Genome Sequencing & DNA Sequence Analysis”
The Language of Genomics
- cDNAs, ESTs, BACs, Alus, etc.
Dideoxy Method / Shotgun Sequencing
- The ‘shotgun coverage equation’ (Poisson)
Flavors of BLAST
- BLAST[PNX], TBLAST[NX]
Statistics of High Scoring Segments
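As a reminder of the Poisson reasoning behind the shotgun coverage equation, here is a minimal sketch. It assumes the usual idealization: reads of fixed length land uniformly at random, so with coverage c = N·L/G the expected fraction of bases hit by no read is e^(-c). The function name and the read counts are hypothetical, chosen only for illustration.

```python
import math

def shotgun_coverage(num_reads, read_length, target_length):
    """Return (coverage c, expected fraction of bases covered by no read).

    Assumes reads land uniformly at random, so the number of reads covering
    a given base is approximately Poisson with mean c = N * L / G.
    """
    c = num_reads * read_length / target_length
    return c, math.exp(-c)

# Hypothetical numbers: 500 bp reads over a 200 kb BAC.
for n_reads in (400, 2000, 4000):
    c, p_uncovered = shotgun_coverage(n_reads, 500, 200_000)
    print(f"{n_reads} reads: coverage {c:.1f}x, "
          f"expected uncovered fraction {p_uncovered:.5f}")
```

Under this model the uncovered fraction falls off exponentially with coverage, which is why sequencing depth is usually quoted as a multiple of the target length.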
Shotgun Sequencing a BAC or a Genome
BAC: 200 kb (NIH); Genome: 3 Gb (Celera)
Sonicate, Subclone -> Subclones -> Sequence, Assemble -> Shotgun Contigs
What would cause problems with assembly?

DNA Sequence Alignment IV
Which alignments are significant?

Q: 1   ttgacctagatgagatgtcgttcacttttactgagctacagaaaa 45
       |||| |||||||||||| | |||||||||||||||||||||||||
S: 403 ttgatctagatgagatgccattcacttttactgagctacagaaaa 447

Identify high scoring segments whose score S exceeds a cutoff x using dynamic programming.
Scores follow an extreme value distribution:

$P(S > x) = 1 - \exp[-K m n \, e^{-\lambda x}]$

for sequences of length m, n, where K and λ depend on the score matrix and the composition of the sequences being compared.
(Same theory as for protein sequence alignments)
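To make the significance formula concrete, the following sketch evaluates P(S > x) = 1 - exp[-Kmn e^(-λx)] directly. The function names are hypothetical, and the values of K, λ, m, and n in the usage lines are illustrative placeholders only; real K and λ depend on the score matrix and sequence composition.

```python
import math

def hsp_e_value(x, m, n, K, lam):
    """Expected number of chance segments scoring above x: K*m*n*exp(-lam*x)."""
    return K * m * n * math.exp(-lam * x)

def hsp_p_value(x, m, n, K, lam):
    """P(S > x) = 1 - exp(-E), with E from hsp_e_value.

    expm1 keeps precision when E is tiny, where 1 - exp(-E) would round badly.
    """
    return -math.expm1(-hsp_e_value(x, m, n, K, lam))

# Placeholder parameters (assumptions, not tabulated values):
K, lam = 0.7, 1.37
m, n = 1_000, 1_000_000  # hypothetical query and database lengths
for score in (10, 20, 30):
    print(score, hsp_e_value(score, m, n, K, lam), hsp_p_value(score, m, n, K, lam))
```

Because the score enters through e^(-λx), each additional unit of score multiplies the expected number of chance hits by the constant factor e^(-λ).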
From M. Yaffe Notes (cont), Lecture #2
The random sequence alignment scores would give rise to an “extreme value” distribution, like a skewed Gaussian, called the Gumbel extreme value distribution.
For a normal distribution with mean m and standard deviation σ (variance σ²), the height of the curve is described by

$Y = \frac{1}{\sigma \sqrt{2\pi}} \exp[-(x-m)^2 / (2\sigma^2)]$

For an extreme value distribution, the height of the curve is described by

$Y = \exp[-x - e^{-x}]$

and

$P(S > x) = 1 - \exp[-e^{-\lambda (x-u)}]$, where $u = \ln(Kmn) / \lambda$

Can show that the mean extreme score is ~ $\log_2(nm)$, and the probability of getting a score that exceeds some number of “standard deviations” x is:

$P(S > x) \approx K m n \, e^{-\lambda x}$

*** K and λ are tabulated for different matrices ***
For the less statistically inclined: $E \approx K m n \, e^{-\lambda S}$

[Figure: Probability values for the extreme value distribution (A) and the normal distribution (B). The area under each curve is 1. Axes: X versus Yev (A) and X versus Yn (B).]
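The two ways of writing the tail probability, one in terms of u and one in terms of Kmn, are the same expression. The short derivation below fills in that step and shows where the approximation P ≈ E comes from; it uses only the definitions already given, plus the first-order expansion of the exponential for small E.

```latex
\begin{align*}
u = \frac{\ln(Kmn)}{\lambda}
  \;&\Longrightarrow\; e^{\lambda u} = Kmn \\
e^{-\lambda (x-u)} &= e^{\lambda u}\, e^{-\lambda x} = Kmn\, e^{-\lambda x} \equiv E \\
P(S > x) &= 1 - \exp\!\left[-e^{-\lambda (x-u)}\right] = 1 - e^{-E} \\
1 - e^{-E} &= E - \tfrac{E^2}{2} + \cdots \approx E \qquad (E \ll 1)
\end{align*}
```

So for the small values of E that matter when judging significance, the P-value and the expected number of chance hits are nearly interchangeable.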

DNA Sequence Comparison & Alignment
Target frequencies and mismatch penalties
Eukaryotic gene structure
Comparative genomics applications:
- Pipmaker (2 species comparison)
- Phylogenetic Shadowing (many species)
Intro to DNA sequence motifs
See Ch. 7 of Mount
DNA Sequence Alignment V
How is λ related to the score matrix?
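As background for this question, the standard Karlin-Altschul result is that λ is the unique positive solution of Σ p_i p_j e^(λ s_ij) = 1, where p_i are the background letter frequencies and s_ij the scores, provided the expected score is negative and a positive score is possible. The sketch below solves that equation by bisection for a simple match/mismatch DNA scoring scheme. It assumes uniform base composition (so a random pair matches with probability 1/4); the function name and the scoring values are illustrative, not taken from the lecture.

```python
import math

def solve_lambda(match, mismatch, p_match=0.25, tol=1e-12):
    """Positive root of p_match*exp(lam*match) + (1-p_match)*exp(lam*mismatch) = 1.

    p_match=0.25 assumes uniform base frequencies (an assumption of this sketch).
    """
    expected = p_match * match + (1.0 - p_match) * mismatch
    if match <= 0 or expected >= 0:
        raise ValueError("need a positive match score and a negative expected score")

    def f(lam):
        return (p_match * math.exp(lam * match)
                + (1.0 - p_match) * math.exp(lam * mismatch) - 1.0)

    # f(0) = 0 and f dips below zero before rising; find hi with f(hi) > 0.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    # Bisection: f <= 0 to the left of the root, f > 0 to the right.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(solve_lambda(1, -1))  # analytically ln 3 for +1/-1 scoring
print(solve_lambda(1, -3))
```

For +1/-1 scoring the equation reduces to a quadratic in e^λ with roots 1 and 3, so λ = ln 3, which gives a quick sanity check on the numerical solver.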