791_cb_lecture3

7.91 / 7.36 / BE.490 Lecture #3 Mar. 2, 2004 DNA Motif Modeling & Discovery Chris Burge
Review of DNA Seq. Comparison/Alignment
• Target frequencies and mismatch penalties
• Eukaryotic gene structure
• Comparative genomics applications:
  - Pipmaker (2 species comparison)
  - Phylogenetic Shadowing (many species)
• Intro to DNA sequence motifs
Organization of Topics

Lecture   Object Model                       Dependence Structure
3/2       Weight Matrix Model                Independence
3/4       Hidden Markov Model                Local Dependence
3/9       Energy Model, Covariation Model    Non-local Dependence
DNA Motif Modeling & Discovery
• Information Content of a Motif
• Review - WMMs for splice sites
• The Motif Finding/Discovery Problem
• The Gibbs Sampler
• Motif Modeling - Beyond Weight Matrices
See Ch. 4 of Mount
Splicing Model I
[Figure: splicing diagram with the 5' splice site, branch site, and 3' splice site labeled]
Weight Matrix Models II

5' splice signal

Pos     -3     -2     -1    ...    +5     +6
Con:     C      A      G    ...     G      T
A      0.3    0.6    0.1    ...   0.1    0.1
C      0.4    0.1    0.0    ...   0.1    0.2
G      0.2    0.2    0.8    ...   0.8    0.2
T      0.1    0.1    0.1    ...   0.0    0.5

Background

Pos    Generic
A      0.25
C      0.25
G      0.25
T      0.25

$$S = S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9$$
$$P(S|+) = P_{-3}(S_1)\,P_{-2}(S_2)\,P_{-1}(S_3) \cdots P_{5}(S_8)\,P_{6}(S_9)$$
$$P(S|-) = P_{bg}(S_1)\,P_{bg}(S_2)\,P_{bg}(S_3) \cdots P_{bg}(S_8)\,P_{bg}(S_9)$$
Odds Ratio: $$R = \frac{P(S|+)}{P(S|-)}$$

Background model homogeneous, assumes independence
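Weight Matrix Models III

$$S = S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9$$

Odds Ratio:
$$R = \frac{P(S|+)}{P(S|-)} = \frac{P_{-3}(S_1)\,P_{-2}(S_2)\,P_{-1}(S_3) \cdots P_{5}(S_8)\,P_{6}(S_9)}{P_{bg}(S_1)\,P_{bg}(S_2)\,P_{bg}(S_3) \cdots P_{bg}(S_8)\,P_{bg}(S_9)} = \prod_{k=1}^{9} \frac{P_{-4+k}(S_k)}{P_{bg}(S_k)}$$

(The subscript $-4+k$ is shorthand for the k-th matrix column. Splice site positions have no position 0, so columns k = 4, ..., 9 correspond to positions +1, ..., +6.)

Score:
$$s = \log_2 R = \sum_{k=1}^{9} \log_2 \frac{P_{-4+k}(S_k)}{P_{bg}(S_k)}$$

Neyman-Pearson Lemma: Optimal decision rules are of the form $R > C$
Equiv.: $\log_2(R) > C'$ with $C' = \log_2 C$, because log is a monotone function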
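As a small worked illustration of the score formula, consider only the three exon positions (-3, -2, -1) of a window that matches the consensus C A G there. Using the matrix values from the 5' splice signal slide and the uniform background of 0.25, the partial sum over these three columns would be:

$$\log_2\frac{0.4}{0.25} + \log_2\frac{0.6}{0.25} + \log_2\frac{0.8}{0.25} = \log_2 1.6 + \log_2 2.4 + \log_2 3.2 \approx 0.678 + 1.263 + 1.678 = 3.619 \text{ bits}$$

This is only part of the full 9-position score; the remaining columns contribute their own log-odds terms in the same way.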
Weight Matrix Models IV
Slide WMM along sequence:
ttgacctagatgagatgtcgttcacttttactgagctacagaaaa ……
Assign score to each 9 base window.
Use score cutoff to predict potential 5' splice sites
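The sliding-window procedure can be sketched in Python as follows. This is a minimal sketch, not the lecture's implementation: the function names `log_odds` and `scan` are made up for illustration, and because the slide shows only some of the nine matrix columns, the demo matrix below uses just the three columns for positions -3, -2, -1, so it scores 3-base windows rather than 9-base ones. A window containing a base with probability 0 is given a score of negative infinity, which is one possible convention (pseudocounts are another). Input is assumed to contain only A, C, G, T.

```python
import math

# Uniform background from the slide.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Demo matrix: only the columns for positions -3, -2, -1 from the slide.
# A full 5' splice site WMM would have nine columns.
WMM = [
    {"A": 0.3, "C": 0.4, "G": 0.2, "T": 0.1},  # position -3
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},  # position -2
    {"A": 0.1, "C": 0.0, "G": 0.8, "T": 0.1},  # position -1
]


def log_odds(window, wmm=WMM, bg=BACKGROUND):
    """Return s = sum_k log2(P_k(S_k) / P_bg(S_k)) for one window."""
    score = 0.0
    for column, base in zip(wmm, window):
        p = column[base]
        if p == 0.0:
            # Assumed convention: an impossible base rules the window out.
            return float("-inf")
        score += math.log2(p / bg[base])
    return score


def scan(sequence, wmm=WMM, bg=BACKGROUND, cutoff=0.0):
    """Slide the matrix along the sequence; keep windows scoring above cutoff."""
    seq = sequence.upper()
    width = len(wmm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        s = log_odds(window, wmm, bg)
        if s > cutoff:
            hits.append((start, window, s))
    return hits


if __name__ == "__main__":
    for start, window, s in scan("ttgacctagatgagatgtcgttcacttttactgagctacagaaaa", cutoff=3.0):
        print(start, window, round(s, 2))
```

Raising the cutoff returns fewer, higher-scoring windows; lowering it returns more candidates at the cost of more false positives.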
Histogram of 5’ss Scores
[Figure: histogram of scores for true 5’ splice sites and “decoy” 5’ splice sites; x-axis: score (1/10 bit units)]

Measuring Accuracy:
Sensitivity = % of true sites w/ score > cutoff
Specificity = % of sites w/ score > cutoff that are true sites

Sn:   20%   50%   90%
Sp:   50%   32%    7%
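The two accuracy measures, as defined on this slide, could be computed from lists of scores along these lines. This is a sketch under assumptions: the function name `sensitivity_specificity` and the example scores are hypothetical and are not data from the lecture. Note that specificity here follows the slide's definition (the fraction of above-cutoff sites that are true sites).

```python
def sensitivity_specificity(true_scores, decoy_scores, cutoff):
    """Return (Sn, Sp) using the slide's definitions.

    Sn = fraction of true sites with score > cutoff
    Sp = fraction of sites with score > cutoff that are true sites
    """
    tp = sum(1 for s in true_scores if s > cutoff)   # true sites predicted
    fp = sum(1 for s in decoy_scores if s > cutoff)  # decoys predicted
    sn = tp / len(true_scores) if true_scores else float("nan")
    sp = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return sn, sp


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    true_scores = [9.0, 7.5, 6.0, 3.0]
    decoy_scores = [8.0, 5.0, 4.0, 2.0, 1.0, 0.5]
    for cutoff in (5.5, 0.0):
        print(cutoff, sensitivity_specificity(true_scores, decoy_scores, cutoff))
```

Lowering the cutoff in this toy example raises Sn while lowering Sp, the same trade-off shown in the Sn/Sp table above.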
What does this result tell us?
A) Splicing machinery also uses other information besides 5’ss motif to identify splice sites; OR
B) WMM model does not accurately capture some aspects of the 5’ss that are used in recognition
(or both)
This is a pretty common situation in biology
What is a DNA (RNA) Motif?
A pattern common to a set of DNA (RNA) sequences that share a common biological property, such as being binding sites for a regulatory protein
Common motif adjectives:
• exact/precise versus degenerate
• strong versus weak (good versus lousy)
• high information content versus low information content
Information Theory
So we end up with Shannon’s famous formula:
$$H = -\sum_{i=1}^{20} P_i \log_2 P_i$$
where H = the “Shannon Entropy”, in bits per position in the alignment
What does this mean???
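One way to get a feel for the formula is to evaluate it on a few distributions. The sketch below is illustrative only: the function name `shannon_entropy` is made up, and terms with P_i = 0 are taken to contribute 0 (the usual convention, assumed here). The sum runs over however many symbols the distribution has, so the same function covers 20 amino acids or 4 nucleotides.

```python
import math


def shannon_entropy(probs):
    """H = -sum_i P_i * log2(P_i), in bits; zero-probability terms add 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Uniform over 4 nucleotides: maximal uncertainty for DNA.
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
    # Uniform over 20 amino acids: maximal uncertainty for protein.
    print(shannon_entropy([1 / 20] * 20))
    # Column for position -1 of the 5' splice signal matrix (A, C, G, T).
    print(shannon_entropy([0.1, 0.0, 0.8, 0.1]))
    # A fully conserved position: no uncertainty.
    print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))
```

The uniform cases give the largest possible H for their alphabet size (2 bits for 4 symbols, log2(20) bits for 20), while a fully conserved position gives 0 bits.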
This note was uploaded on 11/11/2011 for the course BIO 20.410j taught by Professor Rogerd.kamm during the Spring '03 term at MIT.