MIT6_047f08_lec07_slide07

MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms .
Modeling Biological Sequence and Hidden Markov Models (Part II: The algorithms)
6.047/6.878 - Computational Biology: Genomes, Networks, Evolution
Lecture 7, Sept 25, 2008
Challenges in Computational Biology
[Figure: overview diagram linking DNA and RNA transcripts to the course topics: genome assembly, gene finding, regulatory motif discovery, database search, gene expression analysis, sequence alignment, evolutionary theory, cluster discovery, Gibbs sampling, protein network analysis, emerging network properties, regulatory network inference, comparative genomics]
Modeling biological sequences
• Ability to generate DNA sequences of a certain type
  – Not exact alignment to a previously known gene
  – Preserving the 'properties' of the type, not the identical sequence
• Ability to recognize DNA sequences of a certain type
  – What (hidden) state is most likely to have generated the observations
  – Find the set of states and transitions that generated a long sequence
• Ability to learn distinguishing characteristics of each type
  – Training our generative models on large datasets
  – Learn to classify unlabelled data
[Figure: a DNA sequence annotated with region types (intergenic, CpG island, promoter, first exon, intron, other exon, intron)]
ACAGGATTATGGGTTACAGGTAACCGTTGTACTCACCGGGTTACAGGATTATGGGTTACAGGTAACCGGTACTCACCGGGTTACAGGATTATGGTAACGGTACTCACCGGGTTACAGGATTGTTACAGG
Markov Chains & Hidden Markov Models
• Markov chain
  – Q: states
  – p: initial state probabilities
  – A: transition probabilities
• HMM
  – Q: states
  – V: observations
  – p: initial state probabilities
  – A: transition probabilities
  – E: emission probabilities
[Figure: four states A+, T+, G+, C+ joined by transitions (a_GT, a_AC, a_GC, a_AT are labelled), drawn once as a Markov chain and once as an HMM in which each state emits its own nucleotide with probability 1, e.g. state G+ emits A: 0, C: 0, G: 1, T: 0]
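The five ingredients listed above (Q, V, p, A, E) can be held in one small container. The following is a minimal sketch, not the lecture's code: the class name `HMM`, its methods, and the uniform initial and transition probabilities are all assumptions made for illustration. Only the deterministic emissions (each state emits its own nucleotide with probability 1) come from the slide's figure.

```python
import random
from dataclasses import dataclass


@dataclass
class HMM:
    states: list      # Q
    symbols: list     # V
    initial: dict     # p[state]
    transition: dict  # A[state][next_state]
    emission: dict    # E[state][symbol]

    def validate(self, tol=1e-9):
        """Check that p and every row of A and E sum to 1."""
        assert abs(sum(self.initial[s] for s in self.states) - 1) < tol
        for s in self.states:
            assert abs(sum(self.transition[s][t] for t in self.states) - 1) < tol
            assert abs(sum(self.emission[s][v] for v in self.symbols) - 1) < tol

    def sample(self, length, rng):
        """Generate a (hidden path, observed sequence) pair of the given length."""
        path, obs = [], []
        state = rng.choices(self.states,
                            weights=[self.initial[s] for s in self.states])[0]
        for _ in range(length):
            path.append(state)
            obs.append(rng.choices(
                self.symbols,
                weights=[self.emission[state][v] for v in self.symbols])[0])
            state = rng.choices(
                self.states,
                weights=[self.transition[state][t] for t in self.states])[0]
        return path, obs


symbols = ["A", "C", "G", "T"]
states = [n + "+" for n in symbols]  # A+, C+, G+, T+ as in the figure
hmm = HMM(
    states=states,
    symbols=symbols,
    initial={s: 0.25 for s in states},                            # assumed uniform
    transition={s: {t: 0.25 for t in states} for s in states},    # assumed uniform
    emission={s: {v: 1.0 if s[0] == v else 0.0 for v in symbols}  # from the slide
              for s in states},
)
hmm.validate()
path, obs = hmm.sample(20, random.Random(0))
```

With deterministic emissions the observed sequence simply spells out the hidden path, which is why this particular HMM is equivalent to the plain Markov chain drawn next to it.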
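HMM nomenclature
[Figure: trellis with one column of states 1, 2, ..., K for each observed symbol x_1, x_2, x_3, ...; emission edges are labelled e_s(x_i) and transition edges a_st]
• x is the (observed) sequence
• π is the (hidden) path
• Find the path π* that maximizes the total joint probability P[x, π]
• $P(x, \pi) = a_{0\pi_1} \prod_i e_{\pi_i}(x_i) \, a_{\pi_i \pi_{i+1}}$
  – $a_{0\pi_1}$: start
  – $e_{\pi_i}(x_i)$: emission
  – $a_{\pi_i \pi_{i+1}}$: transition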
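The joint probability of one observed sequence and one candidate path can be evaluated directly from the product formula. The sketch below is an illustration under stated assumptions: it borrows the fair/loaded transition and emission values from the dishonest casino slide, assumes a start probability of 0.5 for each state (the slides shown here do not give one), and assumes there is no explicit end state, so the last position contributes no outgoing transition. The function names are hypothetical.

```python
import math

START = {"F": 0.5, "L": 0.5}  # assumption: not given on the slides
TRANS = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}


def joint_log_probability(x, path):
    """log P(x, path): start term, then one emission per position and one
    transition per adjacent pair of states (no end-state term)."""
    if len(x) != len(path):
        raise ValueError("x and path must have the same length")
    if not x:
        return 0.0
    logp = math.log(START[path[0]])
    for i, (state, symbol) in enumerate(zip(path, x)):
        logp += math.log(EMIT[state][symbol])
        if i + 1 < len(path):
            logp += math.log(TRANS[state][path[i + 1]])
    return logp


def joint_probability(x, path):
    return math.exp(joint_log_probability(x, path))


rolls = [1, 6, 6]
p_fll = joint_probability(rolls, "FLL")  # 0.5 * 1/6 * 0.05 * 1/2 * 0.95 * 1/2
p_fff = joint_probability(rolls, "FFF")  # 0.5 * (1/6)**3 * 0.95**2
```

Summing logs rather than multiplying probabilities is a design choice that avoids numerical underflow on long sequences; the two forms are otherwise equivalent.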
HMM for the dishonest casino model
• States: FAIR (F), LOADED (L)
• Transitions, a_st = P(π_i = t | π_{i-1} = s):
  – a_FF = 0.95, a_FL = 0.05
  – a_LL = 0.95, a_LF = 0.05
• Emissions, e_s(k) = P(x_i = k | π_i = s):

  Roll k     1      2      3      4      5      6
  e_F(k)     1/6    1/6    1/6    1/6    1/6    1/6
  e_L(k)     1/10   1/10   1/10   1/10   1/10   1/2
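To make "find the most likely hidden path" concrete for this model, here is a sketch of Viterbi decoding, the standard dynamic-programming algorithm for that task. It is not taken from the slides shown here: the function name `viterbi`, the log-space formulation, and the 0.5/0.5 start probabilities are assumptions. The transition and emission values are the ones listed above.

```python
import math

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}  # assumption: not given on the slides
TRANS = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}


def viterbi(rolls):
    """Return (most likely state path, its log joint probability)."""
    if not rolls:
        return [], 0.0
    # score[s] = best log probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for x in rolls[1:]:
        prev, score, ptr = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + math.log(TRANS[s][t]))
            ptr[t] = best
            score[t] = (prev[best] + math.log(TRANS[best][t])
                        + math.log(EMIT[t][x]))
        backpointers.append(ptr)
    # Trace back from the best final state
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


sixes_path, _ = viterbi([6] * 10)
mixed_path, _ = viterbi([1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
```

A run of ten sixes decodes as all LOADED, while ten rolls with no six decode as all FAIR. The work grows as (number of states)^2 times the sequence length, instead of enumerating all 2^N paths.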
HMM for CpG islands
• Build a single model that combines both Markov chains:
  – '+' states: A+, C+, G+, T+ (emit symbols A, C, G, T in CpG islands)
  – '-' states: A-, C-, G-, T- (emit symbols A, C, G, T in non-islands)
• Emission probabilities distinct for the '+' and the '-' states
  – Infer the most likely set of states giving rise to the observed emissions
  → 'Paint' the sequence with + and - states
[Figure: eight states A+, T+, G+, C+, A-, T-, G-, C-, each emitting its own nucleotide with probability 1, e.g. G+ and G- both emit A: 0, C: 0, G: 1, T: 0]
Question: Why do we need so many states?
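The eight-state layout can be written down mechanically, which also shows what "painting" means. This is a minimal sketch with hypothetical helper names (`compatible_states`, `paint`); it encodes only the state set and the deterministic emissions drawn in the figure, and leaves the transition probabilities out because the slide does not list them.

```python
NUCLEOTIDES = "ACGT"

# '+' states model CpG islands, '-' states model non-island sequence.
STATES = [n + sign for sign in "+-" for n in NUCLEOTIDES]

# Each state emits its own nucleotide with probability 1 (as in the figure).
EMIT = {s: {n: 1.0 if s[0] == n else 0.0 for n in NUCLEOTIDES} for s in STATES}


def compatible_states(symbol):
    """States that can emit the observed symbol: always one '+' and one '-'."""
    return [s for s in STATES if EMIT[s][symbol] > 0]


def paint(path):
    """Reduce a state path to its island / non-island labelling."""
    return "".join(state[1] for state in path)


candidates = [compatible_states(n) for n in "ACGT"]
labels = paint(["A-", "C+", "G+", "T-"])
```

Because every observed nucleotide is compatible with exactly two states, a sequence of length N has 2^N candidate paths, and the only thing left to infer is the +/- label at each position.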

This note was uploaded on 09/24/2010 for the course EECS 6.047 / 6. taught by Professor Manolis Kellis during the Fall '08 term at MIT.
