hmm - 1 Hidden Markov Models (HMMs) (Lecture for CS397-CXZ...

1 Hidden Markov Models (HMMs)
(Lecture for CS397-CXZ Algorithms in Bioinformatics)
Feb. 20, 2004
ChengXiang Zhai
Department of Computer Science, University of Illinois, Urbana-Champaign

2 Motivation: the CpG island problem

Methylation in the human genome: the mutation CG -> TG happens in most places except the start regions of genes.
CpG islands = 100-1,000 bases before a gene starts.

Questions
Q1: Given a short stretch of genomic sequence, how would we decide if it comes from a CpG island or not?
Q2: Given a long sequence, how would we find the CpG islands in it?

3 Answer to Q1: Bayes Classifier

Hypothesis space: $H = \{H_{CpG}, H_{Other}\}$
Evidence: $X = \text{ATCGTTC}$

$$P(H_{CpG} \mid X) = \frac{P(X \mid H_{CpG})\, P(H_{CpG})}{P(X)}, \qquad P(H_{Other} \mid X) = \frac{P(X \mid H_{Other})\, P(H_{Other})}{P(X)}$$

Here $P(X \mid H)$ is the likelihood of the evidence (the generative model) and $P(H)$ is the prior probability. Dividing the two posteriors cancels $P(X)$:

$$\frac{P(H_{CpG} \mid X)}{P(H_{Other} \mid X)} = \frac{P(X \mid H_{CpG})}{P(X \mid H_{Other})} \cdot \frac{P(H_{CpG})}{P(H_{Other})}$$

We need two generative models for sequences: $p(X \mid H_{CpG})$ and $p(X \mid H_{Other})$.

4 A Simple Model for Sequences: p(X)

Probability rule: $p(X) = p(X_1 X_2 \ldots X_n) = \prod_{i=1}^{n} p(X_i \mid X_1 \ldots X_{i-1})$
Unigram (assume independence): $p(X) = \prod_{i=1}^{n} p(X_i)$
Bigram (capture some dependence): $p(X) = p(X_1) \prod_{i=2}^{n} p(X_i \mid X_{i-1})$

P(x|H_CpG): P(A|H_CpG) = 0.25, P(T|H_CpG) = 0.25, P(C|H_CpG) = 0.25, P(G|H_CpG) = 0.25
P(x|H_Other): P(A|H_Other) = 0.25, P(T|H_Other) = 0.40, P(C|H_Other) = 0.10, P(G|H_Other) = 0.25

Compare X = ATTG vs. X = ATCG.

5 Answer to Q2: Hidden Markov Model

X = ATTGATGCAAAAGGGGGATCGGGCGATATAAAATTTG
(On the slide, one stretch of X is marked "CpG Island" and is flanked by two stretches marked "Other".)

How can we identify a CpG island in a long sequence?
Idea 1: Test each window of a fixed number of nucleotides.
Idea 2: Classify the whole sequence, where each candidate labeling S assigns O (Other) or C (CpG) to every position:
Class label S1: OOOO.O
Class label S2: OOOO.OCC
Class label Si: OOOOOCC..COO
Class label SN: CCCC.CC

$S^* = \arg\max_S P(S \mid X) = \arg\max_S P(S, X)$
S* = OOOOOCC..COO, where the run of C labels marks the CpG island.

6 HMM is just one way of modeling p(X,S)

7 A simple HMM

Parameters
Initial state prob: p(B) = 0.5; p(I) = 0.5
State transition prob: p(B -> B) = 0.8, p(B -> I) = 0.2, p(I -> B) = 0.5, p(I -> I) = 0.5
Output prob (examples): P(a|B) = 0.25, p(c|B) = 0.10, P(c|I) = 0.25

[Figure: two-state diagram with states B and I. Start probabilities P(B) = 0.5 and P(I) = 0.5; edges B -> B 0.8, B -> I 0.2, I -> B 0.5, I -> I 0.5; each state is attached to its output distribution P(x|B) or P(x|I).]

P(x|H_CpG) = p(x|I): P(a|I) = 0.25, P(t|I) = 0.25, P(c|I) = 0.25, P(g|I) = 0.25
P(x|H_Other) = p(x|B): P(a|B) = 0.25, P(t|B) = 0.40, P(c|B) = 0.10, P(g|B) = 0.25

8 A General Definition of HMM

$HMM = (S, V, B, A, \Pi)$
N states: $S = \{s_1, \ldots, s_N\}$
M symbols: $V = \{v_1, \ldots, v_M\}$
Initial state probability: $\Pi = \{\pi_1, \ldots, \pi_N\}$, $\sum_{i=1}^{N} \pi_i = 1$, where $\pi_i$ is the probability of starting at state $s_i$
$b_i(v_k)$: probability of "generating" $v_k$ at $s_i$
$A = \{a_{ij}\}$, $1 \le i, j \le N$, $\sum_{j=1}^{N} a_{ij} = 1$ ...
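The Bayes classifier of slide 3 combined with the unigram tables of slide 4 can be sketched in a few lines. This is only an illustration, not code from the lecture: the function names are made up here, and the prior `prior_cpg=0.5` is an assumption, since the visible slides give no value for P(H_CpG).

```python
import math

# Unigram tables copied from slide 4.
P_CPG = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
P_OTHER = {"A": 0.25, "T": 0.40, "C": 0.10, "G": 0.25}


def unigram_log_likelihood(seq, table):
    """Return log p(X | H) under the unigram (independence) model."""
    return sum(math.log(table[base]) for base in seq.upper())


def log_posterior_ratio(seq, prior_cpg=0.5):
    """Return log [P(H_CpG | X) / P(H_Other | X)].

    prior_cpg is an assumed prior P(H_CpG); it is not given in the slides.
    """
    log_likelihood_ratio = (
        unigram_log_likelihood(seq, P_CPG) - unigram_log_likelihood(seq, P_OTHER)
    )
    log_prior_ratio = math.log(prior_cpg) - math.log(1.0 - prior_cpg)
    return log_likelihood_ratio + log_prior_ratio


def classify(seq, prior_cpg=0.5):
    """Pick the hypothesis with the larger posterior."""
    return "CpG" if log_posterior_ratio(seq, prior_cpg) > 0 else "Other"


if __name__ == "__main__":
    for x in ("ATTG", "ATCG"):
        print(x, classify(x))
```

With equal priors, ATTG is assigned to Other because T is more likely under H_Other, while the single C in ATCG is enough to tip the decision to CpG. Working in log space avoids underflow on longer sequences.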
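To make the objective S* = argmax_S P(S, X) from slide 5 concrete, the sketch below scores a labeling under the two-state HMM of slide 7 and searches for the best one. The search uses the standard Viterbi dynamic program, which is a swapped-in technique here: the visible part of the slides does not show how the argmax is computed. The function names are hypothetical, and the state letters B and I follow slide 7.

```python
import math

# Parameters copied from slide 7 (B = background/Other, I = CpG island).
STATES = ("B", "I")
INIT = {"B": 0.5, "I": 0.5}
TRANS = {"B": {"B": 0.8, "I": 0.2}, "I": {"B": 0.5, "I": 0.5}}
EMIT = {
    "B": {"a": 0.25, "t": 0.40, "c": 0.10, "g": 0.25},
    "I": {"a": 0.25, "t": 0.25, "c": 0.25, "g": 0.25},
}


def joint_log_prob(seq, path):
    """Return log p(X, S) for a sequence and a same-length state path."""
    seq = seq.lower()
    if not seq or len(seq) != len(path):
        raise ValueError("seq and path must be non-empty and equally long")
    total = math.log(INIT[path[0]]) + math.log(EMIT[path[0]][seq[0]])
    for prev, cur, symbol in zip(path, path[1:], seq[1:]):
        total += math.log(TRANS[prev][cur]) + math.log(EMIT[cur][symbol])
    return total


def viterbi(seq):
    """Return (best_path, log p(X, best_path)) by dynamic programming."""
    seq = seq.lower()
    if not seq:
        raise ValueError("seq must be non-empty")
    # score[s] = best log probability of any path ending in state s.
    score = {s: math.log(INIT[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back_pointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + math.log(TRANS[p][cur])
            )
            new_score[cur] = (
                score[best_prev]
                + math.log(TRANS[best_prev][cur])
                + math.log(EMIT[cur][symbol])
            )
            pointer[cur] = best_prev
        back_pointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(back_pointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return "".join(path), score[last]


if __name__ == "__main__":
    x = "ATTGATGCAAAAGGGGGATCGGGCGATATAAAATTTG"
    best_path, log_p = viterbi(x)
    print(x)
    print(best_path)
```

Enumerating all 2^n labelings, as Idea 2 literally suggests, is only feasible for very short sequences; the dynamic program reaches the same maximum in time linear in the sequence length.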

This note was uploaded on 02/13/2012 for the course CS 91.510 taught by Professor Staff during the Fall '09 term at UMass Lowell.
