# H but recall ah 1 from lecture 3 and this is a matrix


$$
\begin{aligned}
\cdots\, X_{t+h}) &= \sum_{ij} \mu_i \mu_j (A^h)_{ij} \pi_i - \Big(\sum_i \mu_i \pi_i\Big)\Big(\sum_i \mu_i \pi_i\Big) && (5.54) \\
&\xrightarrow{h \to \infty} \sum_{ij} \mu_i \mu_j \pi_j \pi_i - \Big(\sum_i \mu_i \pi_i\Big)\Big(\sum_i \mu_i \pi_i\Big) && (5.55) \\
&= 0 && (5.56)
\end{aligned}
$$

Thus, while the covariance between two observations is not necessarily zero in an HMM, once we are at a stationary distribution, this covariance goes to zero exponentially fast.

Exercise: For this property to hold, must we assume a unique stationary distribution here?

Prof. Jeff Bilmes, EE596A/Winter 2013/DGMs – Lecture 5 - Jan 25th, 2013, page 5-64 (of 232)

## Correlation over time of simple HMM

Example of the decay in the mutual-information correlation from a real-world HMM. That is, we see $f(\tau) = I(X_t; X_{t+\tau})$, where $I(\cdot\,;\cdot)$ is the mutual information function. Mutual information is stronger than correlation. This is compared against i.i.d. samples (a high peak at $\tau = 0$ is expected).

[Figure: mutual information (bits, axis scaled by $10^{-3}$) versus time (ms) from −100 to 100; curves: HMM MI and IID MI.]

## State Duration Modeling

In a Markov chain, the state duration distribution is geometric. Let $D$ be such a random variable; then

$$P(D = d) = p^{d-1}(1 - p) \qquad (5.57)$$

where $d \ge 1$ is an integer and $p = a_{ii}$ if $D$ is the duration random variable for state $i$ of the chain, giving:

[Figure: state diagram with transition labels 0.75 and 0.25, and a plot of $P(d)$ versus $d$ for $0 \le d \le 60$.]

Many sequential tasks have sub-segments that do not follow this distribution; this seems like an inherent limitation.

Many “tricks” for using an HMM can alleviate such problems. One is “state-tying”, where multiple states have the same observation distribution (parameters are shared).
That is, states $q$ and $q'$ are tied if

$$p(x \mid Q_t = q) = p(x \mid Q_t = q') \quad \forall x \in D_X \qquad (5.58)$$

If $n$ states are strung together in series, all of which share the same observation distribution, that observation distribution will be active for as long as we remain in any of those states. For example:

[Figure: chain of states in series with transition labels 0.75 and 0.25, and a plot of the resulting duration distribution versus $d$ for $0 \le d \le 60$.]

This corresponds to a sum of random variables. Let $\{D_i\}_i$ be a collection of independent geometrically distributed random variables with parameter $p$, and let $W_r = \sum_{i=1}^{r} D_i$; then

$$p(W_r = k) = \binom{k-1}{r-1} p^r (1 - p)^{k-r}, \quad k = r, r+1, \ldots \qquad (5.59)$$

In (5.59), $p$ plays the role of the per-step probability of leaving a state, i.e. $1 - a_{ii}$, whereas in (5.57) $p = a_{ii}$ is the self-loop probability.

This is a “negative binomial distribution”:

[Figure: plot of the negative binomial duration distribution versus $d$ for $0 \le d \le 60$.]

If we have multiple parallel states in series, all of which share the same observation distribution, we can construct much more interesting (multimodal) distributions.
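The duration distributions in (5.57) and (5.59) can be checked numerically. The following is a minimal sketch, not code from the lecture: the function names `series_duration_pmf` and `negative_binomial_pmf` are hypothetical, and it assumes a left-to-right chain of `r` tied states, each with the same self-loop probability `stay` and forward probability `1 - stay`. It propagates the state-occupancy probabilities step by step and compares the result with the closed form.

```python
from math import comb


def series_duration_pmf(r, stay, max_d):
    """Return [P(W = 1), ..., P(W = max_d)] for r tied states in series.

    Assumption: every state has self-loop probability `stay` and moves to
    the next state (or exits, for the last state) with probability 1 - stay.
    """
    leave = 1.0 - stay
    # occ[j] = probability of occupying state j at the current time step
    occ = [0.0] * r
    occ[0] = 1.0  # the first time step is always spent in the first state
    pmf = []
    for _ in range(max_d):
        # Total duration ends here if we are in the last state and leave it
        pmf.append(occ[-1] * leave)
        nxt = [0.0] * r
        for j in range(r):
            nxt[j] += occ[j] * stay           # stay in state j
            if j + 1 < r:
                nxt[j + 1] += occ[j] * leave  # advance to state j + 1
        occ = nxt
    return pmf


def negative_binomial_pmf(r, leave, k):
    """Closed form (5.59), with p taken to be the per-step exit probability."""
    if k < r:
        return 0.0
    return comb(k - 1, r - 1) * leave**r * (1.0 - leave) ** (k - r)


if __name__ == "__main__":
    single = series_duration_pmf(1, 0.75, 60)  # geometric case, as in (5.57)
    chain = series_duration_pmf(4, 0.75, 60)   # four tied states in series
    for d in (1, 4, 10, 20):
        print(d, single[d - 1], chain[d - 1], negative_binomial_pmf(4, 0.25, d))
```

With `r = 1` the recursion reproduces the geometric distribution, whose mass is largest at `d = 1`. With `r = 4` the distribution has zero mass below `d = 4`, so its peak moves away from the shortest durations, which is the effect that stringing tied states in series is meant to achieve.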