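$$\operatorname{cov}(X_t, X_{t+h}) = \sum_{ij} \mu_i \mu_j (A^h)_{ij} \pi_i - \Big(\sum_i \mu_i \pi_i\Big)\Big(\sum_i \mu_i \pi_i\Big) \tag{5.54}$$
$$\to \sum_{ij} \mu_i \mu_j \pi_j \pi_i - \Big(\sum_i \mu_i \pi_i\Big)\Big(\sum_i \mu_i \pi_i\Big) \quad \text{as } h \to \infty \tag{5.55}$$
$$= 0 \tag{5.56}$$
The limit in (5.55) uses $(A^h)_{ij} \to \pi_j$ as $h \to \infty$.
Thus, while the covariance between two observations is not
necessarily zero in an HMM, once we are at a stationary
distribution, this covariance goes to zero exponentially fast as the lag h grows.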
Exercise: For this property to hold, must we assume a unique
stationary distribution here?
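
The decay in (5.54)-(5.56) can be checked numerically. The following is a minimal sketch, not part of the lecture: the two-state transition matrix `A`, its stationary distribution `pi`, and the per-state observation means `mu` are hypothetical values chosen for illustration, and the function evaluates the right-hand side of (5.54) directly for lags h ≥ 1.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(a, h):
    """Return the h-th power of a square matrix by repeated multiplication."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(h):
        result = mat_mul(result, a)
    return result

def lagged_covariance(A, pi, mu, h):
    """Evaluate sum_ij mu_i mu_j (A^h)_ij pi_i - (sum_i mu_i pi_i)^2."""
    Ah = mat_pow(A, h)
    n = len(pi)
    cross = sum(mu[i] * mu[j] * Ah[i][j] * pi[i]
                for i in range(n) for j in range(n))
    mean = sum(mu[i] * pi[i] for i in range(n))
    return cross - mean * mean

# Hypothetical example chain (assumption, not from the lecture).
A = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]   # satisfies pi A = pi for this A
mu = [0.0, 3.0]               # assumed per-state observation means

for h in (1, 2, 5, 10, 50):
    print(h, lagged_covariance(A, pi, mu, h))
```

For this particular chain the covariance shrinks by a constant factor at each additional lag (the second eigenvalue of `A`, here 0.7), which is the geometric, i.e. exponentially fast, decay the slide refers to.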
Prof. Jeff Bilmes, EE596A/Winter 2013/DGMs – Lecture 5, Jan 25th, 2013

Correlation over time of simple HMM
Example of the decay in the mutual-information correlation from a
real-world HMM. That is, we see f(τ) = I(X_t; X_{t+τ}), where I(·;·) is the
mutual information function.
Mutual information is stronger than correlation: zero mutual information
implies independence, whereas zero correlation does not.
This is compared against i.i.d. samples (high peak at τ = 0 is
expected).
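
To make the comparison between mutual information and correlation concrete, here is a small sketch under stated assumptions: the joint distribution below (X uniform on {-1, 0, 1} and Y = X²) is a textbook-style toy example, not data from the lecture, and the helper names are hypothetical. The pair is uncorrelated yet clearly dependent, and mutual information picks that up.

```python
import math

def mutual_information_bits(joint):
    """Mutual information in bits; `joint` maps (x, y) -> probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def covariance(joint):
    """Covariance E[XY] - E[X]E[Y] under the same joint table."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return exy - ex * ey

# Toy joint distribution (assumption): X uniform on {-1, 0, 1}, Y = X**2.
joint = {(-1, 1): 1.0 / 3.0, (0, 0): 1.0 / 3.0, (1, 1): 1.0 / 3.0}

print("covariance:", covariance(joint))
print("mutual information (bits):", mutual_information_bits(joint))
```

Here the covariance is zero while the mutual information equals the entropy of Y, about 0.918 bits, since Y is a deterministic function of X.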
[Figure: mutual information (bits, ×10⁻³) versus time lag (ms, from −100 to 100) for two curves, "HMM MI" and "IID MI".]

State Duration Modeling
In a Markov chain, the state duration distribution is geometric.
Let D be such a random variable; then
$$P(D = d) = p^{d-1}(1 - p) \tag{5.57}$$
where d ≥ 1 is an integer and p = a_{ii}, if D is the duration random
variable for state i of the chain, giving:
[Figure: a single state with self-loop probability 0.75 and exit probability 0.25, and the resulting duration distribution P(d) plotted for d from 0 to 60.]
Many sequential tasks have subsegments that do not follow this
distribution; this seems like an inherent limitation.
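
A short sketch of (5.57) shows why the geometric shape is restrictive. The self-loop value 0.75 is taken from the figure labels; the function name and the truncation at d = 60 are assumptions made for illustration.

```python
def geometric_duration_pmf(self_loop, max_d):
    """P(D = d) = self_loop**(d - 1) * (1 - self_loop) for d = 1..max_d."""
    return {d: self_loop ** (d - 1) * (1.0 - self_loop)
            for d in range(1, max_d + 1)}

p = 0.75                                  # self-loop probability a_ii
pmf = geometric_duration_pmf(p, 60)       # truncated at d = 60 for display

mode = max(pmf, key=pmf.get)              # most probable duration
mean = sum(d * prob for d, prob in pmf.items())

print("most probable duration:", mode)
print("approximate mean duration:", mean)  # exact mean is 1 / (1 - p)
```

Whatever value of p is chosen, the most probable duration is always d = 1 and the probabilities only decrease from there, so a single state cannot represent a segment whose typical length is well above one frame.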
Many "tricks" for using an HMM can alleviate such problems.
One is "state tying", where multiple states have the same observation
distribution (parameters are shared).
That is, states q and q' are tied if it is the case that
$$p(x \mid Q_t = q) = p(x \mid Q_t = q') \quad \forall x \in D_X \tag{5.58}$$
If n states in a series are strung together, all of which share the
same observation distribution, that observation distribution will be
active for as long as we are in any of those tied states. For example:
[Figure: a series of four tied states, each with self-loop probability 0.75 and forward-transition probability 0.25, and the resulting duration distribution plotted for d from 0 to 60 (vertical axis from 0 to 0.08).]
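This corresponds to the sum of random variables. Let {D_i}_i be a
collection of independent geometrically distributed random variables
with parameter p, and let W_r = \sum_{i=1}^{r} D_i; then
$$p(W_r = k) = \binom{k-1}{r-1} p^r (1 - p)^{k-r}, \quad k = r, r+1, \ldots \tag{5.59}$$
Note that (5.59) uses the standard geometric parameterization, in which p is
the per-step probability of leaving a state; in terms of the self-loop
probability a_{ii} of (5.57), p = 1 − a_{ii} here.
This is a "negative binomial distribution":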
[Figure: the negative binomial duration distribution plotted for d from 0 to 60 (vertical axis from 0 to 0.08).]
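
The claim that a series of tied states yields a negative binomial duration can be checked by convolving geometric distributions and comparing against the closed form in (5.59). This is a sketch under assumptions: the exit probability 0.25 and the chain length r = 4 are read off the figure labels, the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

def geometric_pmf(exit_prob, max_d):
    """List whose index d holds P(D = d); index 0 is zero since d >= 1."""
    return [0.0] + [(1.0 - exit_prob) ** (d - 1) * exit_prob
                    for d in range(1, max_d + 1)]

def convolve(a, b, max_k):
    """Distribution of the sum of two independent durations, up to max_k."""
    out = [0.0] * (max_k + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_k:
                out[i + j] += ai * bj
    return out

def series_duration_pmf(exit_prob, r, max_k):
    """Total time spent in r tied states visited one after another."""
    geo = geometric_pmf(exit_prob, max_k)
    total = geo
    for _ in range(r - 1):
        total = convolve(total, geo, max_k)
    return total

def negative_binomial_pmf(exit_prob, r, k):
    """Closed form (5.59), with p taken as the exit probability."""
    if k < r:
        return 0.0
    return math.comb(k - 1, r - 1) * exit_prob ** r * (1.0 - exit_prob) ** (k - r)

exit_prob, r, max_k = 0.25, 4, 60
by_convolution = series_duration_pmf(exit_prob, r, max_k)
closed_form = [negative_binomial_pmf(exit_prob, r, k) for k in range(max_k + 1)]
max_gap = max(abs(a - b) for a, b in zip(by_convolution, closed_form))
print("largest difference between the two computations:", max_gap)
```

Unlike the single-state geometric case, the resulting distribution assigns zero probability to durations shorter than r and peaks well away from the minimum duration.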
If we have multiple parallel chains of states in series, all of which share the
same observation distribution, we can construct much more
interesting (multimodal) duration distributions.
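
One way to picture this is as a mixture: if the model enters one of several parallel chains with some probability, and each chain is a series of tied states, the overall duration distribution is a weighted sum of negative binomials. The sketch below assumes that topology; the two branches, their entry weights, exit probabilities, and lengths are hypothetical values picked only to make the two modes easy to see.

```python
import math

def series_chain_pmf(exit_prob, r, k):
    """Duration pmf of r tied states in series (negative binomial form)."""
    if k < r:
        return 0.0
    return math.comb(k - 1, r - 1) * exit_prob ** r * (1.0 - exit_prob) ** (k - r)

def parallel_chains_pmf(branches, k):
    """Mixture over branches given as (entry_weight, exit_prob, r) tuples."""
    return sum(w * series_chain_pmf(q, r, k) for w, q, r in branches)

# Hypothetical branches (assumption): a short chain and a long chain,
# entered with equal probability.
branches = [(0.5, 0.3, 3), (0.5, 0.3, 15)]

mixture = [parallel_chains_pmf(branches, k) for k in range(201)]

# Report durations that are strictly more probable than both neighbours.
local_maxima = [k for k in range(1, 200)
                if mixture[k] > mixture[k - 1] and mixture[k] > mixture[k + 1]]
print("local maxima of the duration distribution:", local_maxima)
```

With these values the distribution has a pronounced dip between a short-duration peak and a long-duration peak, a shape that neither a single state nor a single series chain can produce.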