Prof. Jeff Bilmes, EE596A/Winter 2013/DGMs – Lecture 4, Jan 23rd, 2013

Observations are not "Viterbi i.i.d."

The Viterbi path (most-probable explanation) of an HMM is defined:

    q^*_{1:T} = \operatorname{argmax}_{q_{1:T}} p(X_{1:T} = x_{1:T}, q_{1:T})

If we max-marginalize over the hidden states, does that lead to an i.i.d. distribution?

The "Viterbi" distribution of the HMM is:

    p_{vit}(X_{1:T} = x_{1:T}) = c \, p(X_{1:T} = x_{1:T}, Q_{1:T} = q^*_{1:T})                                      (4.43)
                               = c \max_{q_{1:T}} p(X_{1:T} = x_{1:T}, Q_{1:T} = q_{1:T})
                               = c \max_{q_{1:T}} \prod_{t=1}^{T} p(X_t = x_t \mid Q_t = q_t) \, p(Q_t = q_t \mid Q_{t-1} = q_{t-1}),

where c is a positive normalizing constant over x_{1:T}.

This is just a different semiring (max-product rather than sum-product). The resulting distribution over observations does not in general factorize, so no i.i.d. here either.
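The max in (4.43) is computed by the standard Viterbi dynamic-programming recursion (max-product with back-pointers). A minimal sketch in NumPy; the parameter names `pi`, `A`, `B` and the function `viterbi` are my own choices for illustration, not notation from the lecture:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most-probable state path q*_{1:T} = argmax_q p(x_{1:T}, q_{1:T}).

    pi:  (S,)   initial state distribution p(Q_1)
    A:   (S, S) transitions, A[i, j] = p(Q_t = j | Q_{t-1} = i)
    B:   (S, V) emissions,   B[j, x] = p(X_t = x | Q_t = j)
    obs: length-T list of observation symbol indices
    Returns (path, log joint probability of path and observations).
    """
    T, S = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]     # delta_1(j) = log p(x_1, Q_1 = j)
    psi = np.zeros((T, S), dtype=int)        # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA       # scores[i, j]: best ending in i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]             # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta.max())
```

Working in log space avoids underflow for long sequences; the max-product structure is identical to the sum-product forward recursion with max replacing sum.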
The Viterbi path (most-probable explanation) of an HMM is defined as follows:

    q^*_{1:T} \in \operatorname{argmax}_{q_{1:T}} p(X_{1:T} = x_{1:T}, q_{1:T})                                      (4.44)

This is a standard method for finding a mapping from observations x_{1:T} to candidate answers q^*_{1:T}, also called a "decoding."

Other times, we might have K HMMs, corresponding to K classes

    \{ p_k(x_{1:T}, q_{1:T}) : k = 1, \dots, K \}                                                                   (4.45)

and we wish to perform classification amongst this discrete set of K classes as follows:

    (k^*, q^*_{1:T}) = \operatorname{argmax}_{q_{1:T}, k} p_k(x_{1:T}, q_{1:T})                                     (4.46)

and k^* becomes part of the hypothesized answer.
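The joint maximization in (4.46) decomposes: run the Viterbi recursion once per class model and take the class with the largest joint score. A sketch under that reading; the helper names `viterbi_score` and `classify` are my own, not from the lecture:

```python
import numpy as np

def viterbi_score(pi, A, B, obs):
    """log max_q p(x_{1:T}, q_{1:T}) via the max-product recursion (no backtracking)."""
    delta = np.log(pi) + np.log(B)[:, obs[0]]
    for x in obs[1:]:
        delta = (delta[:, None] + np.log(A)).max(axis=0) + np.log(B)[:, x]
    return float(delta.max())

def classify(models, obs):
    """models: list of (pi, A, B) triples, one HMM per class.

    Returns (k*, best score) -- the class attaining the max in Eq. (4.46).
    """
    scores = [viterbi_score(pi, A, B, obs) for (pi, A, B) in models]
    return int(np.argmax(scores)), max(scores)
```

Note this scores each class by its single best state path, not by the sum-marginal p_k(x_{1:T}); the two classifiers can disagree.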
We can view this as K distributions over just the observations, i.e.,

    p^k_{vit}(x_{1:T}) \propto p_k(x_{1:T}, q^*_{1:T,k})                                                            (4.47)
                       \propto \max_{q_{1:T}} p_k(x_{1:T}, q_{1:T})                                                 (4.48)
                       \propto \max_{q_{1:T}} \prod_{t=1}^{T} p_k(x_t \mid q_t) \, p_k(q_t \mid q_{t-1})            (4.49)
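The non-factorization claim can be checked directly by enumerating p_vit on a tiny HMM. In this sketch (with a 2-state, 2-symbol, T = 2 HMM of my own choosing, not from the lecture), the Viterbi probability of a string differs from the product of its position-wise marginals:

```python
import itertools

# an arbitrary 2-state, 2-symbol HMM, chosen only for illustration
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]   # A[i][j] = p(Q_t = j | Q_{t-1} = i)
B = [[0.8, 0.2], [0.2, 0.8]]   # B[j][x] = p(X_t = x | Q_t = j)
T = 2

def max_joint(xs):
    """max over state paths of p(x_{1:T}, q_{1:T}) -- the unnormalized p_vit."""
    best = 0.0
    for q in itertools.product(range(2), repeat=T):
        p = pi[q[0]] * B[q[0]][xs[0]]
        for t in range(1, T):
            p *= A[q[t - 1]][q[t]] * B[q[t]][xs[t]]
        best = max(best, p)
    return best

raw = {xs: max_joint(xs) for xs in itertools.product(range(2), repeat=T)}
c = 1.0 / sum(raw.values())                    # the normalizing constant c of (4.43)
pvit = {xs: c * v for xs, v in raw.items()}

# position-wise marginals of pvit for symbol 0
m1 = sum(v for xs, v in pvit.items() if xs[0] == 0)
m2 = sum(v for xs, v in pvit.items() if xs[1] == 0)

# joint vs. product of marginals: 0.4 vs. 0.25, so pvit is not a product distribution
print(pvit[(0, 0)], m1 * m2)
```

Intuitively, the sticky transition matrix makes the Viterbi distribution favor constant strings, a dependence between positions that no i.i.d. (product) distribution can express.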