# Observations are not "Viterbi i.i.d."



*Prof. Jeff Bilmes, EE596A/Winter 2013/DGMs, Lecture 4, Jan 23rd, 2013, pages 4-57 to 4-59 (of 239).*

The Viterbi path (most-probable explanation) of an HMM is defined as follows:

$$q^*_{1:T} \in \operatorname*{argmax}_{q_{1:T}} \; p(X_{1:T} = x_{1:T}, q_{1:T}) \tag{4.44}$$

If we max-marginalize over the hidden states, does that lead to an i.i.d. distribution? The "Viterbi" distribution of the HMM is:

$$\begin{aligned}
p_{\text{vit}}(X_{1:T} = x_{1:T}) &= c \, p(X_{1:T} = x_{1:T}, Q_{1:T} = q^*_{1:T}) \\
&= c \max_{q_{1:T}} p(X_{1:T} = x_{1:T}, Q_{1:T} = q_{1:T}) \\
&= c \max_{q_{1:T}} \prod_{t=1}^{T} p(X_t = x_t \mid Q_t = q_t) \, p(Q_t = q_t \mid Q_{t-1} = q_{t-1})
\end{aligned} \tag{4.43}$$

where $c$ is a positive normalizing constant over $x_{1:T}$. This is just a different semiring. The resulting distribution over observations does not in general factorize, so no i.i.d. here either.

Computing the Viterbi path is a standard method for finding a mapping from observations $x_{1:T}$ to candidate answers $q^*_{1:T}$, also called a "decoding."

Other times, we might have $K$ HMMs, corresponding to $K$ classes

$$\{ p_k(x_{1:T}, q_{1:T}) : k = 1, \ldots, K \} \tag{4.45}$$

and we wish to perform classification amongst the $K$-element discrete set of classes as follows:

$$(k^*, q^*_{1:T}) = \operatorname*{argmax}_{q_{1:T},\, k} \; p_k(x_{1:T}, q_{1:T}) \tag{4.46}$$

and $k^*$ becomes part of the hypothesized answer.

We can view this as $K$ distributions over just the observations, i.e.,

$$\begin{aligned}
p^{\text{vit}}_k(x_{1:T}) &\propto p_k(x_{1:T}, q^{*,k}_{1:T}) && \text{(4.47)} \\
&\propto \max_{q_{1:T}} p_k(x_{1:T}, q_{1:T}) && \text{(4.48)} \\
&\propto \max_{q_{1:T}} \prod_{t=1}^{T} p_k(x_t \mid q_t) \, p_k(q_t \mid q_{t-1}) && \text{(4.49)}
\end{aligned}$$
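The "different semiring" remark can be made concrete: the Viterbi score is the HMM forward recursion with the (sum, product) semiring replaced by (max, product), run in log space for numerical stability. The sketch below is not from the lecture; the function names and the toy two-state parameters are illustrative. It also implements the $K$-class decision rule of Eq. (4.46) by taking the argmax of the per-class Viterbi scores.

```python
import numpy as np

def viterbi_log_score(log_pi, log_A, log_B, obs):
    """log max_q p(x_{1:T}, q_{1:T}): the forward recursion with the
    (sum, product) semiring swapped for (max, product), in log space."""
    # delta_1(q) = log pi(q) + log p(x_1 | q)
    delta = log_pi + log_B[:, obs[0]]
    for x in obs[1:]:
        # delta_t(q) = max_{q'} [ delta_{t-1}(q') + log A(q', q) ] + log p(x_t | q)
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, x]
    return float(delta.max())  # = log p(x_{1:T}, q*_{1:T})

def classify(models, obs):
    """Eq. (4.46): (k*, score) = argmax_k max_q p_k(x_{1:T}, q_{1:T})."""
    scores = [viterbi_log_score(*m, obs) for m in models]
    k_star = int(np.argmax(scores))
    return k_star, scores[k_star]

# Two hypothetical 2-state HMMs over a binary observation alphabet.
model_a = tuple(np.log(p) for p in (
    np.array([0.6, 0.4]),                  # pi(q)
    np.array([[0.7, 0.3], [0.4, 0.6]]),    # A[q_prev, q_next]
    np.array([[0.9, 0.1], [0.2, 0.8]])))   # B[q, x]
model_b = tuple(np.log(p) for p in (
    np.array([0.5, 0.5]),
    np.array([[0.2, 0.8], [0.8, 0.2]]),
    np.array([[0.5, 0.5], [0.5, 0.5]])))

k_star, score = classify([model_a, model_b], [0, 0, 1, 0])
```

Replacing the `np.max` in the recursion with `scipy.special.logsumexp` over the same axis recovers the ordinary forward likelihood $p_k(x_{1:T})$; comparing the two per class shows how much probability mass the single best path captures.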

