# We expand $I_h(X_{\neg t}; Q_t, X_t)$ in two ways using the chain rule


… $X_{\neg t}; X_t) + I_h(X_{\neg t}; Q_t \mid X_t) \qquad (5.66)$

The HMM conditional independence properties say that $I_h(X_{\neg t}; X_t \mid Q_t) = 0$, implying

$$
I_h(X_{\neg t}; Q_t) = I(X_{\neg t}; X_t) + I_h(X_{\neg t}; Q_t \mid X_t) \qquad (5.68)
$$

Prof. Jeff Bilmes EE596A/Winter 2013/DGMs – Lecture 5 - Jan 25th, 2013, page 5-75 (of 232)

## Proof cont.: HMMs Generative Accuracy

or that

$$
I_h(X_{\neg t}; Q_t) \ge I(X_{\neg t}; X_t) \qquad (5.69)
$$

since $I_h(X_{\neg t}; Q_t \mid X_t) \ge 0$. This is the first condition.

Similarly, the quantity $I_h(X_t; Q_t, X_{\neg t})$ may be expanded as follows:

$$
\begin{aligned}
& I_h(X_t; Q_t, X_{\neg t}) && (5.70) \\
&= I_h(X_t; Q_t) + I_h(X_t; X_{\neg t} \mid Q_t) && (5.71) \\
&= I(X_t; X_{\neg t}) + I_h(X_t; Q_t \mid X_{\neg t}) && (5.72)
\end{aligned}
$$

Reasoning as above, this leads to

$$
I_h(X_t; Q_t) \ge I(X_t; X_{\neg t}), \qquad (5.73)
$$

the second condition.

A sequence of inequalities establishes the third condition:

$$
\log |D_Q| \ge H(Q_t) \ge H(Q_t) - H(Q_t \mid X_t) = I_h(Q_t; X_t) \ge I(X_t; X_{\neg t})
$$

so $|D_Q| \ge 2^{I(X_t; X_{\neg t})}$.

- This is a lower bound: the number of states must have enough capacity so that it is not a bottleneck, at the very least!
- This could be quite large, and grow with $T$.
- The r.h.s. $I(X_t; X_{\neg t})$ is upper bounded by $H(X_{\neg t})$, which could be as bad as $\log |D_{X_{\neg t}}|$.

## Necessary conditions for HMMs Generative Accuracy

- Insufficient states can lead to model inaccuracies (e.g., a state duration distribution that is geometric rather than something more realistic; add states to improve the duration distribution while sharing observation parameters).
- The observation density family must be rich enough (2nd inequality).
- Two bottlenecks: observation density (e.g., number of components of a Gaussian mixture), and time-dependency (number of states).

## Sufficient conditions for HMMs Generative Accuracy

**Theorem 5.8.2 (Sufficient conditions for HMM accuracy).** An HMM $p_h(X_{1:T})$ will accurately represent a true discrete distribution $p(X_{1:T})$ if the following conditions hold for all $t$:

$$
H(Q_t \mid X_{<t}) = 0
$$

$$
p_h(X_t = x_t \mid q_{x_{<t}}) = p(X_t = x_t \mid X_{<t} = x_{<t}),
$$

where $q_{x_{<t}} = f(x_{<t})$ is the unique state sub-sequence associated with $x_{<t}$.

Quite strong and unrealistic requirements, but they guarantee accuracy nonetheless.

Note: $\{<t\} \triangleq \{1, 2, \ldots, t-1\}$.

## Proof: Sufficient conditions for HMMs Generative Accuracy

Proof. We have for all $t$:

$$
\begin{aligned}
& D\big(p(X_t \mid X_{<t}) \,\|\, p_h(X_t \mid X_{<t})\big) && (5.74) \\
&= \sum_{x_{1:t}} p(x_{1:t}) \log \frac{p(x_t \mid x_{<t})}{p_h(x_t \mid x_{<t})} && (5.75)
\end{aligned}
$$

$= \sum_{x_{1:t}} p(x_{1:t}) \log p(x_t \mid \ldots$
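The three necessary conditions derived earlier (the two mutual-information inequalities and the state-count bound) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the toy parameters `INIT`, `TRANS`, `EMIT`, the length `T`, and all function names are hypothetical and do not come from the lecture. It also takes the "true" distribution to be the HMM's own distribution, so $I = I_h$ and the model is trivially accurate; the inequalities then reduce to data-processing statements that any HMM must satisfy.

```python
import itertools
import math

# Hypothetical toy HMM; every number here is an illustrative assumption.
INIT = [0.6, 0.4]                           # p(Q_1)
TRANS = [[0.9, 0.1], [0.2, 0.8]]            # p(Q_{t+1} | Q_t)
EMIT = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]   # p(X_t | Q_t), 3 symbols
T = 3                                       # sequence length


def joint_qt_x(t):
    """Brute-force joint table {(q_t, x_t, x_not_t): prob}, t is 0-based."""
    table = {}
    n_states, n_syms = len(INIT), len(EMIT[0])
    for qs in itertools.product(range(n_states), repeat=T):
        p_q = INIT[qs[0]]
        for a, b in zip(qs, qs[1:]):
            p_q *= TRANS[a][b]
        for xs in itertools.product(range(n_syms), repeat=T):
            p = p_q
            for q, x in zip(qs, xs):
                p *= EMIT[q][x]
            key = (qs[t], xs[t], xs[:t] + xs[t + 1:])
            table[key] = table.get(key, 0.0) + p
    return table


def marginal(table, idx):
    """Marginalize the joint table onto the key positions listed in idx."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_info(table, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B), in bits."""
    return (entropy(marginal(table, a)) + entropy(marginal(table, b))
            - entropy(marginal(table, a + b)))


def check_bounds(t=1):
    """Quantities appearing in the three necessary conditions at time t."""
    table = joint_qt_x(t)
    q, x, rest = (0,), (1,), (2,)
    return {
        "log2|D_Q|": math.log2(len(INIT)),
        "H(Q_t)": entropy(marginal(table, q)),
        "I(Q_t;X_t)": mutual_info(table, q, x),
        "I(X_not_t;Q_t)": mutual_info(table, rest, q),
        "I(X_t;X_not_t)": mutual_info(table, x, rest),
    }


if __name__ == "__main__":
    for name, value in check_bounds().items():
        print(f"{name:>16s} = {value:.4f} bits")
```

Running the script prints the five quantities in bits; the chain $\log |D_Q| \ge H(Q_t) \ge I_h(Q_t; X_t) \ge I(X_t; X_{\neg t})$ should be visible directly in the output. Brute-force enumeration is used only because the example is tiny; it scales exponentially in `T` and is not how one would compute these quantities for a realistic model.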

## This document was uploaded on 04/05/2014.
