HMM learning with EM

The EM algorithm then repeatedly optimizes the following objective:

\begin{align}
f(\lambda) = Q(\lambda, \lambda^p)
  &= \mathbb{E}_{p(x_{1:T}, q_{1:T} \mid \lambda^p)}\bigl[\log p(x_{1:T}, q_{1:T} \mid \lambda)\bigr] \tag{4.9}\\
  &= \mathbb{E}_p\Bigl[\log \prod_t p(q_t \mid q_{t-1}, \lambda)\, p(x_t \mid q_t, \lambda)\Bigr] \tag{4.10}\\
  &= \mathbb{E}_p\Bigl[\sum_t \log p(q_t \mid q_{t-1}, \lambda) + \sum_t \log p(x_t \mid q_t, \lambda)\Bigr] \tag{4.11}\\
  &= \sum_t \sum_{i,j} p(Q_t = j, Q_{t-1} = i \mid x_{1:T}, \lambda^p) \log p(Q_t = j \mid Q_{t-1} = i, \lambda) \notag\\
  &\quad + \sum_t \sum_i p(Q_t = i \mid x_{1:T}, \lambda^p) \log p(x_t \mid Q_t = i, \lambda) \tag{4.12}
\end{align}

So this means that for EM learning, we need, for all t, the queries $p(Q_t = i \mid x_{1:T})$ and $p(Q_t = j, Q_{t-1} = i \mid x_{1:T})$ in an HMM. Note again that these are clique posteriors.
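These two posteriors are exactly what the forward-backward algorithm delivers. Below is a minimal sketch of that computation for a discrete-observation HMM; the names pi, A, B (initial distribution, transition matrix, emission matrix) and the per-step scaling scheme are my assumptions, not taken from the slides.

```python
import numpy as np

def clique_posteriors(x, pi, A, B):
    """Forward-backward pass returning the two posteriors the slide names:
    gamma[t, i]  = p(Q_t = i | x_{1:T})
    xi[t, i, j]  = p(Q_{t-1} = i, Q_t = j | x_{1:T})   (defined for t >= 1)
    x : observation indices, shape (T,); pi : (N,); A : (N, N); B : (N, M).
    """
    T, N = len(x), len(pi)
    alpha = np.zeros((T, N))   # scaled forward messages
    beta = np.zeros((T, N))    # scaled backward messages
    c = np.zeros(T)            # per-step normalizers (avoid underflow)

    alpha[0] = pi * B[:, x[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1]) / c[t + 1]

    gamma = alpha * beta       # unary clique posterior p(Q_t = i | x_{1:T})
    xi = np.zeros((T, N, N))   # pairwise clique posterior, xi[0] unused
    for t in range(1, T):
        xi[t] = (alpha[t - 1][:, None] * A
                 * (B[:, x[t]] * beta[t])[None, :]) / c[t]
    return gamma, xi
```

With the usual scaling, each alpha[t] stores $p(Q_t \mid x_{1:t})$ and the product of the normalizers recovers $p(x_{1:T})$, so gamma and xi come out already normalized.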
HMM learning with gradient descent

EM isn't the only way to learn parameters.
Suppose we wanted to use a gradient-descent-like algorithm on $f(\lambda) = \log p(x_{1:T} \mid \lambda)$, as in
\begin{align}
\frac{\partial}{\partial\lambda} f(\lambda)
  = \frac{\partial}{\partial\lambda} \log p(x_{1:T} \mid \lambda)
  &= \frac{\partial}{\partial\lambda} \log \sum_{q_{1:T}} p(x_{1:T}, q_{1:T} \mid \lambda) \tag{4.13}\\
  &= \frac{\frac{\partial}{\partial\lambda} \sum_{q_{1:T}} p(x_{1:T}, q_{1:T} \mid \lambda)}{\sum_{q_{1:T}} p(x_{1:T}, q_{1:T} \mid \lambda)}
   = \frac{\frac{\partial}{\partial\lambda} \sum_{q_{1:T}} p(x_{1:T}, q_{1:T} \mid \lambda)}{p(x_{1:T} \mid \lambda)} \tag{4.14}
\end{align}
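One way to make (4.14) concrete: for a transition parameter $a_{ij}$, differentiating each term $p(x_{1:T}, q_{1:T} \mid \lambda)$ and dividing by $p(x_{1:T} \mid \lambda)$ yields $\partial \log p(x_{1:T} \mid \lambda)/\partial a_{ij} = \sum_t \xi_t(i,j)/a_{ij}$, so the same clique posteriors computed for EM also give the exact gradient. The sketch below assumes the hypothetical clique_posteriors helper from the EM section; note that a real gradient learner would also have to keep each row of A on the probability simplex (e.g., by projection or a softmax reparameterization), which this sketch ignores.

```python
import numpy as np

def log_likelihood(x, pi, A, B):
    """log p(x_{1:T} | lambda) via the scaled forward recursion:
    the per-step normalizers c_t satisfy log p = sum_t log c_t."""
    alpha = pi * B[:, x[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[:, x[t]]
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

def transition_gradient(x, pi, A, B):
    """d log p(x_{1:T} | lambda) / d a_{ij} = sum_t xi[t, i, j] / a_{ij}:
    the E-step expected transition counts, divided elementwise by A."""
    _, xi = clique_posteriors(x, pi, A, B)  # hypothetical helper from above
    return xi.sum(axis=0) / A
```

A finite-difference check of log_likelihood against transition_gradient is a quick way to validate the identity on a small random HMM.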