# hw4_sol - Machine Learning (10-701) Fall 2008 Homework #4 Solution


Machine Learning (10-701) Fall 2008 Homework #4 Solution
Professor: Eric Xing
Due Date: November 10, 2008

## 1 Expectation Maximization (EM) [24 Points, Mark]

The expectation maximization (EM) algorithm is one of the most important tools in machine learning. It allows us to build models that include hidden (*latent*) variables. When a model contains parameters that depend on these unobserved variables, we cannot compute estimates of the parameters in the usual way; EM gives us a way to estimate them.

We will make this concrete with a simple example. Suppose I have two unfair coins. The first lands on heads with probability $p$, and the second lands on heads with probability $q$. Imagine $N$ tosses, where for each toss I choose the first coin with probability $\pi$ and the second with probability $1 - \pi$. The outcome of each toss $i$ is $x_i \in \{0, 1\}$. Suppose I tell you the outcomes of the $N$ tosses, $x = \{x_1, x_2, \dots, x_N\}$, but I don't tell you which coin I used on which toss. Given only the outcomes $x$, your job is to compute estimates of $\theta = \{p, q, \pi\}$, the set of all parameters. It is remarkable that this can be done at all.

To compute these estimates, we introduce a latent variable $Z$, where $z_i \in \{0, 1\}$ indicates the coin used for the $i$th toss; for example, $z_2 = 1$ indicates the first coin was used on the second toss. We define the incomplete-data log-likelihood as $\log P(x \mid \theta)$ and the complete-data log-likelihood as $\log P(x, z \mid \theta)$.

1. (3 pts) The incomplete log-likelihood of the data is given by $\log P(x \mid \theta) = \log \left( \sum_z P(x, z \mid \theta) \right)$. Use Jensen's inequality to show that a lower bound on the incomplete log-likelihood is given by
$$\log P(x \mid \theta) \;\ge\; \sum_z g(z) \log \frac{P(x, z \mid \theta)}{g(z)},$$
where $g(z)$ is an arbitrary probability distribution over the latent variable $Z$.
**Solution:**
$$
\begin{aligned}
\log P(x \mid \theta) &= \log \left( \sum_z P(x, z \mid \theta) \right) \\
&= \log \left( \sum_z g(z) \, \frac{P(x, z \mid \theta)}{g(z)} \right) \\
&= \log \mathbb{E}_{g(z)} \left[ \frac{P(x, z \mid \theta)}{g(z)} \right] \\
&\ge \mathbb{E}_{g(z)} \left[ \log \frac{P(x, z \mid \theta)}{g(z)} \right] \\
&= \sum_z g(z) \log \frac{P(x, z \mid \theta)}{g(z)},
\end{aligned}
$$
where the inequality is Jensen's inequality, applicable because $\log$ is concave.
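The bound can be checked numerically for the two-coin model. The sketch below enumerates all $2^N$ coin assignments for a tiny made-up sample, computes the exact incomplete log-likelihood, and verifies that the bound holds for a random $g(z)$ and is tight when $g(z) = P(z \mid x, \theta)$. All variable names and sample values here are illustrative, not from the homework.

```python
import itertools
import math
import random

# Two-coin model: z_i = 1 means the first coin (heads prob p) was used,
# z_i = 0 means the second coin (heads prob q). Values are made up.
p, q, pi = 0.8, 0.3, 0.6
x = [1, 0, 1, 1, 0]  # small sample so we can enumerate every z

def joint(x, z):
    """P(x, z | theta): product over independent tosses."""
    total = 1.0
    for xi, zi in zip(x, z):
        if zi == 1:
            total *= pi * (p if xi == 1 else 1 - p)
        else:
            total *= (1 - pi) * (q if xi == 1 else 1 - q)
    return total

all_z = list(itertools.product([0, 1], repeat=len(x)))

# Incomplete log-likelihood: log sum_z P(x, z | theta)
loglik = math.log(sum(joint(x, z) for z in all_z))

# An arbitrary distribution g(z) over the 2^N assignments
rng = random.Random(0)
w = [rng.random() for _ in all_z]
g = [wi / sum(w) for wi in w]

# Jensen lower bound: sum_z g(z) log(P(x, z | theta) / g(z))
bound = sum(gi * math.log(joint(x, z) / gi) for gi, z in zip(g, all_z))
assert bound <= loglik + 1e-12

# The bound is tight when g(z) is the posterior P(z | x, theta)
post = [joint(x, z) / math.exp(loglik) for z in all_z]
tight = sum(gi * math.log(joint(x, z) / gi) for gi, z in zip(post, all_z))
assert abs(tight - loglik) < 1e-9
```

The tightness check previews part 2: the gap between $\log P(x \mid \theta)$ and the bound is exactly the KL divergence from $g(z)$ to the posterior.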

2. (3 pts) Show that
$$\log P(x \mid \theta) = Q\big(g(z)\big) + KL\big(g(z) \,\|\, P(z \mid x, \theta)\big),$$
where
$$Q\big(g(z)\big) = \sum_z g(z) \log \frac{P(x, z \mid \theta)}{g(z)}, \qquad KL\big(g(z) \,\|\, P(z \mid x, \theta)\big) = -\sum_z g(z) \log \frac{P(z \mid x, \theta)}{g(z)}.$$

**Solution:** By the product rule,
$$\log P(x, z \mid \theta) = \log P(z \mid x, \theta) + \log P(x \mid \theta).$$
Substituting this into $Q\big(g(z)\big)$:
$$
\begin{aligned}
Q\big(g(z)\big) &= \sum_z g(z) \big[ \log P(z \mid x, \theta) + \log P(x \mid \theta) - \log g(z) \big] \\
&= \sum_z g(z) \log \frac{P(z \mid x, \theta)}{g(z)} + \log P(x \mid \theta) \sum_z g(z) \\
&= -KL\big(g(z) \,\|\, P(z \mid x, \theta)\big) + \log P(x \mid \theta),
\end{aligned}
$$
using $\sum_z g(z) = 1$. Rearranging gives the claimed decomposition.

3. (3 pts) Since
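Although the preview cuts off here, the quantities above already determine the familiar EM procedure for the two-coin model: the E-step sets $g(z)$ to the posterior $P(z \mid x, \theta)$, driving the KL term in part 2 to zero, and the M-step re-maximizes the resulting bound over $\theta$. Below is a minimal sketch of those updates; the function names, initialization, and toy data are my own, not from the solution.

```python
import math

def em_two_coins(x, p, q, pi, iters=50):
    """EM for the two-coin mixture; standard mixture-of-Bernoullis updates."""
    for _ in range(iters):
        # E-step: responsibility tau_i = P(z_i = 1 | x_i, theta)
        tau = []
        for xi in x:
            a = pi * (p if xi == 1 else 1 - p)        # coin 1 term
            b = (1 - pi) * (q if xi == 1 else 1 - q)  # coin 2 term
            tau.append(a / (a + b))
        # M-step: responsibility-weighted maximum-likelihood updates
        n1 = sum(tau)
        p = sum(t * xi for t, xi in zip(tau, x)) / n1
        q = sum((1 - t) * xi for t, xi in zip(tau, x)) / (len(x) - n1)
        pi = n1 / len(x)
    return p, q, pi

def loglik(x, p, q, pi):
    """Incomplete log-likelihood log P(x | theta)."""
    return sum(math.log(pi * (p if xi else 1 - p)
                        + (1 - pi) * (q if xi else 1 - q)) for xi in x)

# Toy data: a mostly-heads run followed by a mostly-tails run
x = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
theta0 = (0.6, 0.5, 0.5)
theta1 = em_two_coins(x, *theta0, iters=1)
# By the decomposition in part 2, one EM iteration cannot
# decrease the incomplete log-likelihood.
assert loglik(x, *theta1) >= loglik(x, *theta0) - 1e-12
```

The monotonicity assertion is exactly what the $Q + KL$ decomposition guarantees: after the E-step the bound touches the log-likelihood, and the M-step can only push it upward.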

## This note was uploaded on 01/26/2010 for the course MACHINE LE 10701 taught by Professor Ericp.xing during the Fall '08 term at Carnegie Mellon.


