# 10-701 Machine Learning, Fall 2008: Homework #4 Solution

Machine Learning (10-701), Fall 2008, Homework #4 Solution. Professor: Eric Xing. Due Date: November 10, 2008.

## 1 Expectation Maximization (EM) [24 Points, Mark]

The expectation maximization (EM) algorithm is one of the most important tools in machine learning. It allows us to create models that include hidden (*latent*) variables. When our models contain parameters that depend on these unknown variables, we cannot compute estimates of these parameters in the usual way. EM gives us a way to estimate these parameters.

We will make this more concrete by considering a simple example. Suppose I have two unfair coins. The first lands on heads with probability $p$, and the second lands on heads with probability $q$. Imagine $N$ tosses, where for each toss I choose to use the first coin with probability $\pi$ and choose to use the second with probability $1 - \pi$. The outcome of each toss $i$ is $x_i \in \{0, 1\}$. Suppose I tell you the outcomes of the $N$ tosses, $x = \{x_1, x_2, \ldots, x_N\}$, but I do not tell you which coin I used on which toss. Given only the outcomes $x$, your job is to compute estimates of $\theta$, the set of all parameters: $\theta = \{p, q, \pi\}$. It is pretty remarkable that this can be done at all.

To compute these estimates, we will create a latent variable $Z$, where $z_i \in \{0, 1\}$ indicates the coin used for the $i$-th toss. For example, $z_2 = 1$ indicates that the first coin was used on the second toss. We define the incomplete data log-likelihood as $\log P(x \mid \theta)$ and the complete data log-likelihood as $\log P(x, z \mid \theta)$.

**1. (3 pts)** The incomplete log-likelihood of the data is given by $\log P(x \mid \theta) = \log \left( \sum_z P(x, z \mid \theta) \right)$. Use Jensen's inequality to show that a lower bound on the incomplete log-likelihood is given by:

$$\log P(x \mid \theta) \geq \sum_z g(z) \log \left[ \frac{P(x, z \mid \theta)}{g(z)} \right]$$

where $g(z)$ is an arbitrary probability distribution over the latent variable $Z$.
**Solution:**

$$\log P(x \mid \theta) = \log \left( \sum_z P(x, z \mid \theta) \right) = \log \left( \sum_z g(z) \, \frac{P(x, z \mid \theta)}{g(z)} \right) = \log \mathbb{E}_{g(z)} \left[ \frac{P(x, z \mid \theta)}{g(z)} \right]$$

$$\geq \mathbb{E}_{g(z)} \left[ \log \frac{P(x, z \mid \theta)}{g(z)} \right] = \sum_z g(z) \log \left[ \frac{P(x, z \mid \theta)}{g(z)} \right]$$

The inequality is Jensen's inequality applied to the concave function $\log$.
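As a numeric sanity check (not part of the original solution), the bound can be verified by brute-force enumeration on a tiny instance of the coin model. The parameter values ($\pi = 0.4$, $p = 0.8$, $q = 0.3$, three tosses) are made up for illustration; the sketch also confirms that the bound becomes tight when $g(z)$ is the posterior $P(z \mid x, \theta)$:

```python
import itertools
import math
import random

# Hypothetical small instance of the two-coin model (values chosen arbitrarily).
PI, P, Q = 0.4, 0.8, 0.3
x = [1, 0, 1]  # observed tosses

def joint(x, z):
    """P(x, z | theta) for one full assignment z (z_i = 1 means coin 1)."""
    prob = 1.0
    for xi, zi in zip(x, z):
        if zi == 1:
            prob *= PI * (P if xi else 1 - P)
        else:
            prob *= (1 - PI) * (Q if xi else 1 - Q)
    return prob

# Enumerate all 2^N assignments of z to get the incomplete log-likelihood.
zs = list(itertools.product([0, 1], repeat=len(x)))
log_px = math.log(sum(joint(x, z) for z in zs))

# The bound holds for arbitrary distributions g(z)...
rng = random.Random(0)
for _ in range(100):
    w = [rng.random() for _ in zs]
    g = [wi / sum(w) for wi in w]  # random normalized g(z)
    bound = sum(gi * math.log(joint(x, z) / gi) for gi, z in zip(g, zs))
    assert bound <= log_px + 1e-12

# ...and is tight when g(z) is the posterior P(z | x, theta).
post = [joint(x, z) / math.exp(log_px) for z in zs]
tight = sum(gi * math.log(joint(x, z) / gi) for gi, z in zip(post, zs))
assert abs(tight - log_px) < 1e-9
```

Tightness at the posterior is exactly the property the next part formalizes.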

**2. (3 pts)** Show that

$$\log P(x \mid \theta) = Q(g(z), \theta) + KL(g(z) \,\|\, P(z \mid x, \theta))$$

where

$$Q(g(z), \theta) = \sum_z g(z) \log \left[ \frac{P(x, z \mid \theta)}{g(z)} \right], \qquad KL(g(z) \,\|\, P(z \mid x, \theta)) = -\sum_z g(z) \log \left[ \frac{P(z \mid x, \theta)}{g(z)} \right]$$

**Solution:** By the product rule (Bayes' rule):

$$\log P(x, z \mid \theta) = \log P(z \mid x, \theta) + \log P(x \mid \theta)$$

Substitute this into $Q(g(z), \theta)$:

$$Q(g(z), \theta) = \sum_z g(z) \left[ \log \frac{P(z \mid x, \theta)}{g(z)} + \log P(x \mid \theta) \right] = -KL(g(z) \,\|\, P(z \mid x, \theta)) + \log P(x \mid \theta)$$

where the second equality uses $\sum_z g(z) = 1$. Rearranging gives the result.