CS229 Problem Set #4 Solutions

CS 229, Autumn 2011
Problem Set #4 Solutions: Unsupervised learning & RL

Due in class (9:30am) on Wednesday, December 7.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to cs229qa@cs.stanford.edu, please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figure that you are asked to plot.

SCPD students: Please email your solutions to cs229qa@cs.stanford.edu, and write "Problem Set 4 Submission" on the Subject of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize

$$\prod_{i=1}^{m} p(x^{(i)}; \theta) = \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),$$

where the $z^{(i)}$'s were latent random variables. Suppose we are working in a Bayesian framework, and wanted to find the MAP estimate of the parameters $\theta$ by maximizing

$$\left( \prod_{i=1}^{m} p(x^{(i)} \mid \theta) \right) p(\theta) = \left( \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right) p(\theta).$$

Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation.
You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities. (This roughly corresponds to assuming that MAP estimation is tractable when $x, z$ is fully observed, just like in the frequentist case where we considered examples in which maximum likelihood estimation was easy if $x, z$ was fully observed.) Make sure your M-step is tractable, and also prove that $\prod_{i=1}^{m} p(x^{(i)} \mid \theta)\, p(\theta)$ (viewed as a function of $\theta$) monotonically increases with each iteration of your algorithm.

Answer: We will derive the EM updates the same way as was done in class for maximum likelihood estimation. Monotonic increase with every iteration is guaranteed for the same reason: in the E-step we compute a lower bound that is tight at the current estimate of $\theta$, and in the M-step we maximize this lower bound, so the actual objective function cannot decrease.

$$\log \prod_{i=1}^{m} p(x^{(i)} \mid \theta)\, p(\theta) = \log p(\theta) + \sum^{m} \ldots$$
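
The E-step/M-step argument in the answer can be checked numerically on a toy model. The following is a minimal sketch, not the course's solution: it assumes a mixture of two coins with fixed mixing weights of 1/2, each observation being the number of heads in `n` flips, and a Beta(`a`, `b`) prior on each coin's bias with `a, b >= 1` so that the log prior is concave. The function names (`log_joint`, `log_posterior`, `map_em`) and the data are hypothetical, chosen only for illustration. The E-step sets $Q_i(z) = p(z \mid x^{(i)}, \theta)$, and the M-step maximizes the expected complete-data log-likelihood plus $\log p(\theta)$, which here has a closed form.

```python
import math


def log_joint(h, n, q):
    """log p(x, z=k | theta) for k in {0, 1}, up to a constant in theta.

    Assumption: mixing weights are fixed at 1/2; the binomial coefficient
    is dropped because it does not depend on theta.
    """
    return [math.log(0.5) + h * math.log(qk) + (n - h) * math.log(1.0 - qk)
            for qk in q]


def log_posterior(heads, n, q, a, b):
    """Unnormalised log p(theta) + sum_i log sum_z p(x_i, z | theta)."""
    log_prior = sum((a - 1) * math.log(qk) + (b - 1) * math.log(1.0 - qk)
                    for qk in q)
    log_lik = 0.0
    for h in heads:
        lj = log_joint(h, n, q)
        top = max(lj)  # log-sum-exp for numerical stability
        log_lik += top + math.log(sum(math.exp(v - top) for v in lj))
    return log_prior + log_lik


def map_em(heads, n, q_init, a=2.0, b=2.0, iters=20):
    """Run MAP-EM; return final biases and the objective after each step."""
    q = list(q_init)
    history = [log_posterior(heads, n, q, a, b)]
    for _ in range(iters):
        # E-step: responsibilities Q_i(z) = p(z | x_i, theta).
        resp = []
        for h in heads:
            lj = log_joint(h, n, q)
            top = max(lj)
            w = [math.exp(v - top) for v in lj]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: closed-form maximiser of the lower bound plus log prior.
        for k in range(2):
            num = sum(r[k] * h for r, h in zip(resp, heads)) + (a - 1)
            den = sum(r[k] * n for r in resp) + (a + b - 2)
            q[k] = num / den
        history.append(log_posterior(heads, n, q, a, b))
    return q, history


if __name__ == "__main__":
    # Hypothetical data: heads out of 10 flips per trial.
    q, history = map_em([9, 8, 1, 2, 9, 0, 1, 8], 10, (0.6, 0.4))
    print(q)
    print(all(nxt >= cur - 1e-9 for cur, nxt in zip(history, history[1:])))
```

The `history` list records the log posterior objective before the first iteration and after each M-step, so comparing consecutive entries is a direct check of the monotonicity claim on this example. With `a = b = 1` the prior is uniform and the updates reduce to ordinary maximum likelihood EM.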
This document was uploaded on 01/06/2012.
 Fall '09
