CS229 Problem Set #4

CS 229, Autumn 2011
Problem Set #4: Unsupervised learning & RL
Due in class (9:30am) on Wednesday, December 7.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to cs229-qa@cs.stanford.edu, please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figure that you are asked to plot.

SCPD students: Please email your solutions to cs229-qa@cs.stanford.edu, and write "Problem Set 4 Submission" in the subject of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize

\prod_{i=1}^m p(x^{(i)}; \theta) = \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),

where the z^{(i)}'s were latent random variables. Suppose we are working in a Bayesian framework, and wanted to find the MAP estimate of the parameters \theta by maximizing

\left( \prod_{i=1}^m p(x^{(i)} \mid \theta) \right) p(\theta) = \left( \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right) p(\theta).

Here, p(\theta) is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation.
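As a concrete reference point for the maximum-likelihood EM being generalized here, the following is a minimal sketch of the standard (non-MAP) EM loop for a two-component 1-D Gaussian mixture. The mixture model, the initialization scheme, and the fixed iteration count are all assumptions made for illustration; they are not specified by the problem set.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50, seed=0):
    """Maximum-likelihood EM for a two-component 1-D Gaussian mixture.

    Illustrative sketch only: the model and initialization are assumptions,
    not part of the problem statement. theta = (phi, mu, sigma), where phi
    is the prior probability that the latent z^(i) equals 1.
    """
    rng = np.random.default_rng(seed)
    phi = 0.5
    mu = rng.choice(x, size=2, replace=False).astype(float)
    sigma = np.array([x.std(), x.std()])

    for _ in range(n_iter):
        # E-step: posterior weights w[i] = p(z^(i) = 1 | x^(i); theta).
        p0 = (1 - phi) * np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
        p1 = phi * np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
        w = p1 / (p0 + p1)

        # M-step: maximize the expected complete-data log-likelihood,
        # which gives closed-form weighted-average updates.
        phi = w.mean()
        mu[0] = ((1 - w) @ x) / (1 - w).sum()
        mu[1] = (w @ x) / w.sum()
        sigma[0] = np.sqrt(((1 - w) @ (x - mu[0]) ** 2) / (1 - w).sum())
        sigma[1] = np.sqrt((w @ (x - mu[1]) ** 2) / w.sum())

    return phi, mu, sigma
```

The MAP generalization asked for above would modify only the M-step objective, adding the log-prior term to the expected complete-data log-likelihood before maximizing.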
You may assume that \log p(x, z \mid \theta) and \log p(\theta) are both concave in \theta, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities. (This roughly corresponds to assuming that MAP estimation is tractable when x, z is fully observed, just like in the frequentist case where we considered examples in which maximum likelihood estimation was easy if x, z was fully observed.) Make sure your M-step is tractable, and also prove that \prod_{i=1}^m p(x^{(i)} \mid \theta) \, p(\theta) (viewed as a function of \theta) monotonically increases with each iteration of your algorithm.

2. [22 points] EM application

Consider the following problem. There are P papers submitted to a machine learning conference. Each of R reviewers reads each paper and gives it a score indicating how good he/she thought that paper was. We let x^{(pr)} denote the score that reviewer r gave to paper p. A high score means the reviewer liked the paper, and represents a recommendation from that reviewer that it be accepted for the conference. A low score means the reviewer did not like the paper.