CS229 Problem Set #4

CS 229, Autumn 2011
Problem Set #4: Unsupervised learning & RL

Due in class (9:30am) on Wednesday, December 7.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as "Hwk1 Q4", and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class's collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figures that you are asked to plot.

SCPD students: Please email your solutions to [email protected], and write "Problem Set 4 Submission" in the subject line of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize

$$\prod_{i=1}^m p(x^{(i)}; \theta) = \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),$$

where the $z^{(i)}$'s were latent random variables. Suppose we are working in a Bayesian framework, and want to find the MAP estimate of the parameters $\theta$ by maximizing

$$\left( \prod_{i=1}^m p(x^{(i)} \mid \theta) \right) p(\theta) = \left( \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right) p(\theta).$$

Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation.
You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities. (This roughly corresponds to assuming that MAP estimation is tractable when $x, z$ is fully observed, just like in the frequentist case where we considered examples in which maximum likelihood estimation was easy if $x, z$ was fully observed.) Make sure your M-step is tractable, and also prove that

$$\left( \prod_{i=1}^m p(x^{(i)} \mid \theta) \right) p(\theta)$$

(viewed as a function of $\theta$) monotonically increases with each iteration of your algorithm.

2. [22 points] EM application

Consider the following problem. There are $P$ papers submitted to a machine learning conference. Each of $R$ reviewers reads each paper and gives it a score indicating how good he/she thought that paper was. We let $x^{(pr)}$ denote the score that reviewer $r$ gave to paper $p$. A high score means the reviewer liked the paper, and represents a recommendation from that reviewer that it be accepted for the conference. A low score means the reviewer did not like the paper.
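As a concrete (and non-authoritative) illustration of the kind of algorithm Problem 1 asks about, here is a small sketch of EM for MAP estimation on a 1-D mixture of two Gaussians with known, equal variances, placing a symmetric Dirichlet prior on the mixing weight and a flat prior on the means. All names here (`run_map_em`, `alpha`, and so on) are our own illustrative choices, not part of the assignment; the sketch simply shows how the prior term enters the M-step while the E-step is unchanged.

```python
import numpy as np

def run_map_em(x, n_iters=50, alpha=2.0, sigma2=1.0):
    """Sketch of MAP-EM: maximize (prod_i p(x_i | theta)) p(theta).

    Here theta = (phi, mu0, mu1), with a Dirichlet(alpha, alpha) prior
    on (phi, 1 - phi) and a flat prior on the means. The E-step is the
    usual posterior over z; the M-step maximizes
    E_Q[log p(x, z | theta)] + log p(theta).
    """
    m = len(x)
    phi = 0.5
    mu = np.array([np.min(x), np.max(x)], dtype=float)  # crude init
    for _ in range(n_iters):
        # E-step: responsibility w_i = Q_i(z_i = 1), unchanged from ML-EM.
        p1 = phi * np.exp(-(x - mu[1]) ** 2 / (2 * sigma2))
        p0 = (1 - phi) * np.exp(-(x - mu[0]) ** 2 / (2 * sigma2))
        w = p1 / (p0 + p1)
        # M-step with the prior: the Dirichlet term (alpha - 1) *
        # (log phi + log(1 - phi)) contributes pseudo-counts to phi.
        phi = (np.sum(w) + alpha - 1) / (m + 2 * (alpha - 1))
        # Flat prior on the means, so their updates match ML-EM.
        mu[1] = np.sum(w * x) / np.sum(w)
        mu[0] = np.sum((1 - w) * x) / np.sum(1 - w)
    return phi, mu
```

With `alpha = 1` the prior is flat and the update reduces to ordinary maximum-likelihood EM; `alpha > 1` shrinks `phi` toward 1/2, which is the monotonicity-preserving behavior the problem asks you to prove in general.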