CS229 Problem Set #4

CS 229, Autumn 2011
Problem Set #4: Unsupervised learning & RL

Due in class (9:30am) on Wednesday, December 7.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to firstname.lastname@example.org, please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figure that you are asked to plot.

SCPD students: Please email your solutions to email@example.com, and write "Problem Set 4 Submission" on the Subject of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize

$$\prod_{i=1}^{m} p(x^{(i)}; \theta) = \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),$$

where the $z^{(i)}$'s were latent random variables. Suppose we are working in a Bayesian framework, and wanted to find the MAP estimate of the parameters $\theta$ by maximizing

$$\left( \prod_{i=1}^{m} p(x^{(i)} \mid \theta) \right) p(\theta) = \left( \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right) p(\theta).$$

Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation.
You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities. (This roughly corresponds to assuming that MAP estimation is tractable when $x, z$ is fully observed, just like in the frequentist case where we considered examples in which maximum likelihood estimation was easy if $x, z$ was fully observed.) Make sure your M-step is tractable, and also prove that $\prod_{i=1}^{m} p(x^{(i)} \mid \theta) \, p(\theta)$ (viewed as a function of $\theta$) monotonically increases with each iteration of your algorithm.

2. [22 points] EM application

Consider the following problem. There are $P$ papers submitted to a machine learning conference. Each of $R$ reviewers reads each paper, and gives it a score indicating how good he/she thought that paper was. We let $x^{(pr)}$ denote the score that reviewer $r$ gave to paper $p$. A high score means the reviewer liked the paper, and represents a recommendation from that reviewer that it be accepted for the conference. A low score means the reviewer did not like the paper.
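As a numerical aid for Problem 1 (not a solution to it), the sketch below evaluates the logarithm of the MAP objective, that is, the log marginal likelihood summed over examples plus the log prior, for a toy model. Everything about the toy model is an assumption made for illustration and does not come from the problem set: the latent z is uniform over the components, x given z is Gaussian with unit variance, and each component mean has an independent zero-mean Gaussian prior. The function names and the `prior_var` parameter are likewise hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_normal_pdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_map_objective(xs, means, prior_var=10.0):
    """Log of [prod_i sum_z p(x_i, z | theta)] * p(theta) for a toy model.

    Assumed toy model (illustrative only): z is uniform over len(means)
    components, x | z ~ N(means[z], 1), and each mean has an independent
    N(0, prior_var) prior. Here theta is the list of means.
    """
    k = len(means)
    log_lik = 0.0
    for x in xs:
        # log sum_z p(x, z | theta), with p(z) = 1 / k
        terms = [math.log(1.0 / k) + log_normal_pdf(x, mu, 1.0) for mu in means]
        log_lik += log_sum_exp(terms)
    # log p(theta): independent Gaussian priors on the component means
    log_prior = sum(log_normal_pdf(mu, 0.0, prior_var) for mu in means)
    return log_lik + log_prior
```

Calling `log_map_objective(xs, means)` after every iteration of a candidate algorithm gives a quick sanity check of the monotonicity claim: under these assumptions the returned value should never decrease from one iteration to the next.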