# CS229 Problem Set #4 (CS 229, Autumn 2009)


Due in class (9:30 am) on Wednesday, December 2.

Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as Hwk1 Q4, and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class' collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figure that you are asked to plot.

SCPD students: Please fax your solutions to Prof. Ng at (650) 725-1449, and write "ATTN: CS229 (Machine Learning)" on the cover sheet. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.

1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize

$$\prod_{i=1}^{m} p(x^{(i)}; \theta) = \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),$$

where the $z^{(i)}$'s were latent random variables. Suppose we are working in a Bayesian framework, and wanted to find the MAP estimate of the parameters $\theta$ by maximizing

$$\left[ \prod_{i=1}^{m} p(x^{(i)} \mid \theta) \right] p(\theta) = \left[ \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right] p(\theta).$$

Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation. You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities.
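For reference, the maximum-likelihood EM from lecture alternates the two steps below; this is the starting point the problem asks you to generalize, not the required answer.

```latex
% E-step: set Q_i to the posterior over the latent variable, which
% makes the Jensen lower bound on the log-likelihood tight at the
% current value of theta.
Q_i\big(z^{(i)}\big) := p\big(z^{(i)} \mid x^{(i)}; \theta\big)

% M-step: maximize the resulting lower bound with respect to theta.
\theta := \arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}}
    Q_i\big(z^{(i)}\big)
    \log \frac{p\big(x^{(i)}, z^{(i)}; \theta\big)}{Q_i\big(z^{(i)}\big)}
```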
(This assumption roughly corresponds to assuming that MAP estimation is tractable when $x, z$ is fully observed, just like in the frequentist case, where we considered examples in which maximum likelihood estimation was easy if $x, z$ was fully observed.) Make sure your M-step is tractable, and also prove that $\left[ \prod_{i=1}^{m} p(x^{(i)} \mid \theta) \right] p(\theta)$ (viewed as a function of $\theta$) monotonically increases with each iteration of your algorithm.

2. [22 points] EM application

Consider the following problem. There are $P$ papers submitted to a machine learning conference. Each of $R$ reviewers reads each paper, and gives it a score indicating how good he/she thought that paper was. We let $x^{(pr)}$ denote the score that reviewer $r$ gave to paper $p$. A high score means the reviewer liked the paper, and represents a recommendation from that reviewer that it be accepted for the conference. A low score means the reviewer did not like the paper.

We imagine that each paper has some "intrinsic," true value that we denote by
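Returning to Problem 1: the MAP-EM generalization it asks for can be prototyped on a small latent-variable model. The sketch below is purely illustrative (the two-coin setup, the Beta prior, and every variable name are assumptions made for this example, not part of the problem set): each 10-flip session secretly uses one of two coins, the coin identity is the latent $z$, and a Beta prior on each bias plays the role of $p(\theta)$. It also tracks the penalized objective $\left[\prod_i p(x^{(i)} \mid \theta)\right] p(\theta)$ whose monotone increase the problem asks you to prove.

```python
import math

heads = [9, 8, 7, 1, 2]   # heads observed in each session (made-up data)
n = 10                    # flips per session
a, b = 2.0, 2.0           # Beta(a, b) prior hyperparameters (illustrative)

def log_lik(theta, h):
    """Binomial log-likelihood kernel of h heads in n flips under bias theta."""
    return h * math.log(theta) + (n - h) * math.log(1.0 - theta)

theta_A, theta_B = 0.6, 0.4   # initial guesses
objective = []
for _ in range(50):
    # E-step: Q_i(z) = p(z | x; theta) -- the responsibility of coin A
    # for each session (the two coins are assumed equally likely a priori).
    w = [1.0 / (1.0 + math.exp(log_lik(theta_B, h) - log_lik(theta_A, h)))
         for h in heads]

    # M-step: maximize sum_i E_Q[log p(x, z | theta)] + log p(theta).
    # With a Beta(a, b) prior this just adds (a - 1) pseudo-heads and
    # (b - 1) pseudo-tails to each coin's expected counts.
    theta_A = (sum(wi * h for wi, h in zip(w, heads)) + a - 1) / \
              (sum(w) * n + a + b - 2)
    theta_B = (sum((1 - wi) * h for wi, h in zip(w, heads)) + a - 1) / \
              ((len(w) - sum(w)) * n + a + b - 2)

    # Penalized objective log([prod_i p(x_i | theta)] p(theta)):
    # the quantity whose monotone increase Problem 1 asks you to prove.
    obj = sum(math.log(0.5 * math.exp(log_lik(theta_A, h))
                       + 0.5 * math.exp(log_lik(theta_B, h))) for h in heads)
    obj += (a - 1) * (math.log(theta_A) + math.log(theta_B))
    obj += (b - 1) * (math.log(1 - theta_A) + math.log(1 - theta_B))
    objective.append(obj)
```

On this data the two biases separate (one coin near the high-heads sessions, one near the low-heads sessions), and the recorded objective never decreases across iterations, matching the guarantee the problem asks you to establish in general.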

