CS229 Problem Set #4
CS 229, Autumn 2009
Due in class (9:30am) on Wednesday, December 2.
Notes:
(1) These questions require thought, but do not require long answers. Please be as
concise as possible. (2) When sending questions to [email protected], please make sure
to write the homework number and the question number in the subject line, such as “Hwk1 Q4”,
and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with
the class’ collaboration or honor code policy, please read the policy on Handout #1 (available
from the course website) before starting work. (4) For problems that require programming,
please include in your submission a printout of your code (with comments) and any figure that
you are asked to plot.
SCPD students:
Please fax your solutions to Prof. Ng at (650) 725-1449, and write “ATTN:
CS229 (Machine Learning)” on the cover sheet. If you are writing your solutions out by hand,
please write clearly and in a reasonably large font using a dark pen to improve legibility.
1. [11 points] EM for MAP estimation
The EM algorithm that we talked about in class was for solving a maximum likelihood
estimation problem in which we wished to maximize
\[
\prod_{i=1}^{m} p(x^{(i)}; \theta) \;=\; \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),
\]
where the $z^{(i)}$'s were latent random variables. Suppose we are working in a Bayesian
framework, and wanted to find the MAP estimate of the parameters $\theta$ by maximizing
\[
\left[ \prod_{i=1}^{m} p(x^{(i)} \mid \theta) \right] p(\theta) \;=\; \left[ \prod_{i=1}^{m} \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right] p(\theta).
\]
Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP
estimation. You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so
that the M-step is tractable if it requires only maximizing a linear combination of these
quantities. (This roughly corresponds to assuming that MAP estimation is tractable when
$x, z$ is fully observed, just like in the frequentist case where we considered examples in
which maximum likelihood estimation was easy if $x, z$ was fully observed.)
Make sure your M-step is tractable, and also prove that $\prod_{i=1}^{m} p(x^{(i)} \mid \theta) \, p(\theta)$ (viewed as a
function of $\theta$) monotonically increases with each iteration of your algorithm.
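One standard formulation to start from (a sketch under the stated assumptions, not the
required derivation): keep the E-step from class unchanged, and carry the log-prior into
the M-step objective. Here the $Q_i$ denote the usual per-example distributions over the
latent variables, as in lecture.
\[
\text{E-step:}\quad Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}, \theta) \quad \text{for each } i;
\]
\[
\text{M-step:}\quad \theta := \arg\max_{\theta} \; \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})} \;+\; \log p(\theta).
\]
Under the concavity assumptions above, the M-step objective is a linear combination of the
$\log p(x^{(i)}, z^{(i)} \mid \theta)$ terms plus $\log p(\theta)$, so it stays tractable; the
monotonicity argument then follows the same Jensen's-inequality bound from class, with
$\log p(\theta)$ carried through unchanged.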
2. [22 points] EM application
Consider the following problem. There are $P$ papers submitted to a machine learning
conference. Each of $R$ reviewers reads each paper, and gives it a score indicating how good
he/she thought that paper was. We let $x^{(pr)}$ denote the score that reviewer $r$ gave to paper
$p$. A high score means the reviewer liked the paper, and represents a recommendation from
that reviewer that it be accepted for the conference. A low score means the reviewer did
not like the paper.
We imagine that each paper has some “intrinsic,” true value that we denote by
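For problems like this that may call for programming (per Note 4), a minimal synthetic-data
sketch can help sanity-check an implementation. The additive Gaussian model below (an
intrinsic value per paper, plus a bias per reviewer, plus noise) and every symbol in it
(mu, nu, sigma) are illustrative assumptions, not the model from the full problem statement:

import numpy as np

# Illustrative simulator: P papers, R reviewers, x[p, r] is the score
# reviewer r gives paper p. The generative model here -- intrinsic paper
# value mu[p] plus reviewer bias nu[r] plus Gaussian noise -- is an
# assumption for illustration only.
rng = np.random.default_rng(0)
P, R = 50, 5

mu = rng.normal(loc=5.0, scale=2.0, size=P)   # assumed intrinsic paper values
nu = rng.normal(loc=0.0, scale=1.0, size=R)   # assumed per-reviewer biases
sigma = 1.0                                   # assumed score noise

# Observed score matrix: one row per paper, one column per reviewer.
x = mu[:, None] + nu[None, :] + rng.normal(scale=sigma, size=(P, R))

# Averaging each paper's scores across reviewers gives a crude estimate
# of its intrinsic value (up to the mean reviewer bias).
print(x.mean(axis=1)[:3])
print(mu[:3])

A generator like this makes it easy to verify that an EM implementation recovers the
parameters it was trained to estimate before running it on real review data.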