CS229 Problem Set #4 Solutions
CS 229, Autumn 2011
Problem Set #4 Solutions: Unsupervised learning & RL
Due in class (9:30am) on Wednesday, December 7.
Notes: (1) These questions require thought, but do not require long answers. Please be as concise as possible. (2) When sending questions to [email protected], please make sure to write the homework number and the question number in the subject line, such as "Hwk1 Q4", and send a separate email per question. (3) If you missed the first lecture or are unfamiliar with the class's collaboration or honor code policy, please read the policy on Handout #1 (available from the course website) before starting work. (4) For problems that require programming, please include in your submission a printout of your code (with comments) and any figure that you are asked to plot.
SCPD students: Please email your solutions to [email protected], and write "Problem Set 4 Submission" in the Subject line of the email. If you are writing your solutions out by hand, please write clearly and in a reasonably large font using a dark pen to improve legibility.
1. [11 points] EM for MAP estimation

The EM algorithm that we talked about in class was for solving a maximum likelihood estimation problem in which we wished to maximize
\[
\prod_{i=1}^m p(x^{(i)}; \theta) = \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta),
\]
where the $z^{(i)}$'s were latent random variables.
Suppose we are working in a Bayesian framework, and wanted to find the MAP estimate of the parameters $\theta$ by maximizing
\[
\left( \prod_{i=1}^m p(x^{(i)} \mid \theta) \right) p(\theta) = \left( \prod_{i=1}^m \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \right) p(\theta).
\]
Here, $p(\theta)$ is our prior on the parameters. Generalize the EM algorithm to work for MAP estimation. You may assume that $\log p(x, z \mid \theta)$ and $\log p(\theta)$ are both concave in $\theta$, so that the M-step is tractable if it requires only maximizing a linear combination of these quantities. (This roughly corresponds to assuming that MAP estimation is tractable when $x, z$ is fully observed, just like in the frequentist case where we considered examples in which maximum likelihood estimation was easy if $x, z$ was fully observed.)
Make sure your M-step is tractable, and also prove that $\prod_{i=1}^m p(x^{(i)} \mid \theta)\, p(\theta)$ (viewed as a function of $\theta$) monotonically increases with each iteration of your algorithm.
Answer: We will derive the EM updates in the same way as was done in class for maximum likelihood estimation. Monotonic increase with every iteration is guaranteed for the same reason: in the E-step we compute a lower bound that is tight at the current estimate of $\theta$, and in the M-step we optimize $\theta$ against this lower bound, so we are guaranteed to improve the actual objective function.
\begin{align*}
\log \left( \prod_{i=1}^m p(x^{(i)} \mid \theta) \right) p(\theta)
&= \log p(\theta) + \sum_{i=1}^m \log p(x^{(i)} \mid \theta) \\
&= \log p(\theta) + \sum_{i=1}^m \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)} \mid \theta) \\
&= \log p(\theta) + \sum_{i=1}^m \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})} \\
&\geq \log p(\theta) + \sum_{i=1}^m \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)} \mid \theta)}{Q_i(z^{(i)})},
\end{align*}
where the last step follows from Jensen's inequality and holds with equality when $Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \theta)$. This gives the algorithm:

E-step: For each $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$, which makes the lower bound tight at the current $\theta$.

M-step: Set
\[
\theta := \arg\max_\theta \left[ \log p(\theta) + \sum_{i=1}^m \sum_{z^{(i)}} Q_i(z^{(i)}) \log p(x^{(i)}, z^{(i)} \mid \theta) \right],
\]
where the $-\sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log Q_i(z^{(i)})$ term of the lower bound has been dropped because it does not depend on $\theta$. This M-step is tractable by assumption, since its objective is a linear combination of $\log p(\theta)$ and terms of the form $\log p(x^{(i)}, z^{(i)} \mid \theta)$, each concave in $\theta$.
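The monotonicity guarantee above can be sanity-checked numerically. Below is a minimal sketch (the data set, the two-component 1-D unit-variance Gaussian mixture, and the $\mathcal{N}(0, \tau^2)$ prior on the component means are all invented for illustration) that runs MAP-EM with fixed mixing weights and verifies that $\log \prod_i p(x^{(i)} \mid \theta)\, p(\theta)$ never decreases across iterations:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 1-D mixture of two unit-variance Gaussians.
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])

tau2 = 4.0                   # prior mu_k ~ N(0, tau2) -- an assumption for this sketch
phi = np.array([0.5, 0.5])   # mixing weights, held fixed for simplicity
mu = np.array([-1.0, 1.0])   # initial component means

def log_map_objective(mu):
    """log prod_i p(x_i | theta) + log p(theta), up to additive constants."""
    comp = phi * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2.0 * np.pi)
    return np.sum(np.log(comp.sum(axis=1))) - 0.5 * np.sum(mu ** 2) / tau2

objs = [log_map_objective(mu)]
for _ in range(50):
    # E-step: Q_i(z) = p(z | x_i; theta), the posterior responsibilities.
    w = phi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: maximize sum_i E_Q[log p(x_i, z_i | mu)] + log p(mu).
    # Setting the gradient in mu_k to zero yields a mean update shrunk
    # toward the prior mean 0 by the prior precision 1/tau2.
    mu = (w * x[:, None]).sum(axis=0) / (w.sum(axis=0) + 1.0 / tau2)
    objs.append(log_map_objective(mu))

# The MAP objective is non-decreasing, as the proof guarantees.
assert all(b >= a - 1e-9 for a, b in zip(objs, objs[1:]))
```

Note how the prior enters only through the M-step: compared with the maximum-likelihood update, the denominator gains the extra $1/\tau^2$ term, shrinking each mean toward the prior mean. The E-step is unchanged, since the posterior over $z^{(i)}$ does not involve $p(\theta)$.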