Machine Learning (10701)
Fall 2008
Homework #4 Solution
Professor: Eric Xing
Due Date: November 10, 2008
1 Expectation Maximization (EM) [24 Points, Mark]
The expectation maximization (EM) algorithm is one of the most important tools in machine learning. It allows us to create models that include hidden (latent) variables. When our models contain parameters that depend on these unknown variables, we cannot compute estimates of these parameters in the usual way. EM gives us a way to estimate these parameters.
We will make this more concrete by considering a simple example. Suppose I have two unfair coins. The first lands on heads with probability $p$, and the second lands on heads with probability $q$. Imagine $N$ tosses, where for each toss I choose to use the first coin with probability $\pi$ and choose to use the second with probability $1 - \pi$. The outcome of each toss $i$ is $x_i \in \{0, 1\}$. Suppose I tell you the outcomes of the $N$ tosses, for example $x = \{x_1, x_2, \ldots, x_N\}$, but I don't tell you which coins I used on which toss.
Given only the outcomes, $x$, your job is to compute estimates for $\theta$, the set of all parameters, $\theta = \{p, q, \pi\}$. It is pretty remarkable that this can be done at all.
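The generative process just described can be simulated directly. The sketch below (the parameter values are our own illustrative choices, not part of the problem) draws $N$ tosses and then discards the coin labels, leaving only the outcomes $x$ from which $\theta$ must be recovered:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "true" parameters (our own choices, not from the problem)
p, q, pi, N = 0.8, 0.3, 0.6, 100_000

z = (rng.random(N) < pi).astype(int)   # 1 -> first coin, 0 -> second coin
heads_prob = np.where(z == 1, p, q)    # per-toss probability of heads
x = (rng.random(N) < heads_prob).astype(int)

# Only x is observed; z stays hidden. The marginal probability of heads is
# pi*p + (1 - pi)*q = 0.6*0.8 + 0.4*0.3 = 0.60, so x.mean() should be near 0.60.
print(x.mean())
```

Note that the overall heads rate alone cannot separate $p$, $q$, and $\pi$; EM exploits the full pattern of outcomes to estimate all three.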
To compute these estimates, we will create a latent variable $Z$ where $z_i \in \{0, 1\}$ indicates the coin used for the $i$-th toss. For example, $z_2 = 1$ indicates that the first coin was used on the second toss. We define the incomplete data loglikelihood as $\log P(x \mid \theta)$ and the complete data loglikelihood as $\log P(x, z \mid \theta)$.
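To make the two loglikelihoods concrete, here is a small numerical sketch (the parameter values and the tiny dataset are our own illustrative choices). The complete-data loglikelihood uses the known $z$; the incomplete-data loglikelihood sums $z$ out of each toss:

```python
import numpy as np

# Illustrative parameters and data (our own choices, not from the problem)
p, q, pi = 0.8, 0.3, 0.6
x = np.array([1., 1., 0., 1., 0.])   # observed outcomes
z = np.array([1., 1., 0., 1., 1.])   # coin indicators (1 = first coin)

# Complete-data loglikelihood: log P(x, z | theta), with z known
per_toss = np.where(z == 1,
                    pi * p**x * (1 - p)**(1 - x),
                    (1 - pi) * q**x * (1 - q)**(1 - x))
complete_ll = np.log(per_toss).sum()

# Incomplete-data loglikelihood: log P(x | theta), summing over both coins
incomplete_ll = np.log(pi * p**x * (1 - p)**(1 - x)
                       + (1 - pi) * q**x * (1 - q)**(1 - x)).sum()

# Marginalizing over z can only add probability mass, so:
assert incomplete_ll >= complete_ll
```

The gap between the two quantities is exactly what the EM lower bound in the next part controls.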
1. (3 pts) The incomplete loglikelihood of the data is given by $\log P(x \mid \theta) = \log \left( \sum_z P(x, z \mid \theta) \right)$. Use Jensen's inequality to show that a lower bound on the incomplete loglikelihood is given by:
\[
\log P(x \mid \theta) \;\ge\; \sum_z g(z) \log \frac{P(x, z \mid \theta)}{g(z)}
\]
where $g(z)$ is an arbitrary probability distribution over the latent variable $Z$.
Solution:
\begin{align*}
\log P(x \mid \theta) &= \log \left( \sum_z P(x, z \mid \theta) \right) \\
&= \log \left( \sum_z g(z) \, \frac{P(x, z \mid \theta)}{g(z)} \right) \\
&= \log \mathbb{E}_{g(z)} \left[ \frac{P(x, z \mid \theta)}{g(z)} \right] \\
&\ge \mathbb{E}_{g(z)} \left[ \log \frac{P(x, z \mid \theta)}{g(z)} \right] \\
&= \sum_z g(z) \log \frac{P(x, z \mid \theta)}{g(z)}
\end{align*}
The inequality is Jensen's inequality applied to the expectation, using the fact that $\log$ is concave.
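The bound can also be checked numerically. The sketch below (made-up joint probabilities for a single toss, using our illustrative values $p = 0.8$, $q = 0.3$, $\pi = 0.6$) verifies the inequality for an arbitrary $g(z)$, and that it becomes an equality when $g(z)$ is the posterior $P(z \mid x, \theta)$:

```python
import numpy as np

# Joint probabilities P(x, z | theta) for one observed toss x = 1,
# with illustrative parameters p = 0.8, q = 0.3, pi = 0.6 (our own choices):
# z = 1 (first coin):  pi * p       = 0.48
# z = 0 (second coin): (1 - pi) * q = 0.12
joint = np.array([0.48, 0.12])

# An arbitrary distribution g(z) over the two coin values
g = np.array([0.25, 0.75])

lhs = np.log(joint.sum())            # log P(x | theta)
rhs = (g * np.log(joint / g)).sum()  # Jensen lower bound
assert lhs >= rhs

# The bound is tight when g(z) is the posterior P(z | x, theta)
g_star = joint / joint.sum()
assert np.isclose(lhs, (g_star * np.log(joint / g_star)).sum())
```

The tightness at the posterior is what makes the E-step of EM the optimal choice of $g(z)$ for the current $\theta$.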