ISyE8843A, Brani Vidakovic
Handout 12

1 EM Algorithm and Mixtures.

1.1 Introduction
The Expectation-Maximization (EM) iterative algorithm is a broadly applicable statistical technique for maximizing complex likelihoods and handling the incomplete-data problem. At each iteration of the algorithm, two steps are performed: (i) the E-Step, consisting of projecting an appropriate functional containing the augmented data onto the space of the original, incomplete data, and (ii) the M-Step, consisting of maximizing the functional. The name EM algorithm was coined by Dempster, Laird, and Rubin in their fundamental paper [1], often referred to as the DLR paper. But whenever one comes up with a smart idea, one may be sure that others in history have thought about it as well.
The EM algorithm relates to MCMC as a forerunner through its data augmentation step, with simulation replaced by maximization. Newcomb [7] was interested in estimating mixtures of normals as early as 1886. McKendrick [5] and Healy and Westmacott [3] proposed iterative methods that are, in fact, examples of the EM algorithm. Dozens of papers proposing various applications of EM appeared before the DLR paper in 1977. However, the DLR paper was the first to unify and organize the approach.
1.2 What is EM?
Let $Y$ be a random vector corresponding to the observed data $y$ and having a postulated pdf $f(y, \psi)$, where $\psi = (\psi_1, \dots, \psi_d)$ is a vector of unknown parameters. Let $x$ be a vector of augmented (so-called complete) data, and let $z$ be the additional data, $x = [y, z]$.
Denote by $g_c(x, \psi)$ the pdf of the random vector corresponding to the complete data set $x$. The log-likelihood for $\psi$, if $x$ were fully observed, would be
$$\log L_c(\psi) = \log g_c(x, \psi).$$
The incomplete data vector $y$ comes from the “incomplete” sample space $\mathcal{Y}$. There is a 1-1 correspondence between the complete sample space $\mathcal{X}$ and the incomplete sample space $\mathcal{Y}$. Thus, for $x \in \mathcal{X}$, one can uniquely find the “incomplete” $y = y(x) \in \mathcal{Y}$. Also, the incomplete pdf can be found by properly integrating out the complete pdf,
$$g(y, \psi) = \int_{\mathcal{X}(y)} g_c(x, \psi)\, dx,$$
where $\mathcal{X}(y)$ is the subset of $\mathcal{X}$ constrained by the relation $y = y(x)$.
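As a concrete illustration (not from the handout; the two-component mixture, parameter names, and values below are illustrative assumptions), consider a latent label $z \in \{0, 1\}$ so that the “integral” over $\mathcal{X}(y)$ becomes a sum over $z$. Summing the complete-data pdf $g_c(y, z, \psi)$ over $z$ recovers the familiar mixture density for the observed $y$:

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma):
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def g_c(y, z, p, mu0, mu1, sigma):
    """Complete-data pdf: joint density of (y, z) for a two-component normal mixture."""
    w = p if z == 1 else 1 - p
    mu = mu1 if z == 1 else mu0
    return w * normal_pdf(y, mu, sigma)

def g(y, p, mu0, mu1, sigma):
    """Incomplete-data pdf: 'integrate out' the discrete latent z by summing."""
    return sum(g_c(y, z, p, mu0, mu1, sigma) for z in (0, 1))

# The marginal coincides with the mixture density (1-p) N(mu0, sigma^2) + p N(mu1, sigma^2):
val = g(0.5, p=0.3, mu0=0.0, mu1=2.0, sigma=1.0)
mix = 0.7 * normal_pdf(0.5, 0.0, 1.0) + 0.3 * normal_pdf(0.5, 2.0, 1.0)
```

Here the mixing weight plays the role of part of $\psi$, and the latent label $z$ is the additional data that completes $y$.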
Let $\psi^{(0)}$ be some initial value for $\psi$. At the $k$th step of the EM algorithm one performs the following two steps:

E-Step. Calculate
$$Q(\psi, \psi^{(k)}) = E_{\psi^{(k)}}\left\{ \log L_c(\psi) \,\big|\, y \right\}.$$

M-Step. Choose any value $\psi^{(k+1)}$ that maximizes $Q(\psi, \psi^{(k)})$, i.e.,
$$(\forall \psi)\quad Q(\psi^{(k+1)}, \psi^{(k)}) \ge Q(\psi, \psi^{(k)}).$$
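The two steps above can be sketched for a two-component normal mixture, the running example of this handout. This is a minimal sketch under simplifying assumptions of my own (both component variances fixed at 1; function and variable names are illustrative): the E-Step reduces to computing the responsibilities $\tau_i = E_{\psi^{(k)}}[z_i \mid y_i]$, since $\log L_c(\psi)$ is linear in the latent labels $z_i$, and the M-Step maximizes $Q$ in closed form via weighted averages.

```python
import random
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma=1.0):
    return exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def em_mixture(y, p, mu0, mu1, n_iter=200):
    """EM iterations for psi = (p, mu0, mu1), unit component variances assumed."""
    for _ in range(n_iter):
        # E-Step: responsibilities tau_i = E_{psi^(k)}[z_i | y_i].
        tau = [p * normal_pdf(yi, mu1) /
               (p * normal_pdf(yi, mu1) + (1 - p) * normal_pdf(yi, mu0))
               for yi in y]
        # M-Step: closed-form maximizer of Q(psi, psi^(k)).
        p = sum(tau) / len(y)
        mu1 = sum(t * yi for t, yi in zip(tau, y)) / sum(tau)
        mu0 = sum((1 - t) * yi for t, yi in zip(tau, y)) / (len(y) - sum(tau))
    return p, mu0, mu1

# Simulated data: 30% from N(4, 1), 70% from N(0, 1).
random.seed(1)
y = [random.gauss(4, 1) if random.random() < 0.3 else random.gauss(0, 1)
     for _ in range(2000)]
p_hat, mu0_hat, mu1_hat = em_mixture(y, p=0.5, mu0=-1.0, mu1=1.0)
```

Each M-Step update can only increase the observed-data likelihood, which is the monotonicity property that makes the iteration stable even from a rough starting value such as $(0.5, -1, 1)$ here.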