1 Maximum-likelihood
Recall the definition of the maximum-likelihood estimation problem. We have a density function $p(\mathbf{x}|\Theta)$ that is governed by the set of parameters $\Theta$ (e.g., $p$ might be a set of Gaussians and $\Theta$ could be the means and covariances). We also have a data set of size $N$, supposedly drawn from this distribution, i.e., $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. That is, we assume that these data vectors are independent and identically distributed (i.i.d.) with distribution $p$. Therefore, the resulting density for the samples is

$$p(\mathcal{X}|\Theta) = \prod_{i=1}^{N} p(\mathbf{x}_i|\Theta) = \mathcal{L}(\Theta|\mathcal{X}).$$
This function $\mathcal{L}(\Theta|\mathcal{X})$ is called the likelihood of the parameters given the data, or just the likelihood function. The likelihood is thought of as a function of the parameters $\Theta$ where the data $\mathcal{X}$ is fixed. In the maximum-likelihood problem, our goal is to find the $\Theta$ that maximizes $\mathcal{L}$. That is, we wish to find $\Theta^*$ where

$$\Theta^* = \operatorname*{argmax}_{\Theta} \mathcal{L}(\Theta|\mathcal{X}).$$

Often we maximize $\log \mathcal{L}(\Theta|\mathcal{X})$ instead because it is analytically easier.
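Because the logarithm is strictly increasing, the $\Theta$ that maximizes $\log \mathcal{L}$ also maximizes $\mathcal{L}$. The sketch below, using a univariate Gaussian with $\sigma^2$ held fixed and a made-up toy sample, locates the maximizing mean by a crude grid search over the log-likelihood; the maximizer lands at the sample mean, as the analytical solution predicts:

```python
import math

def log_likelihood(data, mu, sigma2):
    """log L(Theta | X) = sum over i of log p(x_i | Theta), univariate Gaussian."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

data = [0.5, 1.2, -0.3, 0.9]  # toy sample, hypothetical

# Grid search for the mu that maximizes log L, with sigma2 fixed at 1.0.
grid = [i / 100 for i in range(-200, 201)]
mu_star = max(grid, key=lambda m: log_likelihood(data, m, 1.0))
# mu_star comes out within one grid step of the sample mean, 0.575.
```

A grid search is of course only a pedagogical stand-in; the point is that working in log space turns the product into a sum, which is what makes closed-form derivative calculations (and numerically stable computation) possible.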
Depending on the form of $p(\mathbf{x}|\Theta)$ this problem can be easy or hard. For example, if $p(\mathbf{x}|\Theta)$ is simply a single Gaussian distribution where $\Theta = (\mu, \sigma^2)$, then we can set the derivative of $\log \mathcal{L}(\Theta|\mathcal{X})$ to zero and solve directly for $\mu$ and $\sigma^2$ (this, in fact, results in the standard formulas for the mean and variance of a data set). For many problems, however, it is not possible to find such analytical expressions, and we must resort to more elaborate techniques.
2 Basic EM
The EM algorithm is one such elaborate technique. The EM algorithm [ALR77, RW84, GJ95, JJ94, Bis95, Wu83] is a general method of finding the maximum-likelihood estimate of the parameters of an underlying distribution from a given data set when the data is incomplete or has missing values.

There are two main applications of the EM algorithm. The first occurs when the data indeed has missing values, due to problems with or limitations of the observation process. The second occurs when optimizing the likelihood function is analytically intractable but when the likelihood function can be simplified by assuming the existence of and values for additional but