© 2011 - Steven Tschantz
Maximum-likelihood estimation
Steven Tschantz
3/22/11
Problem
A mathematical model of a real-world situation will often be defined with one or more free parameters to be calibrated to
observed data. Rarely, however, do we expect a model to fit the observed data exactly. Measurements are subject to some
error, not all factors determining an outcome will be known or observable, and randomness is an essential part of real-world
phenomena. A statistical model includes random factors in its specification. But observed samples will generally not determine
the parameters of a statistical model uniquely. The problem is how to estimate the parameters of a statistical model.
Method
While there are many methods for estimating the parameters of a statistical model, and many criteria for judging such methods,
this notebook will concentrate on a particular method applicable to many problems which is based on a simple heuristic. For
given parameters, a statistical model should predict the probability of a particular observation. While a single observation may
give a highly unusual result, one with a low probability, repeated trials should reflect the probabilities predicted by the model.
One set of values of parameters might thus be considered a better estimate of the true parameters if the model with those
parameter values predicts a higher probability for the actual observations. Maximum-likelihood estimation takes for parameter
estimates the values of the parameters in the model that make the observed data the most likely.
For details see Wikipedia; here are the basics. Let $\Theta$ be a vector of parameters of a statistical model, and $x$ be a vector of
observed quantities in the model. The model should assign a probability density $f(x; \Theta)$ to the observation of $x$. Suppose we
make $n$ observations $x_1, x_2, \ldots, x_n$, assumed to be independently and identically distributed according to the probability density
function $f(x; \Theta)$ for some "true" but unknown values of the parameters $\Theta$. We define the likelihood of the parameters being $\Theta$
given the observations $x_1, x_2, \ldots, x_n$ to be the joint probability density of these observations,
$$L(\Theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \Theta).$$
A maximum-likelihood estimator for the model is a function that takes the observations $x_1, x_2, \ldots, x_n$ and returns a set
of parameters $\Theta$ that maximizes this likelihood (the maximizer is often uniquely determined).
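As a minimal sketch of such an estimator, the following Python code assumes a one-parameter exponential model with density $f(x; \Theta) = \Theta e^{-\Theta x}$. This model is chosen here only for illustration and does not come from the notebook. The observations are made up, and the names `likelihood` and `grid_mle` are hypothetical helpers. The estimator is approximated by a brute-force search over a grid of candidate parameter values, and the result can be compared with the known closed-form maximizer for this model, the reciprocal of the sample mean.

```python
import math

def likelihood(theta, xs):
    """L(theta; x_1..x_n) = prod_i f(x_i; theta), assuming the exponential
    density f(x; theta) = theta * exp(-theta * x) (an illustrative choice)."""
    return math.prod(theta * math.exp(-theta * x) for x in xs)

def grid_mle(xs, lo=0.01, hi=5.0, steps=5000):
    """Approximate the maximum-likelihood estimate by evaluating the
    likelihood on an evenly spaced grid in [lo, hi] and keeping the best point."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: likelihood(t, xs))

# Hypothetical observations, made up for illustration only.
observations = [0.8, 1.9, 0.3, 2.4, 1.1, 0.5]

theta_hat = grid_mle(observations)
closed_form = len(observations) / sum(observations)  # 1 / sample mean

print("grid estimate:       ", theta_hat)
print("closed-form estimate:", closed_form)
```

A grid search is only practical for one or two parameters. It is used here because it makes the definition literal: try candidate values of $\Theta$ and keep the one under which the observed data are most likely.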
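Because the logarithm is strictly increasing, the $\Theta$ that maximizes the likelihood will also maximize the average log-likelihood
$$\ell(\Theta; x_1, x_2, \ldots, x_n) = \frac{1}{n} \log\bigl(L(\Theta; x_1, x_2, \ldots, x_n)\bigr) = \frac{1}{n} \sum_{i=1}^{n} \log\bigl(f(x_i; \Theta)\bigr),$$
and this will often be more convenient to work with. Multiplying $L$ by a positive constant, or adding a constant to $\ell$ (in either case a constant
that does not depend on $\Theta$), will not change the values of $\Theta$ giving the maximum.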
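One practical reason the average log-likelihood is more convenient can be sketched in Python. The example again assumes an exponential density $f(x; \Theta) = \Theta e^{-\Theta x}$, which is an illustrative choice rather than a model from the notebook, and uses simulated data with a fixed seed. The helper names are hypothetical. For a large sample, the raw product of densities underflows to zero in floating-point arithmetic, while the average of the log-densities remains finite and can still be maximized.

```python
import math
import random

def likelihood(theta, xs):
    """Raw likelihood: product of the assumed exponential densities."""
    return math.prod(theta * math.exp(-theta * x) for x in xs)

def avg_log_likelihood(theta, xs):
    """Average log-likelihood: (1/n) * sum_i log f(x_i; theta),
    with log f(x; theta) = log(theta) - theta * x for the exponential model."""
    return sum(math.log(theta) - theta * x for x in xs) / len(xs)

# Simulated sample (an assumption for illustration): 5000 draws with rate 2.0.
random.seed(0)
sample = [random.expovariate(2.0) for _ in range(5000)]

# Maximize the average log-likelihood over a coarse grid from 0.5 to 3.5.
grid = [0.5 + 0.01 * k for k in range(301)]
theta_hat = max(grid, key=lambda t: avg_log_likelihood(t, sample))

raw = likelihood(theta_hat, sample)            # product of 5000 small factors
avg = avg_log_likelihood(theta_hat, sample)    # stays on a moderate scale

print("estimate from average log-likelihood:", theta_hat)
print("raw likelihood at the estimate:      ", raw)
print("average log-likelihood at estimate:  ", avg)
```

Dividing by $n$ keeps the objective on a comparable scale across sample sizes, which is an instance of the rescaling remark above: it changes the value of the objective but not the location of its maximum.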