Maximum Likelihood Estimation
Eric Zivot
May 14, 2001
This version: November 15, 2009

1 Maximum Likelihood Estimation

1.1 The Likelihood Function
Let $X_1, \ldots, X_n$ be an iid sample with probability density function (pdf) $f(x_i; \theta)$, where $\theta$ is a $(k \times 1)$ vector of parameters that characterize $f(x_i; \theta)$. For example, if $X_i \sim N(\mu, \sigma^2)$ then
$$f(x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)$$
and $\theta = (\mu, \sigma^2)'$.
The joint density of the sample is, by independence, equal to the product of the marginal densities
$$f(x_1, \ldots, x_n; \theta) = f(x_1; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta).$$
The joint density is an $n$-dimensional function of the data $x_1, \ldots, x_n$ given the parameter vector $\theta$. The joint density¹ satisfies
$$f(x_1, \ldots, x_n; \theta) \geq 0, \qquad \int \cdots \int f(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n = 1.$$
The likelihood function is defined as the joint density treated as a function of the parameters $\theta$:
$$L(\theta \mid x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta).$$
Notice that the likelihood function is a $k$-dimensional function of $\theta$ given the data $x_1, \ldots, x_n$. It is important to keep in mind that the likelihood function, being a function of $\theta$ and not the data, is not a proper pdf. It is always positive but
$$\int \cdots \int L(\theta \mid x_1, \ldots, x_n)\, d\theta_1 \cdots d\theta_k \neq 1.$$
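This point is easy to check numerically. The sketch below uses a hypothetical example not taken from the notes: a Bernoulli sample with $n = 5$ and two successes, for which the likelihood is $\theta^2(1-\theta)^3$. Integrating it over $\theta \in [0,1]$ gives $1/60$, not 1.

```python
import numpy as np

# Hypothetical illustration: Bernoulli sample with n = 5, s = 2 successes,
# so L(theta | x) = theta^2 * (1 - theta)^3 on theta in [0, 1].
theta = np.linspace(0.0, 1.0, 10_001)
likelihood = theta**2 * (1.0 - theta)**3

# Riemann-sum approximation of the integral over theta.
# The exact value is the Beta function B(3, 4) = 1/60, far from 1.
integral = likelihood.sum() * (theta[1] - theta[0])
print(integral)  # approximately 1/60, about 0.0167
```

The grid size (10,001 points) is an arbitrary choice; any reasonably fine grid makes the same point.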
¹If $X_1, \ldots, X_n$ are discrete random variables, then $f(x_1, \ldots, x_n; \theta) = \Pr(X_1 = x_1, \ldots, X_n = x_n)$ for a fixed value of $\theta$.
To simplify notation, let the vector $\mathbf{x} = (x_1, \ldots, x_n)$ denote the observed sample. Then the joint pdf and likelihood function may be expressed as $f(\mathbf{x}; \theta)$ and $L(\theta \mid \mathbf{x})$.
Example 1 Bernoulli Sampling

Let $X_i \sim \mathrm{Bernoulli}(\theta)$. That is, $X_i = 1$ with probability $\theta$ and $X_i = 0$ with probability $1 - \theta$, where $0 \leq \theta \leq 1$. The pdf for $X_i$ is
$$f(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}, \quad x_i = 0, 1.$$
Let $X_1, \ldots, X_n$ be an iid sample with $X_i \sim \mathrm{Bernoulli}(\theta)$. The joint density/likelihood function is given by
$$f(\mathbf{x}; \theta) = L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_{i=1}^{n} x_i}(1 - \theta)^{n - \sum_{i=1}^{n} x_i}.$$
For a given value of $\theta$ and observed sample $\mathbf{x}$, $f(\mathbf{x}; \theta)$ gives the probability of observing the sample. For example, suppose $n = 5$ and $\mathbf{x} = (0, \ldots, 0)$. Now some values of $\theta$ are more likely to have generated this sample than others. In particular, it is more likely that $\theta$ is close to zero than to one. To see this, note that the likelihood function for this sample is
$$L(\theta \mid (0, \ldots, 0)) = (1 - \theta)^5.$$
This function is illustrated in figure xxx. The likelihood function has a clear maximum at $\theta = 0$. That is, $\theta = 0$ is the value of $\theta$ that makes the observed sample $\mathbf{x} = (0, \ldots, 0)$ most likely (highest probability).
Similarly, suppose $\mathbf{x} = (1, \ldots, 1)$. Then the likelihood function is
$$L(\theta \mid (1, \ldots, 1)) = \theta^5,$$
which is illustrated in figure xxx. Now the likelihood function has a maximum at $\theta = 1$.
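The two likelihoods in this example are easy to evaluate on a grid. A minimal sketch (the grid size is an arbitrary choice):

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)

# L(theta | (0,...,0)) = (1 - theta)^5 and L(theta | (1,...,1)) = theta^5
# for the n = 5 samples discussed in the text.
lik_zeros = (1.0 - theta) ** 5   # sample x = (0, ..., 0)
lik_ones = theta ** 5            # sample x = (1, ..., 1)

# The grid maximizers match the text: theta = 0 and theta = 1, respectively.
print(theta[np.argmax(lik_zeros)])  # 0.0
print(theta[np.argmax(lik_ones)])   # 1.0
```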
Example 2 Normal Sampling

Let $X_1, \ldots, X_n$ be an iid sample with $X_i \sim N(\mu, \sigma^2)$. The pdf for $X_i$ is
$$f(x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right), \quad -\infty < \mu < \infty,\ \sigma^2 > 0,\ -\infty < x < \infty,$$
so that $\theta = (\mu, \sigma^2)'$.
The likelihood function is given by
$$L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$
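The equality of the product form and the factored form can be verified numerically. A minimal sketch with a simulated sample; the sample size, random seed, and trial value of $\theta = (\mu, \sigma^2)'$ are all arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=20)  # hypothetical iid normal sample

mu, sigma2 = 0.5, 3.0  # an arbitrary trial value of theta = (mu, sigma^2)'

# Product of the n marginal normal pdfs, term by term.
prod_form = np.prod((2 * np.pi * sigma2) ** -0.5
                    * np.exp(-(x - mu) ** 2 / (2 * sigma2)))

# Factored form: (2*pi*sigma^2)^(-n/2) * exp(-sum((x_i - mu)^2) / (2*sigma^2)).
n = x.size
factored = ((2 * np.pi * sigma2) ** (-n / 2)
            * np.exp(-np.sum((x - mu) ** 2) / (2 * sigma2)))

print(np.isclose(prod_form, factored))  # True
```

In practice the log-likelihood is used instead of the raw likelihood, since the product of many pdf values underflows quickly as $n$ grows.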