2.160 System Identification, Estimation, and Learning
Lecture Notes No. 20
May 3, 2006
15. Maximum Likelihood
15.1 Principle
Consider an unknown stochastic process that produces the observed data

$$ y^N = (y_1, y_2, \ldots, y_N) $$

[Figure: block labeled "Unknown Stochastic Process" with output "Observed data" $y^N$]
Assume that each observed datum $y_i$ is generated by an assumed stochastic process having the PDF

$$ f(m, \lambda; x) = \frac{1}{\sqrt{2\pi\lambda}} \, e^{-\frac{(x-m)^2}{2\lambda}} \tag{1} $$

where $m$ is the mean, $\lambda$ is the variance, and $x$ is the random variable associated with $y_i$.
We know that the mean $m$ and the variance $\lambda$ are determined by

$$ m = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \lambda = \frac{1}{N}\sum_{i=1}^{N} (y_i - m)^2 \tag{2} $$
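The two formulas in (2) can be evaluated directly. The following is a minimal sketch in plain Python; the function name `sample_mean_and_variance` and the data values are illustrative assumptions, not part of the notes. Note that the variance here divides by $N$, as in (2).

```python
def sample_mean_and_variance(y):
    """Return (m, lam) as in Eq. (2): the mean and the 1/N variance."""
    n = len(y)
    m = sum(y) / n
    lam = sum((yi - m) ** 2 for yi in y) / n
    return m, lam


# Illustrative data (hypothetical, chosen so the arithmetic is easy to follow).
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, lam = sample_mean_and_variance(y)
print(m, lam)  # 5.0 4.0
```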
Let us now obtain the same parameter values, $m$ and $\lambda$, based on a different principle: Maximum Likelihood.
Assuming that the $N$ observations $y_1, y_2, \ldots, y_N$ are stochastically independent, consider the joint probability density associated with the $N$ observations:

$$ f(m, \lambda; x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda}} \, e^{-\frac{(x_i-m)^2}{2\lambda}} \tag{3} $$
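To make the product in (3) concrete, here is a small sketch that evaluates the joint density both as a product and, equivalently, as the exponential of a sum of logarithms. The function names and sample values are assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, m, lam):
    """Gaussian PDF of Eq. (1) with mean m and variance lam."""
    return math.exp(-(x - m) ** 2 / (2.0 * lam)) / math.sqrt(2.0 * math.pi * lam)


def joint_pdf(xs, m, lam):
    """Joint PDF of Eq. (3): product of the individual PDFs (independence)."""
    return math.prod(gaussian_pdf(x, m, lam) for x in xs)


def joint_log_pdf(xs, m, lam):
    """Logarithm of Eq. (3), accumulated as a sum of per-sample log densities."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * lam) - (x - m) ** 2 / (2.0 * lam)
        for x in xs
    )


xs = [0.5, -1.2, 0.3]  # hypothetical sample values
p = joint_pdf(xs, m=0.0, lam=1.0)
logp = joint_log_pdf(xs, m=0.0, lam=1.0)
```

For large $N$ the product can underflow in floating point, which is one practical reason to work with the sum of logarithms instead.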
Now, once $y_1, y_2, \ldots, y_N$ have been observed (have taken specific values), what parameter values, $m$ and $\lambda$, provide the highest probability in $f(m, \lambda; x_1, x_2, \ldots, x_N)$?
In other words, what values of $m$ and $\lambda$ are most likely the case? This means that we maximize the following function with respect to $m$ and $\lambda$:

$$ \max_{m, \lambda} \; f(m, \lambda; y_1, y_2, \ldots, y_N) \tag{4} $$
Note that $y_1, y_2, \ldots, y_N$ are known values. Therefore, $f(m, \lambda; y_1, y_2, \ldots, y_N)$ is a function of $m$ and $\lambda$ only. Using our standard notation, this can be rewritten as
$$ \hat{\theta} = \arg\max_{m, \lambda} \; f(m, \lambda; y^N) \tag{5} $$
where $\hat{\theta}$ is the estimate of $m$ and $\lambda$.
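Because the logarithm is monotonically increasing, maximizing $f(m, \lambda; y^N)$ is equivalent to maximizing $\log f(\cdot)$:

$$ \hat{\theta} = \arg\max_{m, \lambda} \log f(m, \lambda; y^N) = \arg\max_{m, \lambda} \sum_{i=1}^{N} \left( \log \frac{1}{\sqrt{2\pi\lambda}} - \frac{(y_i - m)^2}{2\lambda} \right) \tag{6} $$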
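As a worked intermediate step (an expansion added here for clarity, not an equation from the notes), the sum in (6) can be split into a term that depends only on $\lambda$ and a term that contains the squared deviations. This form makes the subsequent differentiation easier to follow.

```latex
\log f(m, \lambda; y^N)
  = \sum_{i=1}^{N} \left( \log \frac{1}{\sqrt{2\pi\lambda}} - \frac{(y_i - m)^2}{2\lambda} \right)
  = -\frac{N}{2} \log (2\pi\lambda) - \frac{1}{2\lambda} \sum_{i=1}^{N} (y_i - m)^2
```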
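Taking derivatives and setting them to zero,

$$ \frac{\partial \log f}{\partial m} = \sum_{i=1}^{N} \frac{1}{\lambda}(y_i - m) = 0 \tag{7} $$

$$ \frac{\partial \log f}{\partial \lambda} = \sum_{i=1}^{N} \left( -\frac{1}{2\lambda} + \frac{(y_i - m)^2}{2\lambda^2} \right) = 0 \tag{8} $$

which give

$$ m = \frac{1}{N}\sum_{i=1}^{N} y_i \tag{9} $$

$$ \lambda = \frac{1}{N}\sum_{i=1}^{N} (y_i - m)^2 \tag{10} $$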
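The stationarity conditions can be checked numerically. The sketch below, under the Gaussian model of (1), evaluates the analytic partial derivatives of the log-likelihood at the closed-form estimates and compares them with central finite differences. The function names, the data, and the step size `h` are illustrative assumptions.

```python
import math


def log_likelihood(y, m, lam):
    """Log of the joint Gaussian density for independent samples."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * lam) - (yi - m) ** 2 / (2.0 * lam)
        for yi in y
    )


def score(y, m, lam):
    """Analytic partial derivatives of the log-likelihood w.r.t. m and lam."""
    d_m = sum((yi - m) / lam for yi in y)
    d_lam = sum(-1.0 / (2.0 * lam) + (yi - m) ** 2 / (2.0 * lam ** 2) for yi in y)
    return d_m, d_lam


y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(y)
m_hat = sum(y) / n
lam_hat = sum((yi - m_hat) ** 2 for yi in y) / n

# Both analytic derivatives should vanish at (m_hat, lam_hat).
d_m, d_lam = score(y, m_hat, lam_hat)

# Central finite differences as an independent check of the analytic formulas.
h = 1e-6
fd_m = (log_likelihood(y, m_hat + h, lam_hat)
        - log_likelihood(y, m_hat - h, lam_hat)) / (2.0 * h)
fd_lam = (log_likelihood(y, m_hat, lam_hat + h)
          - log_likelihood(y, m_hat, lam_hat - h)) / (2.0 * h)
```

A vanishing gradient only establishes a stationary point; confirming that it is a maximum would additionally require examining the second derivatives, which this sketch does not do.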
The above results, $m = \frac{1}{N}\sum_{i=1}^{N} y_i$ and $\lambda = \frac{1}{N}\sum_{i=1}^{N} (y_i - m)^2$, provide a stochastic process model that is most likely to generate the observed data $y^N$. These results agree with (2).
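One practical point about this agreement: the maximum likelihood variance divides by $N$, which is what Python's standard `statistics` module calls the population variance (`pvariance`), whereas `statistics.variance` divides by $N - 1$. The short comparison below uses hypothetical data and is only meant to show which library call corresponds to the formula above.

```python
import statistics

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(y)

# Closed-form maximum likelihood estimates for the Gaussian model.
m_hat = sum(y) / n
lam_hat = sum((yi - m_hat) ** 2 for yi in y) / n

pop_var = statistics.pvariance(y)   # divides by N, matches lam_hat
samp_var = statistics.variance(y)   # divides by N - 1, slightly larger
```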
This Maximum Likelihood Estimate (MLE) is formally stated as follows.
Maximum Likelihood Estimate
Consider a joint probability density function with parameter vector $\theta$ as a stochastic model of an unknown process:

$$ f(\theta; x_1, x_2, \ldots, x_N) \tag{11} $$
Given observed data $y_1, y_2, \ldots, y_N$, form a deterministic function of $\theta$, called the likelihood function:

$$ L(\theta) = f(\theta; y_1, y_2, \ldots, y_N) \tag{12} $$
Determine the parameter vector $\theta$ so that this likelihood function is maximized.
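The general statement can be sketched in code: fixing the observed data turns the joint PDF into a function of the parameter vector alone, which is then maximized. The helper `make_likelihood`, the Gaussian example model, the data, and the crude grid search are all assumptions chosen for illustration; a practical implementation would typically use a numerical optimizer on the log-likelihood instead.

```python
import math


def make_likelihood(joint_pdf, data):
    """Fix the observed data and return L(theta) = f(theta; y_1, ..., y_N)."""
    def likelihood(theta):
        return joint_pdf(theta, data)
    return likelihood


def gaussian_joint_pdf(theta, xs):
    """Example model: independent Gaussian samples with theta = (m, lam)."""
    m, lam = theta
    return math.prod(
        math.exp(-(x - m) ** 2 / (2.0 * lam)) / math.sqrt(2.0 * math.pi * lam)
        for x in xs
    )


y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
L = make_likelihood(gaussian_joint_pdf, y)

# Crude grid search over candidate parameter vectors (illustration only):
# m in 3.0 ... 7.0 and lam in 1.0 ... 8.0, both in steps of 0.1.
candidates = [(i / 10.0, j / 10.0) for i in range(30, 71) for j in range(10, 81)]
theta_hat = max(candidates, key=L)
```

For this data set the grid contains the closed-form Gaussian estimates (mean 5.0, variance 4.0), so the search returns that point.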
 Spring '06
Harry Asada