Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 6. Introduction to Statistical Computing
1 The Newton–Raphson Method to Compute the MLE
One way to compute the MLE is the classical Newton–Raphson method, as we have discussed before. In general we know that the MLE θ solves the equation

    0 = l'(θ),

where l(θ) = log L is the log-likelihood and l'(θ) is the first-order derivative at θ. Taking a Taylor expansion around a point θ_j leads to
    0 ≈ l'(θ_j) + (θ − θ_j) l''(θ_j),
i.e.,

    (θ − θ_j) ≈ − l'(θ_j) / l''(θ_j),
which leads to the Newton–Raphson iterative algorithm for finding the MLE:

    θ_{j+1} = θ_j − l'(θ_j) / l''(θ_j).
In the multiparameter case, the MLE of θ = (θ_1, …, θ_k) is a vector and the algorithm becomes

    θ_{j+1} = θ_j − H^{−1}(θ_j) l'(θ_j),
where l'(θ_j) is the vector of first derivatives and H is the matrix of second derivatives of the log-likelihood.
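As a concrete illustration, the scalar iteration θ_{j+1} = θ_j − l'(θ_j)/l''(θ_j) can be sketched in a few lines of Python. The Poisson example below is an illustrative assumption (not from the lecture), chosen because its MLE is known in closed form (the sample mean), so the iteration can be checked.

```python
# Minimal sketch of the Newton-Raphson iteration for a scalar MLE.
# The Poisson sample and starting value are illustrative assumptions.

def newton_raphson_mle(theta0, score, hessian, tol=1e-10, max_iter=100):
    """Iterate theta_{j+1} = theta_j - l'(theta_j) / l''(theta_j)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta = theta - step
        if abs(step) < tol:
            break
    return theta

# Poisson(theta) log-likelihood, constants dropped: l(theta) = s*log(theta) - n*theta
y = [2, 3, 1, 4, 2, 5, 3, 2]
n, s = len(y), sum(y)
score = lambda t: s / t - n          # l'(theta)
hessian = lambda t: -s / t ** 2      # l''(theta)

theta_hat = newton_raphson_mle(1.0, score, hessian)
print(theta_hat)  # converges to the sample mean s/n = 2.75
```

In the multiparameter case the division by l''(θ_j) becomes multiplication by the inverse Hessian H^{−1}(θ_j), but the structure of the loop is the same.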
2 EM Algorithm to Compute the MLE
In general, computing the first and second derivatives needed to implement the Newton–Raphson method may be hard. Alternatively, one can use the EM (expectation-maximization) algorithm, which is very easy to implement. The drawback of the EM algorithm is that it is often slower than the Newton–Raphson algorithm, if the latter can be implemented at all. So the essential tradeoff between the Newton–Raphson algorithm and the EM algorithm is speed versus simplicity of implementation. Of course, with increasing computing power, the EM algorithm has become quite popular.
The algorithm assumes that we have data Y with likelihood L(y; θ), which is relatively difficult to maximize. However, when we introduce some other random variable Z, the likelihood L(y, z; θ) can be easily maximized. Here is an example of why this may be the case.
Example 1. (Mixture of Normals). Many times in finance the distribution is a mixture of normal distributions. For example, this is the case for Merton's jump diffusion. In this example we shall consider the simplest possible case, in which we have a mixture of just two normal distributions. In other words, the density is given by
    f(y; θ) = (1 − p) φ(y; μ_0, σ_0) + p φ(y; μ_1, σ_1),
where φ(y; μ, σ) denotes a normal density with mean μ and standard deviation σ. More precisely, with probability p the data is from φ(y; μ_1, σ_1), and with probability 1 − p the data is from φ(y; μ_0, σ_0). The likelihood is
    L(y; θ) = ∏_{i=1}^{n} { (1 − p) φ(y_i; μ_0, σ_0) + p φ(y_i; μ_1, σ_1) },
which is hard to maximize. By contrast, the "complete" likelihood, which includes the unobserved latent variables Z_i, where Z_i = 0 represents the first normal and Z_i = 1 represents the second normal, is much easier to study and to maximize. Of course, in reality we do not observe Z_i.
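The EM iteration for this two-normal mixture can be sketched as follows. The E-step computes the posterior probability that each observation came from the second normal (the conditional expectation of Z_i), and the M-step maximizes the complete likelihood with those probabilities as weights, which gives closed-form weighted means and standard deviations. The simulated data, starting values, and iteration count below are illustrative assumptions.

```python
# Minimal sketch of the EM algorithm for the two-normal mixture of Example 1.
# phi is the normal density defined in the text; data and starting values
# are illustrative assumptions.

import math
import random

def phi(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_normals(y, p, mu0, sigma0, mu1, sigma1, n_iter=200):
    for _ in range(n_iter):
        # E-step: posterior probability that Z_i = 1 given y_i
        w = [p * phi(yi, mu1, sigma1)
             / ((1 - p) * phi(yi, mu0, sigma0) + p * phi(yi, mu1, sigma1))
             for yi in y]
        # M-step: weighted closed-form updates of the parameters
        s1 = sum(w)
        s0 = len(y) - s1
        p = s1 / len(y)
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / s1
        mu0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / s0
        sigma1 = math.sqrt(sum(wi * (yi - mu1) ** 2 for wi, yi in zip(w, y)) / s1)
        sigma0 = math.sqrt(sum((1 - wi) * (yi - mu0) ** 2 for wi, yi in zip(w, y)) / s0)
    return p, mu0, sigma0, mu1, sigma1

# Simulated data: mixture of N(0, 1) and N(5, 1) with p = 0.3
random.seed(0)
y = [random.gauss(5, 1) if random.random() < 0.3 else random.gauss(0, 1)
     for _ in range(500)]
est = em_two_normals(y, p=0.5, mu0=-1.0, sigma0=1.0, mu1=4.0, sigma1=1.0)
print(est)  # estimates should be near (0.3, 0, 1, 5, 1)
```

Note that neither step requires any derivative of the likelihood, which is exactly the simplicity advantage over Newton–Raphson discussed above.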
This note was uploaded on 10/18/2010 for the course IEOR 4702 taught by Professor Kou during the Spring '10 term at Columbia.