Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 6. Introduction to Statistical Computing
1 Newton-Raphson Method to Compute the MLE
One way to compute the MLE is the classical Newton-Raphson method, as we have discussed before. In general, we know that the MLE $\hat{\theta}$ solves the equation
$$0 = l'(\hat{\theta}),$$
where $l(\theta) = \log L$ is the log-likelihood and $l'(\theta)$ is its first-order derivative at $\theta$. Taking the Taylor expansion around a point $\theta_j$ leads to
$$0 \approx l'(\theta_j) + (\theta - \theta_j)\, l''(\theta_j),$$
i.e.,
$$\theta - \theta_j \approx -\frac{l'(\theta_j)}{l''(\theta_j)},$$
which leads to the Newton-Raphson iterative algorithm for finding the MLE:
$$\theta_{j+1} = \theta_j - \frac{l'(\theta_j)}{l''(\theta_j)}.$$
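As a concrete sketch of this iteration (my own example, not from the lecture; function and variable names are mine), consider the location MLE of a Cauchy$(\theta, 1)$ sample, for which no closed-form solution exists but $l'$ and $l''$ are available explicitly:

```python
import math

def cauchy_newton_mle(data, theta0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the location MLE of a Cauchy(theta, 1) sample.

    Up to an additive constant, l(theta) = -sum_i log(1 + (y_i - theta)^2),
    so both derivatives are available in closed form.
    """
    theta = theta0
    for _ in range(max_iter):
        # l'(theta): first derivative of the log-likelihood
        d1 = sum(2 * (y - theta) / (1 + (y - theta) ** 2) for y in data)
        # l''(theta): second derivative of the log-likelihood
        d2 = sum(2 * ((y - theta) ** 2 - 1) / (1 + (y - theta) ** 2) ** 2
                 for y in data)
        step = d1 / d2
        theta = theta - step  # theta_{j+1} = theta_j - l'(theta_j)/l''(theta_j)
        if abs(step) < tol:
            break
    return theta
```

Starting from a reasonable initial point (e.g. the sample median) matters here: the Cauchy log-likelihood can have multiple local maxima, and Newton-Raphson only converges locally.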
In the multiparameter case, the MLE of $\theta = (\theta_1, \ldots, \theta_k)$ is a vector and the algorithm becomes
$$\theta_{j+1} = \theta_j - H^{-1}(\theta_j)\, l'(\theta_j),$$
where $l'(\theta_j)$ is the vector of first derivatives and $H$ is the matrix of second derivatives of the log-likelihood.
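A minimal multiparameter sketch (my own helper names; NumPy is assumed), which takes user-supplied functions for the score vector $l'(\theta)$ and the Hessian $H(\theta)$, and solves the linear system $H\,\Delta = l'$ at each step rather than forming $H^{-1}$ explicitly:

```python
import numpy as np

def newton_mle(score, hessian, theta0, tol=1e-10, max_iter=100):
    """Multiparameter Newton-Raphson:
    theta_{j+1} = theta_j - H^{-1}(theta_j) l'(theta_j).

    `score(theta)` returns the gradient vector of the log-likelihood;
    `hessian(theta)` returns its matrix of second derivatives.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        # Solve H(theta) step = l'(theta) instead of inverting H
        step = np.linalg.solve(hessian(theta), score(theta))
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta
```

For example, feeding in the score and Hessian of a normal sample in $(\mu, \sigma)$ recovers the sample mean and the MLE standard deviation, provided the starting point is not too far from the optimum.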
2 EM Algorithm to Compute the MLE
In general, computing the first and second derivatives, which are needed for the implementation of the Newton-Raphson method, may be hard. Alternatively, one can use the EM (expectation-maximization) algorithm, which is very easy to implement. The drawback of the EM algorithm is that many times it is slower than the Newton-Raphson algorithm, if the latter can be implemented at all. So the essential tradeoff between the Newton-Raphson algorithm and the EM algorithm is speed versus simplicity of implementation. Of course, with increasing computing power, the EM algorithm has become quite popular.
The algorithm assumes that we have data $Y$ with likelihood $L(y; \theta)$, which is relatively difficult to maximize. However, when we introduce some other random variable $Z$, the likelihood $L(y, z; \theta)$ can be maximized easily. Here is an example of why this may be the case.
Example 1. (Mixture of Normals). Many times in finance the distribution is a mixture of normal distributions; for example, this is the case for Merton's jump diffusion model. In this example we shall consider
the simplest possible case, in which we have a mixture of just two normal distributions. In other words, the density is given by
$$f(y; \theta) = (1 - p)\, \phi(y; \mu_0, \sigma_0) + p\, \phi(y; \mu_1, \sigma_1),$$
where $\phi(y; \mu, \sigma)$ denotes a normal density with mean $\mu$ and standard deviation $\sigma$. More precisely, with probability $p$ the data is from $\phi(y; \mu_1, \sigma_1)$, and with probability $1 - p$ the data is from $\phi(y; \mu_0, \sigma_0)$. The likelihood is
$$L(y; \theta) = \prod_{i=1}^{n} \left\{ (1 - p)\, \phi(y_i; \mu_0, \sigma_0) + p\, \phi(y_i; \mu_1, \sigma_1) \right\},$$
which is hard to maximize.
On the contrary, the "complete" likelihood, which includes the unobserved latent variables $Z_i$, where $Z_i = 0$ represents the first normal and $Z_i = 1$ represents the second normal, is much easier to study and to maximize. Of course, in reality we do not observe $Z_i$.
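The EM iteration for this two-normal mixture can be sketched as follows (a minimal illustration with my own function and variable names, not code from the lecture). The E-step computes the posterior probability that each observation came from the second component; the M-step then does weighted maximization as if the $Z_i$ had been observed:

```python
import math

def em_two_normals(data, p, mu0, sigma0, mu1, sigma1, n_iter=200):
    """EM for the two-component normal mixture
    f(y) = (1 - p) phi(y; mu0, sigma0) + p phi(y; mu1, sigma1)."""
    def phi(y, mu, sigma):
        # Normal density with mean mu and standard deviation sigma
        return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    for _ in range(n_iter):
        # E-step: responsibilities w_i = P(Z_i = 1 | y_i, current parameters)
        w = [p * phi(y, mu1, sigma1)
             / (p * phi(y, mu1, sigma1) + (1 - p) * phi(y, mu0, sigma0))
             for y in data]
        # M-step: weighted MLE updates for p, mu_k, sigma_k
        s1 = sum(w)
        s0 = len(data) - s1
        p = s1 / len(data)
        mu1 = sum(wi * y for wi, y in zip(w, data)) / s1
        mu0 = sum((1 - wi) * y for wi, y in zip(w, data)) / s0
        sigma1 = math.sqrt(sum(wi * (y - mu1) ** 2 for wi, y in zip(w, data)) / s1)
        sigma0 = math.sqrt(sum((1 - wi) * (y - mu0) ** 2 for wi, y in zip(w, data)) / s0)
    return p, mu0, sigma0, mu1, sigma1
```

Note that each iteration only requires evaluating normal densities and weighted averages; no derivatives of the mixture likelihood are needed, which is exactly the simplicity the EM algorithm trades against speed.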