Imbens, Lecture Notes 7, ARE213 Spring ’06
ARE213
Econometrics
Spring 2006, UC Berkeley, Department of Agricultural and Resource Economics
Maximum Likelihood Estimation II:
Computational Issues (W 12.7)
How do we compute the mle? A number of numerical methods exist for this type of problem. Here we discuss some theoretical, but largely practical, issues in implementing these methods. (For ease of comparison with later optimization problems and the material in the reader we reformulate this as minimizing minus the log likelihood function; this obviously does not affect the substance of the problem.) One preliminary point is that it is often useful to rescale the variables so they have approximately the same variance: having some variables that are orders of magnitude larger than others can lead to problems with machine precision.
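As a minimal sketch of this rescaling, assuming NumPy and a hypothetical two-column design matrix whose columns differ by several orders of magnitude:

```python
import numpy as np

# Hypothetical design matrix: the second column is four orders of
# magnitude larger than the first (invented data, for illustration only).
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(0.0, 1.0, 200),
                     rng.normal(0.0, 1e4, 200)])

scale = X.std(axis=0)
X_scaled = X / scale  # each column now has unit standard deviation
# After optimizing on the rescaled data, dividing the resulting
# coefficients by `scale` recovers them on the original scale.
```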
One leading method is Newton–Raphson. The idea is to approximate the objective function $Q(\beta) = -L(\beta)$ around some starting value $\beta_0$ by a quadratic function and find the exact minimum of that quadratic approximation. Call this value $\beta_1$. Redo the quadratic approximation around $\beta_1$ and find the new minimum; call this $\beta_2$. Do this repeatedly and the sequence of solutions $\beta_1, \beta_2, \ldots$ will converge to the minimum of the objective function.
Formally, given a starting value $\beta_0$, define iteratively
$$
\beta_{k+1} = \beta_k - \left( \frac{\partial^2 Q}{\partial\beta\,\partial\beta'}(\beta_k) \right)^{-1} \frac{\partial Q}{\partial\beta}(\beta_k).
$$
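The iteration above can be sketched in a few lines. This is a generic sketch, not the notes' own code: the objective here is a hypothetical convex toy function with a closed-form gradient and Hessian, and the helper name `newton_raphson` is illustrative.

```python
import numpy as np

def newton_raphson(grad, hess, beta0, tol=1e-8, max_iter=100):
    """Minimize Q by iterating beta_{k+1} = beta_k - H(beta_k)^{-1} g(beta_k)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        # Solve H * step = g rather than forming the inverse explicitly.
        step = np.linalg.solve(hess(beta), grad(beta))
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy convex objective Q(b) = exp(b0) - 2*b0 + (b1 - 1)^2,
# minimized at (log 2, 1).
grad = lambda b: np.array([np.exp(b[0]) - 2.0, 2.0 * (b[1] - 1.0)])
hess = lambda b: np.diag([np.exp(b[0]), 2.0])
beta_hat = newton_raphson(grad, hess, beta0=[0.0, 0.0])
```

Because the toy objective is globally convex, the iteration converges from any starting value; in general Newton–Raphson only finds a local minimum near $\beta_0$.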
In the exponential case with hazard rate $\exp(x'\beta)$ and probability density function $f(y \mid x; \beta) = \exp(x'\beta)\,\exp(-y\exp(x'\beta))$, the matrix of second derivatives is
$$
\frac{\partial^2 Q}{\partial\beta\,\partial\beta'}(\beta) = \sum_{i=1}^{N} y_i\, x_i x_i' \exp(x_i'\beta),
$$
which is positive definite if $\sum_i x_i x_i'$ is positive definite. Hence the objective function is globally convex, and if there is a solution to the first-order conditions, it is the unique mle. In this
case the Newton–Raphson algorithm works very well.
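A sketch of Newton–Raphson applied to this exponential model, assuming NumPy. The data are simulated (the sample size, design matrix, and `beta_true` are invented for illustration); the gradient and Hessian follow directly from $Q(\beta) = \sum_i [\,y_i \exp(x_i'\beta) - x_i'\beta\,]$.

```python
import numpy as np

# Simulated data (hypothetical): an intercept plus one normal covariate.
rng = np.random.default_rng(0)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 0.5])
# Hazard exp(x'beta) means the exponential scale (mean) is exp(-x'beta).
y = rng.exponential(scale=np.exp(-(X @ beta_true)))

def grad(beta):
    # dQ/dbeta = sum_i x_i (y_i exp(x_i'beta) - 1)
    return X.T @ (y * np.exp(X @ beta) - 1.0)

def hess(beta):
    # d2Q/dbeta dbeta' = sum_i y_i exp(x_i'beta) x_i x_i'
    w = y * np.exp(X @ beta)
    return (X * w[:, None]).T @ X

beta = np.zeros(2)
for _ in range(100):
    step = np.linalg.solve(hess(beta), grad(beta))
    beta = beta - step
    if np.max(np.abs(step)) < 1e-10:
        break
```

Since $y_i > 0$, the Hessian is positive definite whenever $\sum_i x_i x_i'$ is, so the iteration converges quickly from the zero starting value.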
Another class of algorithms does not require the calculation of second derivatives. Most of these methods separate the choice of direction from the choice of steplength. Let $A_k$ be any positive definite matrix, and consider iterations of the type
$$
\beta_{k+1} = \beta_k - \lambda_k \cdot A_k \cdot \frac{\partial Q}{\partial\beta}(\beta_k).
$$
The choice of steplength $\lambda_k = 1$ and matrix $A_k = \left( \frac{\partial^2 Q}{\partial\beta\,\partial\beta'}(\beta_k) \right)^{-1}$ corresponds to Newton–Raphson.
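A minimal sketch of such a direction-and-steplength iteration, assuming NumPy. Here $A_k$ is held fixed at the identity (steepest descent) and $\lambda_k$ is chosen by simple backtracking (halving until $Q$ decreases), one common choice that the notes leave open; the objective is a hypothetical convex toy function.

```python
import numpy as np

def descent(Q, grad, beta0, A=None, max_iter=1000, tol=1e-8):
    """Iterate beta_{k+1} = beta_k - lambda_k * A * grad(beta_k),
    with lambda_k chosen by backtracking line search."""
    beta = np.asarray(beta0, dtype=float)
    if A is None:
        A = np.eye(beta.size)  # identity => steepest descent
    for _ in range(max_iter):
        g = grad(beta)
        if np.max(np.abs(g)) < tol:
            break
        direction = A @ g
        lam = 1.0
        # Halve the steplength until the step actually reduces Q.
        while Q(beta - lam * direction) >= Q(beta) and lam > 1e-12:
            lam *= 0.5
        beta = beta - lam * direction
    return beta

# Toy convex objective Q(b) = b0^2 + 10*b1^2, minimized at the origin.
Q = lambda b: b[0] ** 2 + 10.0 * b[1] ** 2
grad = lambda b: np.array([2.0 * b[0], 20.0 * b[1]])
beta_hat = descent(Q, grad, beta0=[3.0, -2.0])
```

Because $A_k$ is positive definite and $\lambda_k$ is chosen so that $Q$ decreases at every step, each iteration moves downhill; convergence is slower than Newton–Raphson but no second derivatives are needed.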