Imbens, Lecture Notes 7, ARE213 Spring 06

ARE213 Econometrics
Spring 2006
UC Berkeley, Department of Agricultural and Resource Economics

Maximum Likelihood Estimation II: Computational Issues (W 12.7)

How do we compute the MLE? A number of numerical methods exist for this type of problem. Here we discuss some theoretical but largely practical issues in implementing these methods. (For ease of comparison with later optimization problems and the material in the reader, we reformulate this as minimizing minus the log likelihood function, $Q(\theta) = -L(\theta)$; this obviously does not affect the substance of the problem.)

One preliminary point is that it is often useful to rescale the variables so they have approximately the same variance: having some variables that are orders of magnitude larger than others can lead to problems with machine precision.

One leading method is Newton-Raphson. The idea is to approximate the objective function $Q(\theta) = -L(\theta)$ around some starting value $\theta_0$ by a quadratic function and find the exact minimum of that quadratic approximation. Call this value $\theta_1$. Redo the quadratic approximation around $\theta_1$, the minimum of the initial quadratic approximation, and find the new minimum, call this $\theta_2$. Do this repeatedly, and the sequence of solutions $\theta_1, \theta_2, \ldots$ will converge to the minimum of the objective function. Formally, given a starting value $\theta_0$, define iteratively

$$\theta_{k+1} = \theta_k - \left( \frac{\partial^2 Q}{\partial \theta \, \partial \theta'}(\theta_k) \right)^{-1} \frac{\partial Q}{\partial \theta}(\theta_k).$$

In the exponential case with hazard rate $\exp(x'\theta)$ and probability density function

$$f(y \mid x; \theta) = \exp(x'\theta) \exp\bigl(-y \exp(x'\theta)\bigr),$$

the matrix of second derivatives is

$$\frac{\partial^2 Q}{\partial \theta \, \partial \theta'}(\theta) = \sum_{i=1}^{N} y_i \, x_i x_i' \exp(x_i'\theta),$$

which is positive definite if $\sum_i x_i x_i'$ is positive definite (note that $y_i > 0$). Hence the objective function is globally convex, and if there is a solution to the first order conditions, it is the unique MLE. In this case the Newton-Raphson algorithm works very well.

Another class of algorithms does not require the calculation of the second derivatives.
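As a concrete illustration, the Newton-Raphson iteration for the exponential-hazard model above can be sketched in a few lines. This is a minimal sketch, not part of the notes: the function names (`neg_log_lik`, `newton_raphson`) and the simulated data are hypothetical, and the code assumes the density $f(y \mid x; \theta) = \exp(x'\theta)\exp(-y\exp(x'\theta))$, so that the gradient and Hessian of $Q(\theta) = -L(\theta)$ take the closed forms derived above.

```python
import numpy as np

def neg_log_lik(theta, y, X):
    """Q(theta) = -sum_i [ x_i'theta - y_i * exp(x_i'theta) ]."""
    xb = X @ theta
    return -np.sum(xb - y * np.exp(xb))

def newton_raphson(y, X, theta0, tol=1e-10, max_iter=100):
    """Pure Newton-Raphson: theta_{k+1} = theta_k - H(theta_k)^{-1} g(theta_k)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        lam = np.exp(X @ theta)                 # hazard rates exp(x_i'theta)
        grad = -(X.T @ (1.0 - y * lam))         # dQ/dtheta
        hess = X.T @ (X * (y * lam)[:, None])   # d2Q/dtheta dtheta' = sum y_i lam_i x_i x_i'
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.max(np.abs(step)) < tol:          # stop when the update is negligible
            break
    return theta
```

Because $Q$ is globally convex here, the pure Newton iteration converges quickly from a reasonable starting value such as $\theta_0 = 0$; in less well-behaved problems one would add a step-length safeguard of the kind discussed next.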
Most of these methods separate out the choice of direction and the choice of step length. Let $A_k$ be any positive definite matrix, and consider iterations of the type

$$\theta_{k+1} = \theta_k - \lambda_k A_k \frac{\partial Q}{\partial \theta}(\theta_k).$$
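A minimal sketch of this direction/step-length scheme, under assumptions not in the notes: the function name `backtracking_descent` is hypothetical, `A` plays the role of the positive definite matrix $A_k$ (held fixed across iterations for simplicity, with the identity as default), and the step length $\lambda_k$ is chosen by simple halving until an Armijo-style sufficient-decrease condition holds.

```python
import numpy as np

def backtracking_descent(Q, gradQ, theta0, A=None, max_iter=500, tol=1e-8):
    """Iterate theta_{k+1} = theta_k - lam_k * A @ gradQ(theta_k),
    halving lam_k until Q decreases sufficiently (Armijo condition)."""
    theta = np.asarray(theta0, dtype=float)
    if A is None:
        A = np.eye(theta.size)                  # steepest descent as default
    for _ in range(max_iter):
        g = gradQ(theta)
        if np.linalg.norm(g) < tol:             # gradient small: done
            break
        d = A @ g                               # descent direction (A pos. def.)
        lam = 1.0
        while Q(theta - lam * d) > Q(theta) - 1e-4 * lam * (g @ d) and lam > 1e-12:
            lam *= 0.5                          # shrink step length
        theta = theta - lam * d
    return theta
```

With $A_k = I$ this is steepest descent; taking $A_k$ to be the inverse Hessian recovers a damped Newton-Raphson, and quasi-Newton methods build an approximation to that inverse from gradients alone, avoiding second derivatives.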