Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 6. Introduction to Statistical Computing

1 Newton-Raphson's Method to Compute the MLE

One way to compute the MLE is the classical Newton-Raphson method, as we have discussed before. In general the MLE $\hat{\theta}$ solves the equation $0 = l'(\theta)$, where $l(\theta) = \log L$ is the log-likelihood and $l'(\theta)$ is its first-order derivative. Taking a Taylor expansion around a point $\theta_j$ leads to
$$0 \approx l'(\theta_j) + (\theta - \theta_j)\, l''(\theta_j),$$
i.e.
$$\theta - \theta_j \approx -\frac{l'(\theta_j)}{l''(\theta_j)},$$
which yields the Newton-Raphson iterative algorithm for finding the MLE,
$$\theta_{j+1} = \theta_j - \frac{l'(\theta_j)}{l''(\theta_j)}.$$
In the multiparameter case, the MLE of $\theta = (\theta_1, \ldots, \theta_k)$ is a vector and the algorithm becomes
$$\theta_{j+1} = \theta_j - H^{-1}(\theta_j)\, l'(\theta_j),$$
where $l'(\theta_j)$ is the vector of first derivatives and $H$ is the matrix of second derivatives of the log-likelihood.
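As an illustration, here is a minimal Python sketch of the scalar iteration above. The Gamma shape-parameter example, the function names, and the tolerance settings are illustrative choices, not part of the lecture notes themselves.

```python
import numpy as np
from scipy.special import digamma, polygamma

def newton_raphson_mle(score, hessian, theta0, tol=1e-8, max_iter=100):
    """Scalar Newton-Raphson: theta_{j+1} = theta_j - l'(theta_j) / l''(theta_j).

    score   : function returning l'(theta)
    hessian : function returning l''(theta)
    theta0  : starting value
    """
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta = theta - step
        if abs(step) < tol:
            break
    return theta

if __name__ == "__main__":
    # Hypothetical example: MLE of a Gamma shape parameter alpha with scale 1,
    # where no closed form exists.  l(alpha) = (alpha-1)*sum(log y) - sum(y) - n*log Gamma(alpha).
    rng = np.random.default_rng(0)
    y = rng.gamma(shape=2.5, scale=1.0, size=5000)
    n, s = len(y), np.sum(np.log(y))

    score = lambda a: s - n * digamma(a)      # l'(alpha)
    hess = lambda a: -n * polygamma(1, a)     # l''(alpha)

    alpha_hat = newton_raphson_mle(score, hess, theta0=1.0)
    print("MLE of the shape parameter:", alpha_hat)   # close to the true value 2.5
```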
2 EM Algorithm to Compute the MLE

In general, the first and second derivatives needed to implement the Newton-Raphson method may be hard to compute. Alternatively, one can use the EM (expectation-maximization) algorithm, which is very easy to implement. The drawback of the EM algorithm is that it is often slower than the Newton-Raphson algorithm, when the latter can be implemented at all. So the essential trade-off between the Newton-Raphson algorithm and the EM algorithm is speed versus simplicity of implementation. Of course, with increasing computing power, the EM algorithm has become quite popular.

The algorithm assumes that we have data $Y$ with a likelihood $L(y; \theta)$ that is relatively difficult to maximize, but that, after introducing some other random variable $Z$, the likelihood $L(y, z; \theta)$ can be maximized easily. Here is an example of why this may be the case.

Example 1. (Mixture of Normals). Many times in finance the distribution of interest is a mixture of normal distributions; for example, this is the case for Merton's jump diffusion. In this example we shall consider the simplest possible case, in which we have a mixture of just two normal distributions. In other words, the density is given by
$$f(y; \theta) = (1-p)\, \phi(y; \mu_0, \sigma_0) + p\, \phi(y; \mu_1, \sigma_1),$$
where $\phi(y; \mu, \sigma)$ denotes a normal density with mean $\mu$ and standard deviation $\sigma$. More precisely, with probability $p$ the data come from $\phi(y; \mu_1, \sigma_1)$, and with probability $1-p$ from $\phi(y; \mu_0, \sigma_0)$. The likelihood is
$$L(y; \theta) = \prod_{i=1}^{n} \left\{ (1-p)\, \phi(y_i; \mu_0, \sigma_0) + p\, \phi(y_i; \mu_1, \sigma_1) \right\},$$
which is hard to maximize. On the contrary, the "complete" likelihood, which includes the unobserved latent variables $Z_i$, where $Z_i = 0$ represents the first normal and $Z_i = 1$ represents the second normal, is much easier to study and to maximize. Of course, in reality we do not observe $Z_i$.
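To make this concrete, here is a minimal Python sketch of the standard EM iteration for the two-component normal mixture above. The E-step computes $E[Z_i \mid y_i]$ under the current parameters; the M-step maximizes the expected complete-data log-likelihood, which reduces to weighted means and variances. These are the usual updates for this model (their derivation is not shown in the excerpt above), and the simulated data and function name are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def em_two_normals(y, p, mu0, sigma0, mu1, sigma1, n_iter=200):
    """EM iteration for f(y) = (1-p)*phi(y; mu0, sigma0) + p*phi(y; mu1, sigma1)."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_i = P(Z_i = 1 | y_i, current parameters)
        d0 = (1 - p) * norm.pdf(y, mu0, sigma0)
        d1 = p * norm.pdf(y, mu1, sigma1)
        gamma = d1 / (d0 + d1)

        # M-step: weighted sample means and standard deviations
        p = gamma.mean()
        mu1 = np.sum(gamma * y) / np.sum(gamma)
        mu0 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        sigma1 = np.sqrt(np.sum(gamma * (y - mu1) ** 2) / np.sum(gamma))
        sigma0 = np.sqrt(np.sum((1 - gamma) * (y - mu0) ** 2) / np.sum(1 - gamma))
    return p, mu0, sigma0, mu1, sigma1

if __name__ == "__main__":
    # Simulated data: with probability 0.3 draw from N(2, 0.5^2), else from N(-1, 1).
    rng = np.random.default_rng(1)
    z = rng.random(3000) < 0.3
    y = np.where(z, rng.normal(2.0, 0.5, 3000), rng.normal(-1.0, 1.0, 3000))
    print(em_two_normals(y, p=0.5, mu0=0.0, sigma0=1.0, mu1=1.0, sigma1=1.0))
```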