Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 6. Introduction to Statistical Computing


1 Newton-Raphson's Method to Compute the MLE

One way to compute the MLE is the classical Newton-Raphson method, as we have discussed before. In general the MLE $\hat\theta$ solves the equation
$$0 = l'(\theta),$$
where $l(\theta) = \log L$ is the log-likelihood and $l'(\theta)$ is its first-order derivative. Taking a Taylor expansion around a point $\theta_j$ leads to
$$0 \approx l'(\theta_j) + (\theta - \theta_j)\, l''(\theta_j), \quad \text{i.e.} \quad \theta - \theta_j \approx -\frac{l'(\theta_j)}{l''(\theta_j)},$$
which leads to the Newton-Raphson iterative algorithm for finding the MLE:
$$\theta_{j+1} = \theta_j - \frac{l'(\theta_j)}{l''(\theta_j)}.$$
In the multiparameter case, the MLE of $\theta = (\theta_1, \ldots, \theta_k)$ is a vector and the algorithm becomes
$$\theta_{j+1} = \theta_j - H^{-1}(\theta_j)\, l'(\theta_j),$$
where $l'(\theta_j)$ is the vector of first derivatives and $H$ is the matrix of second derivatives (the Hessian) of the log-likelihood.

2 EM Algorithm to Compute the MLE

In general, computing the first and second derivatives needed to implement the Newton-Raphson method may be hard. Alternatively, one can use the EM (expectation-maximization) algorithm, which is very easy to implement. The drawback of the EM algorithm is that it is often slower than the Newton-Raphson algorithm, when the latter can be implemented at all. So the essential trade-off between the Newton-Raphson algorithm and the EM algorithm is speed versus simplicity of implementation. Of course, with increasing computing power, the EM algorithm has become quite popular.

The algorithm assumes that we have data $Y$ with a likelihood $L(y; \theta)$ that is relatively difficult to maximize. However, when we bring in some other random variable $Z$, the likelihood $L(y, z; \theta)$ can be maximized easily. Here is an example of why this may be the case.

Example 1. (Mixture of Normals). Many times in finance the distribution is a mixture of normal distributions.
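As a rough sketch (not part of the lecture), the scalar Newton-Raphson iteration above can be coded in a few lines of Python. The example below uses the location parameter of a Cauchy density, chosen here only because its MLE has no closed form; the data values and starting point are made up for illustration:

```python
import math

def newton_raphson_mle(y, theta0, tol=1e-8, max_iter=100):
    """Iterate theta_{j+1} = theta_j - l'(theta_j) / l''(theta_j)
    for the location parameter theta of a Cauchy density, whose
    log-likelihood is l(theta) = -sum_i log(1 + (y_i - theta)^2) + const."""
    def score(t):       # l'(theta)
        return sum(2 * (yi - t) / (1 + (yi - t) ** 2) for yi in y)

    def second_deriv(t):  # l''(theta)
        return sum(2 * ((yi - t) ** 2 - 1) / (1 + (yi - t) ** 2) ** 2 for yi in y)

    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / second_deriv(theta)
        theta = theta - step
        if abs(step) < tol:   # stop once the update is negligible
            break
    return theta

# Made-up data; start the iteration near the sample median.
y = [-0.4, 0.1, 0.3, 0.7, 1.8]
theta_hat = newton_raphson_mle(y, theta0=0.3)
```

At convergence the score $l'(\hat\theta)$ is (numerically) zero, which is exactly the first-order condition the iteration solves. In the multiparameter case the division by $l''$ is replaced by solving a linear system with the Hessian $H$.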
For example, this is the case for Merton's jump diffusion. In this example we shall consider

the simplest possible case, in which we have a mixture of just two normal distributions. In other words, the density is given by
$$f(y; \theta) = (1 - p)\,\phi(y; \mu_0, \sigma_0) + p\,\phi(y; \mu_1, \sigma_1),$$
where $\phi(y; \mu, \sigma)$ denotes a normal density with mean $\mu$ and standard deviation $\sigma$. More precisely, with probability $p$ the data is from $\phi(y; \mu_1, \sigma_1)$, and with probability $1 - p$ the data is from $\phi(y; \mu_0, \sigma_0)$. The likelihood is
$$L(y; \theta) = \prod_{i=1}^{n} \left\{ (1 - p)\,\phi(y_i; \mu_0, \sigma_0) + p\,\phi(y_i; \mu_1, \sigma_1) \right\},$$
which is hard to maximize. On the contrary, the "complete" likelihood, which includes the unobserved latent variables $Z_i$, where $Z_i = 0$ represents the first normal and $Z_i = 1$ represents the second normal, is much easier to study and to maximize. Of course, in reality we do not observe $Z_i$.
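A minimal sketch of the EM iteration for this two-component mixture follows; the function name, the moment-based M-step updates, and the synthetic data are illustrative choices, not from the lecture. The E-step computes each observation's posterior probability of coming from the second normal, and the M-step maximizes the resulting weighted complete log-likelihood, which here has closed-form solutions:

```python
import math

def em_two_normals(y, p, mu0, sig0, mu1, sig1, n_iter=200):
    """EM for the mixture (1-p)*N(mu0, sig0^2) + p*N(mu1, sig1^2)."""
    def phi(x, mu, sig):  # normal density
        return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2 * math.pi))

    for _ in range(n_iter):
        # E-step: w_i = P(Z_i = 1 | y_i) under the current parameters.
        w = [p * phi(yi, mu1, sig1) /
             ((1 - p) * phi(yi, mu0, sig0) + p * phi(yi, mu1, sig1))
             for yi in y]
        # M-step: weighted sample moments maximize the complete log-likelihood.
        s1 = sum(w)
        s0 = len(y) - s1
        p = s1 / len(y)
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / s1
        mu0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / s0
        sig1 = math.sqrt(sum(wi * (yi - mu1) ** 2 for wi, yi in zip(w, y)) / s1)
        sig0 = math.sqrt(sum((1 - wi) * (yi - mu0) ** 2 for wi, yi in zip(w, y)) / s0)
    return p, mu0, sig0, mu1, sig1

# Synthetic data: two well-separated clusters around -2 and +2.
y = [-2.1, -1.9, -2.0, -1.8, -2.05, 1.9, 2.0, 2.05, 2.1, 2.2]
p_hat, mu0_hat, sig0_hat, mu1_hat, sig1_hat = em_two_normals(
    y, p=0.5, mu0=-1.0, sig0=1.0, mu1=1.0, sig1=1.0)
```

Note that neither step requires a derivative of the mixture log-likelihood, which is the simplicity advantage over Newton-Raphson discussed above; the price is that each iteration only increases the likelihood and many iterations may be needed.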

This note was uploaded on 10/18/2010 for the course IEOR 4702 taught by Professor Kou during the Spring '10 term at Columbia.
