# Lecture 6: Introduction to Statistical Computing


Statistical Inference for FE, Professor S. Kou, Department of IEOR, Columbia University

## 1 Newton-Raphson's Method to Compute the MLE

One way to compute the MLE is the classical Newton-Raphson method, as we have discussed before. In general, the MLE $\hat{\theta}$ solves the equation $0 = l'(\theta)$, where $l(\theta) = \log L$ is the log-likelihood and $l'(\theta)$ is its first derivative. Taking a Taylor expansion around a point $\theta_j$ leads to
$$0 \approx l'(\theta_j) + (\theta - \theta_j)\, l''(\theta_j), \quad \text{i.e.} \quad \theta - \theta_j \approx -\frac{l'(\theta_j)}{l''(\theta_j)},$$
which leads to the Newton-Raphson iterative algorithm for finding the MLE:
$$\theta_{j+1} = \theta_j - \frac{l'(\theta_j)}{l''(\theta_j)}.$$
In the multiparameter case, the MLE of $\theta = (\theta_1, \ldots, \theta_k)$ is a vector and the algorithm becomes
$$\theta_{j+1} = \theta_j - H^{-1}(\theta_j)\, l'(\theta_j),$$
where $l'(\theta_j)$ is the vector of first derivatives and $H$ is the matrix of second derivatives of the log-likelihood.

## 2 EM Algorithm to Compute the MLE

In general, computing the first and second derivatives needed to implement the Newton-Raphson method may be hard. Alternatively, one can use the EM (expectation-maximization) algorithm, which is very easy to implement. The drawback of the EM algorithm is that it is often slower than the Newton-Raphson algorithm, when the latter can be implemented at all. So the essential trade-off between the Newton-Raphson algorithm and the EM algorithm is speed versus simplicity of implementation. Of course, with increasing computing power, the EM algorithm has become quite popular.

The algorithm assumes that we have data $Y$ with a likelihood $L(y; \theta)$ that is relatively difficult to maximize, but such that, once we introduce some other random variable $Z$, the likelihood $L(y, z; \theta)$ can be maximized easily. Here is an example of why this may be the case.

**Example 1 (Mixture of Normals).** Many times in finance the distribution of interest is a mixture of normal distributions.
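Before developing the example, the Newton-Raphson iteration of Section 1 can be sketched in a few lines of Python. The exponential-rate MLE used below is a hypothetical illustration (not from the lecture), chosen because the score $l'(\lambda) = n/\lambda - \sum_i y_i$ and second derivative $l''(\lambda) = -n/\lambda^2$ are available in closed form.

```python
def newton_raphson_mle(y, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson iteration  lam_{j+1} = lam_j - l'(lam_j) / l''(lam_j)
    for the MLE of an exponential rate lambda (hypothetical example; any
    model with closed-form l' and l'' follows the same template)."""
    n, s = len(y), sum(y)
    lam = lam0
    for _ in range(max_iter):
        score = n / lam - s        # l'(lambda)
        hess = -n / lam ** 2       # l''(lambda)
        step = score / hess
        lam -= step
        if abs(step) < tol:        # stop once the update is negligible
            break
    return lam

data = [0.5, 1.2, 0.3, 2.0, 0.8]
# The exponential MLE is also available in closed form, n / sum(y),
# so the iterate can be checked directly against len(data) / sum(data).
print(newton_raphson_mle(data))
```

In the multiparameter case, the same loop applies with the score replaced by the gradient vector and the division by $l''$ replaced by solving a linear system in the Hessian $H$.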
For example, this is the case for Merton's jump diffusion model. In this example we shall consider the simplest possible case, in which we have a mixture of just two normal distributions. In other words, the density is given by
$$f(y; \theta) = (1 - p)\, \phi(y; \mu_0, \sigma_0) + p\, \phi(y; \mu_1, \sigma_1),$$
where $\phi(y; \mu, \sigma)$ denotes a normal density with mean $\mu$ and standard deviation $\sigma$. More precisely, with probability $p$ the data come from $\phi(y; \mu_1, \sigma_1)$, and with probability $1 - p$ from $\phi(y; \mu_0, \sigma_0)$. The likelihood is
$$L(y; \theta) = \prod_{i=1}^{n} \left\{ (1 - p)\, \phi(y_i; \mu_0, \sigma_0) + p\, \phi(y_i; \mu_1, \sigma_1) \right\},$$
which is hard to maximize. On the contrary, the "complete" likelihood, which includes the unobserved latent variables $Z_i$, where $Z_i = 0$ represents the first normal and $Z_i = 1$ represents the second normal, is much easier to study and to maximize. Of course, in reality we do not observe the $Z_i$.
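The EM iteration for this two-normal mixture can be sketched as follows; this is a minimal illustration, and the function name, starting values, and stopping rule (a fixed number of iterations) are my own choices, not the lecture's. The E-step computes $w_i = P(Z_i = 1 \mid y_i)$, the posterior probability that observation $i$ came from the second normal, and the M-step re-estimates $(p, \mu_0, \sigma_0, \mu_1, \sigma_1)$ by the corresponding weighted moments.

```python
import math

def normal_pdf(y, mu, sigma):
    """Normal density phi(y; mu, sigma)."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_mixture(y, p, mu0, sigma0, mu1, sigma1, n_iter=100):
    """EM for f(y) = (1 - p) phi(y; mu0, sigma0) + p phi(y; mu1, sigma1)."""
    n = len(y)
    for _ in range(n_iter):
        # E-step: w_i = P(Z_i = 1 | y_i, current parameters)
        w = []
        for yi in y:
            a = (1 - p) * normal_pdf(yi, mu0, sigma0)
            b = p * normal_pdf(yi, mu1, sigma1)
            w.append(b / (a + b))
        # M-step: weighted means and standard deviations
        sw = sum(w)
        p = sw / n
        mu1 = sum(wi * yi for wi, yi in zip(w, y)) / sw
        mu0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / (n - sw)
        sigma1 = math.sqrt(sum(wi * (yi - mu1) ** 2 for wi, yi in zip(w, y)) / sw)
        sigma0 = math.sqrt(sum((1 - wi) * (yi - mu0) ** 2 for wi, yi in zip(w, y)) / (n - sw))
    return p, mu0, sigma0, mu1, sigma1
```

Each observation contributes to both components in proportion to its posterior weight, so no derivatives of the likelihood are needed; this is exactly the simplicity-for-speed trade-off described in Section 2.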