The Expectation Maximization Algorithm

Frank Dellaert
College of Computing, Georgia Institute of Technology
Technical Report number GIT-GVU-02-20
February 2002

Abstract

This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). It is a slight variation on Tom Minka's tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

1 Intuitive Explanation of EM

EM is an iterative optimization method to estimate some unknown parameters $\Theta$, given measurement data $U$. However, we are not given the values of some hidden nuisance variables $J$, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters $\Theta$ given the data $U$, marginalizing over $J$:

$$\Theta^* = \arg\max_{\Theta} \sum_{J \in \mathcal{J}^n} P(\Theta, J \mid U) \qquad (1)$$

The intuition behind EM is an old one: alternate between estimating the unknowns $\Theta$ and the hidden variables $J$. This idea has been around for a long time. However, instead of finding the best $J \in \mathcal{J}^n$ given an estimate of $\Theta$ at each iteration, EM computes a distribution over the space $\mathcal{J}^n$.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the "DLR" paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference.

One of the most insightful explanations of EM, which provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns.
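To make the alternation concrete, here is a minimal Python sketch of EM for a setting like the example that follows: a one-dimensional mixture of two Gaussians in which only the two means $\theta_1, \theta_2$ are unknown, and the hidden variable $j_i$ records which component generated sample $x_i$. It assumes a flat prior on $\Theta$ (so maximizing (1) reduces to maximizing the likelihood), a known unit standard deviation, equal mixing weights, and hypothetical data values; the names `e_step`, `m_step`, and `em` are illustrative and not taken from the note. Because the samples are independent given $\Theta$, the E-step distribution over $\mathcal{J}^n$ factorizes into one small distribution per sample.

```python
import math

# Hypothetical 1-D data: NOT the samples from the note, just illustrative values
# (three near -2, three near +2).
DATA = [-2.6, -2.1, -1.5, 1.4, 2.0, 2.7]
SIGMA = 1.0           # assumption: known, shared standard deviation
WEIGHTS = (0.5, 0.5)  # assumption: known, equal mixing weights


def gauss(x, mu, sigma=SIGMA):
    """Normal density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def log_likelihood(theta, data=DATA):
    """log P(U | theta), with each sample's hidden label summed out."""
    return sum(
        math.log(sum(w * gauss(x, mu) for w, mu in zip(WEIGHTS, theta)))
        for x in data
    )


def e_step(theta, data=DATA):
    """Distribution over each hidden label: f(j_i = k) proportional to w_k N(x_i; theta_k)."""
    resp = []
    for x in data:
        p = [w * gauss(x, mu) for w, mu in zip(WEIGHTS, theta)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    return resp


def m_step(resp, data=DATA):
    """Maximize the expected complete-data log likelihood: responsibility-weighted means."""
    new_theta = []
    for k in range(len(WEIGHTS)):
        rk = [r[k] for r in resp]
        new_theta.append(sum(r * x for r, x in zip(rk, data)) / sum(rk))
    return tuple(new_theta)


def em(theta, iters=20):
    """Run EM from an initial guess, recording the log likelihood after each iteration."""
    history = [log_likelihood(theta)]
    for _ in range(iters):
        theta = m_step(e_step(theta))
        history.append(log_likelihood(theta))
    return theta, history


if __name__ == "__main__":
    theta, history = em((-0.5, 0.5))
    print("estimate:", theta)
    print("log likelihood per iteration:", [round(h, 4) for h in history])
```

Each iteration should leave the log likelihood no lower than before, which is the monotone improvement the lower-bound view guarantees; printing `history` is a quick way to check this on other data or starting points.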
This is demonstrated below for a simple example.

[Figure 1: EM example: Mixture components and data. The data consists of three samples drawn from each mixture component, shown above as circles and triangles. The means of the mixture components are -2 and 2, respectively.]

[Figure 2: The true likelihood function of the two component means $\theta_1$ and $\theta_2$, given the data in Figure 1.]

[Figure: panels plotted over $\theta_1$ and $\theta_2$, titled "i=1, Q=3.279564", "i=2, Q=2.788156", "i=3, Q=1.501116" ...]
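The lower-bound interpretation can be checked numerically in the same kind of setting. The sketch below assumes two unit-variance components with equal weights, unknown means $(\theta_1, \theta_2)$, a flat prior, and hypothetical data (not the samples in Figure 1). After an E-step at a current estimate $\Theta^t$, it evaluates the Jensen bound $\sum_i \sum_k f_{ik} \log \frac{w_k N(x_i; \theta_k)}{f_{ik}}$ on a grid of $(\theta_1, \theta_2)$ values like the axes of the likelihood plot, checks that the bound never exceeds the log likelihood and touches it at $\Theta^t$, and takes the grid point that maximizes the bound as a crude M-step. The function names, the starting estimate, and the grid range are illustrative choices.

```python
import math

# Hypothetical illustrative data (not the note's samples) and an assumed model:
# two unit-variance components with equal weights; only the means are unknown.
DATA = [-2.6, -2.1, -1.5, 1.4, 2.0, 2.7]
WEIGHTS = (0.5, 0.5)


def gauss(x, mu):
    """Unit-variance normal density N(x; mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def log_likelihood(theta):
    """True objective: log P(U | theta), summing out each hidden label."""
    return sum(
        math.log(sum(w * gauss(x, m) for w, m in zip(WEIGHTS, theta))) for x in DATA
    )


def responsibilities(theta):
    """E-step: f(j_i = k) for each sample i, computed at the current estimate."""
    out = []
    for x in DATA:
        p = [w * gauss(x, m) for w, m in zip(WEIGHTS, theta)]
        s = sum(p)
        out.append([pk / s for pk in p])
    return out


def lower_bound(theta, f):
    """Jensen bound: sum_i sum_k f_ik log(w_k N(x_i; theta_k) / f_ik) <= log P(U | theta)."""
    total = 0.0
    for x, fi in zip(DATA, f):
        for w, m, fik in zip(WEIGHTS, theta, fi):
            if fik > 0.0:  # terms with f_ik = 0 contribute nothing in the limit
                total += fik * math.log(w * gauss(x, m) / fik)
    return total


def grid(lo=-3.0, hi=3.0, n=61):
    """Evenly spaced values for theta_1 or theta_2."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


if __name__ == "__main__":
    theta_t = (-0.5, 0.5)  # hypothetical current estimate
    f = responsibilities(theta_t)
    print("gap at theta_t:", log_likelihood(theta_t) - lower_bound(theta_t, f))
    pts = [(a, b) for a in grid() for b in grid()]
    print("min gap over grid:", min(log_likelihood(p) - lower_bound(p, f) for p in pts))
    best = max(pts, key=lambda p: lower_bound(p, f))
    print("grid maximizer of the bound (next estimate):", best)
```

On this grid the maximizer of the bound should sit near the responsibility-weighted means, which is where an exact M-step would land; the grid search is used here only to make the "optimize the bound" step visible.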