The EM Algorithm

The EM Algorithm - The Expectation Maximization Algorithm...

The Expectation Maximization Algorithm

Frank Dellaert
College of Computing, Georgia Institute of Technology
Technical Report number GIT-GVU-02-20
February 2002

Abstract

This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). This is just a slight variation on Tom Minka's tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

1 Intuitive Explanation of EM

EM is an iterative optimization method to estimate some unknown parameters Θ, given measurement data U. However, we are not given some "hidden" nuisance variables J, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters Θ given the data U, marginalizing over J:

    \Theta^* = \arg\max_{\Theta} \sum_{J \in \mathcal{J}^n} P(\Theta, J \mid U)    (1)

The intuition behind EM is an old one: alternate between estimating the unknowns Θ and the hidden variables J. This idea has been around for a long time. However, instead of finding the best J ∈ 𝒥 given an estimate Θ at each iteration, EM computes a distribution over the space 𝒥.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the DLR paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference.

One of the most insightful explanations of EM, one that provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower-bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns.
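Both the marginalization in equation (1) and the lower-bound view can be checked numerically on a tiny problem. The following is a minimal sketch under stated assumptions, not the report's own code: a hypothetical 1-D mixture of two unit-variance Gaussians with equal weights and a flat prior on Θ = (θ1, θ2), so that P(Θ, J | U) is proportional to P(U, J | Θ). The data values and all helper names (log_marginal_bruteforce, e_step, bound) are illustrative. The sketch sums over every label vector J in {0, 1}^n, then builds the bound from the E-step posterior at a current estimate and checks that it touches log P(U | Θ) there and stays below it elsewhere.

```python
import itertools
import math

# Hypothetical data: the report's actual samples are not part of the preview.
U = [-2.5, -1.8, -1.2, 1.1, 2.0, 2.7]
LABELS = (0, 1)  # space of one hidden variable: which component produced a sample

def normal_pdf(x, mu):
    # Unit-variance Gaussian density (assumption: the variance is known).
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def joint(x, j, theta):
    # P(u, j | theta) with equal mixing weights of 1/2 (assumption).
    return 0.5 * normal_pdf(x, theta[j])

def log_marginal_bruteforce(theta):
    # Literal sum over every label vector J in {0, 1}^n (2^n terms),
    # matching equation (1) under the flat-prior assumption.
    total = 0.0
    for J in itertools.product(LABELS, repeat=len(U)):
        total += math.prod(joint(x, j, theta) for x, j in zip(U, J))
    return math.log(total)

def log_likelihood(theta):
    # Same quantity, factorized per sample since samples are independent.
    return sum(math.log(sum(joint(x, j, theta) for j in LABELS)) for x in U)

def e_step(theta):
    # Posterior f_i(j) = P(j | u_i, theta): a distribution over each label,
    # not a single "best" label.
    post = []
    for x in U:
        p = [joint(x, j, theta) for j in LABELS]
        s = sum(p)
        post.append([pj / s for pj in p])
    return post

def bound(theta, f):
    # B(theta) = sum_i sum_j f_i(j) * log(P(u_i, j | theta) / f_i(j)).
    # By Jensen's inequality, B(theta) <= log P(U | theta) for any theta.
    return sum(
        fij * math.log(joint(x, j, theta) / fij)
        for x, fi in zip(U, f)
        for j, fij in zip(LABELS, fi)
        if fij > 0.0
    )

THETA_T = (-1.0, 1.0)   # current estimate (arbitrary, for illustration)
F_T = e_step(THETA_T)   # E-step evaluated at the current estimate
GRID = [-3.0, -1.5, 0.0, 1.5, 3.0]

print(log_marginal_bruteforce(THETA_T), log_likelihood(THETA_T))
print(bound(THETA_T, F_T), log_likelihood(THETA_T))
print(all(bound(th, F_T) <= log_likelihood(th) + 1e-12
          for th in itertools.product(GRID, GRID)))
```

The brute-force sum grows as 2^n, which is why the factorized form is used in practice; the bound check illustrates why maximizing B in the M-step cannot decrease the log-likelihood.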
This is demonstrated below for a simple example.

[Figure 1: EM example: Mixture components and data. The data consists of three samples drawn from each mixture component, shown as circles and triangles. The means of the mixture components are -2 and 2, respectively.]

[Figure 2: The true likelihood function of the two component means θ1 and θ2, given the data in Figure 1.]

[Figure (caption not included in the preview): EM iterations, panels labeled i=1, Q=-3.279564; i=2, Q=-2.788156; i=3, Q=-1.501116; ...]
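The iterations in the example can be sketched in code. This is a hypothetical reconstruction, not the report's implementation: the six data values below are made up to resemble Figure 1 (three samples near each of the means -2 and 2), the variances are assumed known and equal to 1, the mixing weights are assumed equal, and the prior on (θ1, θ2) is assumed flat. The function names em_step and run_em are illustrative. The sketch does not attempt to reproduce the Q values shown in the figure panels.

```python
import math

# Hypothetical stand-in for Figure 1: three samples near each true mean (-2 and 2).
# The report's actual sample values are not included in the preview.
U = [-2.5, -1.8, -1.2, 1.1, 2.0, 2.7]

def normal_pdf(x, mu):
    # Unit-variance Gaussian density (assumed known variance).
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def log_likelihood(theta):
    # log P(U | theta) with equal mixing weights (assumption); labels summed out.
    return sum(
        math.log(0.5 * normal_pdf(x, theta[0]) + 0.5 * normal_pdf(x, theta[1]))
        for x in U
    )

def em_step(theta):
    # E-step: probability that each sample came from component 1,
    # i.e. a distribution over each hidden label rather than a hard choice.
    r1 = []
    for x in U:
        p0 = 0.5 * normal_pdf(x, theta[0])
        p1 = 0.5 * normal_pdf(x, theta[1])
        r1.append(p1 / (p0 + p1))
    # M-step: maximizing the bound for fixed-variance Gaussian means gives a
    # responsibility-weighted average of the data for each component.
    w0 = sum(1.0 - r for r in r1)
    w1 = sum(r1)
    theta0 = sum((1.0 - r) * x for r, x in zip(r1, U)) / w0
    theta1 = sum(r * x for r, x in zip(r1, U)) / w1
    return (theta0, theta1)

def run_em(theta, iters=20):
    # Record the log-likelihood after every iteration to watch it climb.
    history = [log_likelihood(theta)]
    for _ in range(iters):
        theta = em_step(theta)
        history.append(log_likelihood(theta))
    return theta, history

theta, history = run_em((-0.5, 0.5))
print("estimated means:", theta)
print("log-likelihood, first iterations:", [round(h, 4) for h in history[:5]])
```

With well-separated components the estimates settle near the sample means of each cluster; the recorded log-likelihood should never decrease from one iteration to the next, which is the practical consequence of the lower-bound argument.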