# The EM Algorithm - The Expectation Maximization Algorithm...

The Expectation Maximization Algorithm

Frank Dellaert, College of Computing, Georgia Institute of Technology

Technical Report number GIT-GVU-02-20, February 2002

## Abstract

This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). This is just a slight variation on Tom Minka’s tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

## 1 Intuitive Explanation of EM

EM is an iterative optimization method to estimate some unknown parameters $\Theta$, given measurement data $U$. However, we are not given some “hidden” nuisance variables $J$, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters $\Theta$ given the data $U$, marginalizing over $J$:

$$\Theta^{*} = \operatorname*{argmax}_{\Theta} \sum_{J \in \mathcal{J}^{n}} P(\Theta, J \mid U) \tag{1}$$

The intuition behind EM is an old one: alternate between estimating the unknowns $\Theta$ and the hidden variables $J$. This idea has been around for a long time. However, instead of finding the best $J \in \mathcal{J}$ given an estimate $\Theta$ at each iteration, EM computes a distribution over the space $\mathcal{J}$.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the “DLR” paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference.

One of the most insightful explanations of EM, that provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower-bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower-bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns. This is demonstrated below for a simple example.
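The alternation between a distribution over the hidden variables and an updated parameter estimate can be made concrete with a minimal Python sketch for a two-component Gaussian mixture of the kind used in the example. This sketch is not taken from the report: the sample values, the unit variances, the equal mixing weights, and the flat prior on the means (under which maximizing the posterior reduces to maximizing the likelihood) are all assumptions made for illustration, as are the function names.

```python
import math

def gaussian_pdf(x, mean, sigma=1.0):
    """Density of a 1-D Gaussian; sigma = 1 is an assumption."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def e_step(data, means):
    """For each sample, compute a distribution over which component produced it."""
    resp = []
    for u in data:
        weights = [gaussian_pdf(u, m) for m in means]  # equal mixing weights assumed
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp

def m_step(data, resp, n_components):
    """Re-estimate each mean as the responsibility-weighted average of the data."""
    means = []
    for k in range(n_components):
        mass = sum(r[k] for r in resp)
        means.append(sum(r[k] * u for r, u in zip(resp, data)) / mass)
    return means

def log_likelihood(data, means):
    """Log-likelihood of the data under the equal-weight mixture."""
    w = 1.0 / len(means)
    return sum(math.log(sum(w * gaussian_pdf(u, m) for m in means)) for u in data)

def em(data, init_means, n_iters=50):
    """Alternate E and M steps, recording the log-likelihood after each iteration."""
    means = list(init_means)
    history = [log_likelihood(data, means)]
    for _ in range(n_iters):
        resp = e_step(data, means)
        means = m_step(data, resp, len(means))
        history.append(log_likelihood(data, means))
    return means, history

# Hypothetical samples: three near -2 and three near 2 (not the report's data).
data = [-2.5, -2.0, -1.4, 1.6, 2.1, 2.4]
means, history = em(data, init_means=[-0.5, 0.5])
print(means)
```

With well-separated samples such as these, the estimated means settle close to the two cluster averages, and the log-likelihood recorded in `history` does not decrease from one iteration to the next.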

Figure 1: EM example: Mixture components and data. The data consists of three samples drawn from each mixture component, shown as circles and triangles. The means of the mixture components are $-2$ and $2$, respectively.

Figure 2: The true likelihood function of the two component means $\theta_1$ and $\theta_2$, given the data in Figure 1.
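A likelihood surface over the two component means, like the one Figure 2 describes, can be approximated by evaluating the mixture likelihood on a grid. The following Python sketch rests on assumptions that the preview does not confirm: the sample values are hypothetical, and the components are given unit variance and equal mixing weights. The grid range of $-3$ to $3$ follows the axis ticks of Figure 2; the step size is arbitrary.

```python
import math

def mixture_likelihood(data, theta1, theta2, sigma=1.0):
    """Likelihood of the data for component means theta1 and theta2.

    Equal mixing weights and sigma = 1 are assumptions for illustration.
    """
    norm = sigma * math.sqrt(2.0 * math.pi)
    like = 1.0
    for u in data:
        p1 = math.exp(-0.5 * ((u - theta1) / sigma) ** 2) / norm
        p2 = math.exp(-0.5 * ((u - theta2) / sigma) ** 2) / norm
        like *= 0.5 * p1 + 0.5 * p2
    return like

# Hypothetical samples: three near -2 and three near 2 (not the report's data).
data = [-2.5, -2.0, -1.4, 1.6, 2.1, 2.4]

# Grid over [-3, 3] in steps of 0.25 for both means.
grid = [round(-3.0 + 0.25 * i, 2) for i in range(25)]
surface = {(t1, t2): mixture_likelihood(data, t1, t2) for t1 in grid for t2 in grid}

# Grid point with the highest likelihood.
best = max(surface, key=surface.get)
print(best, surface[best])
```

Because the two components are interchangeable under these assumptions, the surface is symmetric in its arguments: swapping $\theta_1$ and $\theta_2$ gives the same likelihood, so every peak has a mirror image across the diagonal.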