EM Demystified: An Expectation-Maximization Tutorial

Yihua Chen and Maya R. Gupta
Department of Electrical Engineering
University of Washington
Seattle, WA 98195
{yhchen,gupta}@ee.washington.edu

UWEE Technical Report Number UWEETR-2010-0002
February 2010

Department of Electrical Engineering
University of Washington
Box 352500
Seattle, Washington 98195-2500
PHN: (206) 543-2150
FAX: (206) 543-3842
URL: http://www.ee.washington.edu

Abstract

After a couple of disastrous experiments trying to teach EM, we carefully wrote this tutorial to give you an intuitive and mathematically rigorous understanding of EM and why it works. We explain the standard applications of EM to learning Gaussian mixture models (GMMs) and hidden Markov models (HMMs), and prepare you to apply EM to new problems. This tutorial assumes you have an advanced undergraduate understanding of probability and statistics.

1 Introduction

Expectation-maximization (EM) is a method to find the maximum likelihood estimator of a parameter θ of a probability distribution. Let's start with an example. Say that the probability of the temperature outside your window for each of the 24 hours of a day, x ∈ R^24, depends on the season θ ∈ {summer, fall, winter, spring}, and that you know the seasonal temperature distribution p(x | θ). But say you can only measure the average temperature y = x̄ for the day, and you'd like to guess what season θ it is (for example, is spring here yet?). The maximum likelihood estimate of θ maximizes p(y | θ), but in some cases this may be hard to find.
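When θ ranges over a small discrete set and p(y | θ) is easy to evaluate, the maximum likelihood estimate can be found by direct search, with no need for EM. A minimal sketch of the season example above, assuming (hypothetically) that the average daily temperature y under each season is Gaussian with the made-up means and standard deviations below:

```python
import math

# Hypothetical seasonal distributions of the average daily temperature y,
# as (mean, std) in degrees Celsius. These numbers are invented for
# illustration; they are not from the tutorial.
seasons = {
    "summer": (25.0, 3.0),
    "fall":   (12.0, 4.0),
    "winter": (2.0,  3.0),
    "spring": (13.0, 4.0),
}

def log_likelihood(y, mean, std):
    # Log of a univariate Gaussian density, standing in for log p(y | theta).
    return -0.5 * math.log(2 * math.pi * std ** 2) - (y - mean) ** 2 / (2 * std ** 2)

def ml_season(y):
    # Maximum likelihood estimate: the theta maximizing p(y | theta),
    # found by exhaustive search over the four seasons.
    return max(seasons, key=lambda s: log_likelihood(y, *seasons[s]))
```

EM becomes interesting precisely when this kind of direct maximization of p(y | θ) is intractable, e.g. when θ is continuous and p(y | θ) involves marginalizing out hidden data.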
That’s when EM is useful: it takes your observed data y, iteratively makes guesses about the complete data x, and iteratively finds the θ that maximizes p(x | θ) over θ. In this way, EM tries to find the maximum likelihood estimate of θ given y. We’ll see in later sections that EM doesn’t actually promise to find you the θ that maximizes p(y | θ), but there are some theoretical guarantees, and it often does a good job in practice, though it may need a little help in the form of multiple random starts.

First, we go over the steps of EM, breaking down the usual two-step description into a six-step description. Table 1 summarizes the key notation. Then we present a number of examples, including the Gaussian mixture model (GMM) and hidden Markov model (HMM), to show you how EM is applied. In Section 4 we walk you through the proof that the EM estimate never gets worse as it iterates. To understand EM more deeply, we show in Section 5 that EM is iteratively maximizing a tight lower bound to the true likelihood surface. In Section 6, we provide details and examples for how maximizing a tight lower bound to the true likelihood surface...
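The iterate-guess-then-maximize loop described above can be sketched concretely for the standard GMM application the tutorial mentions. The following is a minimal, self-contained illustration for a two-component 1-D Gaussian mixture, not the tutorial's own derivation: the E-step computes each component's posterior responsibility for each point (the "guess" about the hidden component labels), and the M-step re-estimates the weights, means, and variances from those responsibilities.

```python
import math

def em_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.

    A sketch: deterministic initialization (means at the data extremes,
    pooled variance, equal weights), fixed number of iterations.
    Returns (weights, means, variances).
    """
    mu = [min(data), max(data)]
    mean = sum(data) / len(data)
    var = [sum((x - mean) ** 2 for x in data) / len(data)] * 2
    w = [0.5, 0.5]

    def gauss(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w[k] * gauss(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var
```

Note that, consistent with the caveat above, this loop only guarantees that the likelihood never decreases; with a bad initialization it can converge to a poor local maximum, which is why multiple random starts are used in practice.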