A Note on the Expectation-Maximization (EM) Algorithm

ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign

November 2, 2004

1 Introduction

The Expectation-Maximization (EM) algorithm is a general algorithm for maximum-likelihood estimation where the data are "incomplete" or the likelihood function involves latent variables. Note that the notions of "incomplete data" and "latent variables" are related: when we have a latent variable, we may regard our data as incomplete, since we do not observe the values of the latent variable; conversely, when our data are incomplete, we can often associate a latent variable with the missing data. In language modeling, the EM algorithm is often used to estimate the parameters of a mixture model, where the exact component model from which a data point was generated is hidden from us.

Informally, the EM algorithm starts by randomly assigning values to all the parameters to be estimated. It then iteratively alternates between two steps, called the expectation step (the "E-step") and the maximization step (the "M-step"). In the E-step, it computes the expected log-likelihood of the complete data (the so-called Q-function), where the expectation is taken with respect to the conditional distribution of the latent variables (i.e., the "hidden variables") given the current parameter values and the observed (incomplete) data. In the M-step, it re-estimates all the parameters by maximizing the Q-function. Once we have a new generation of parameter values, we can repeat the E-step and M-step. This process continues until the likelihood converges, i.e., reaches a local maximum. Intuitively, what EM does is to iteratively "augment" the data by "guessing" the values of the hidden variables, and then to re-estimate the parameters by treating the guessed values as the true values.
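The E-step/M-step loop described above can be sketched concretely for a simple mixture model. The following is a minimal illustration, not part of the original note: it fits a mixture of two one-dimensional Gaussians, where the hidden variable is the identity of the component that generated each point. The E-step computes the posterior probability of each component for each data point under the current parameters; the M-step re-estimates the means, variances, and mixing weight from those soft assignments. (The note suggests random initialization; for reproducibility this sketch initializes the means to the smallest and largest data points.)

```python
import math

def normal_pdf(x, mu, var):
    # Density of a 1-D Gaussian with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, iters=100):
    # Initialize parameters: means at the data extremes (the note suggests
    # random initialization; fixed init is used here for reproducibility),
    # unit variances, and an even mixing weight.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = 0.5  # P(component 0)
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 0,
        # given the current parameter settings and the observed data.
        resp = [
            w * normal_pdf(x, mu[0], var[0])
            / (w * normal_pdf(x, mu[0], var[0]) + (1 - w) * normal_pdf(x, mu[1], var[1]))
            for x in data
        ]
        # M-step: re-estimate all parameters by maximizing the Q-function,
        # i.e., weighted maximum likelihood under the soft assignments above.
        n0 = sum(resp)
        n1 = len(data) - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
        var[0] = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0 + 1e-9
        var[1] = sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1 + 1e-9
        w = n0 / len(data)
    return mu, var, w
```

On well-separated data the estimated means converge to values near the true cluster centers, illustrating how EM "guesses" the hidden component assignments and then re-estimates parameters as if those guesses were the truth.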
The EM algorithm is a hill-climbing approach, so it can only be guaranteed to reach a local maximum. When there are multiple local maxima, whether we actually reach the global maximum clearly depends on where we start: if we start on the "right hill," we will be able to find the global maximum, but when there are multiple local maxima it is often hard to identify the right hill. There are two commonly used strategies for this problem. The first is to try many different initial values and choose the solution with the highest converged likelihood value. The second is to use a much simpler model (ideally one with a unique global maximum) to determine an initial value for the more complex model. The idea is that the simpler model can hopefully locate a rough region where the global optimum lies, and we then start from a value in that region to search for a more accurate optimum using the more complex model.
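The first strategy, trying many initial values and keeping the best converged solution, applies to any hill-climbing procedure, not just EM. The example below is illustrative and not from the original note: it hill-climbs a hypothetical one-dimensional objective with two local maxima from several random starting points, then returns the candidate with the highest objective value, which is far more likely to land on the "right hill" than a single run.

```python
import math
import random

def objective(x):
    # Hypothetical objective with two local maxima:
    # a lower peak near x = -1 and the global peak near x = 2.
    return math.exp(-(x + 1) ** 2) + 2.0 * math.exp(-(x - 2) ** 2)

def hill_climb(x, step=0.01, iters=5000):
    # Greedy hill-climbing: move to a neighbor while doing so improves
    # the objective; stop at a local maximum (within one step).
    for _ in range(iters):
        if objective(x + step) > objective(x):
            x += step
        elif objective(x - step) > objective(x):
            x -= step
        else:
            break
    return x

def best_of_restarts(n_starts=20, seed=0):
    # Strategy 1 from the text: try many different initial values and
    # keep the solution with the highest converged objective value.
    rng = random.Random(seed)
    candidates = [hill_climb(rng.uniform(-4.0, 5.0)) for _ in range(n_starts)]
    return max(candidates, key=objective)
```

Individual starts that land on the left slope converge to the inferior local maximum near x = -1, but taking the best over all restarts recovers a point near the global peak at x = 2.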