CS229 Lecture notes
Andrew Ng

Mixtures of Gaussians and the EM algorithm

In this set of notes, we discuss the EM (Expectation-Maximization) algorithm for density estimation. Suppose that we are given a training set {x^(1), ..., x^(m)} as usual. Since we are in the unsupervised learning setting, these points do not come with any labels.

We wish to model the data by specifying a joint distribution p(x^(i), z^(i)) = p(x^(i) | z^(i)) p(z^(i)). Here, z^(i) ∼ Multinomial(φ) (where φ_j ≥ 0, ∑_{j=1}^k φ_j = 1, and the parameter φ_j gives p(z^(i) = j)), and x^(i) | z^(i) = j ∼ N(μ_j, Σ_j). We let k denote the number of values that the z^(i)'s can take on. Thus, our model posits that each x^(i) was generated by randomly choosing z^(i) from {1, ..., k}, and then drawing x^(i) from one of the k Gaussians depending on z^(i). This is called the mixture of Gaussians model. Also, note that the z^(i)'s are latent random variables, meaning that they're hidden/unobserved. This is what will make our estimation problem difficult.

The parameters of our model are thus φ, μ and Σ. To estimate them, we can write down the likelihood of our data:

    ℓ(φ, μ, Σ) = ∑_{i=1}^m log p(x^(i); φ, μ, Σ)
               = ∑_{i=1}^m log ∑_{z^(i)=1}^k p(x^(i) | z^(i); μ, Σ) p(z^(i); φ).
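The generative process and the log-likelihood above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed parameter values (the specific `phi`, `mu`, `sigma` below are hypothetical, and the components are one-dimensional for simplicity, so each Σ_j reduces to a scalar variance):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a k = 2 mixture of 1-D Gaussians.
phi = np.array([0.3, 0.7])    # mixing proportions: phi_j >= 0, sum to 1
mu = np.array([-2.0, 3.0])    # component means mu_j
sigma = np.array([1.0, 0.5])  # component standard deviations

# Generative process: draw z^(i) ~ Multinomial(phi),
# then x^(i) | z^(i) = j ~ N(mu_j, sigma_j^2).
m = 500
z = rng.choice(len(phi), size=m, p=phi)  # latent, unobserved in practice
x = rng.normal(mu[z], sigma[z])

def log_likelihood(x, phi, mu, sigma):
    """l(phi, mu, Sigma) = sum_i log sum_j p(x^(i) | z^(i)=j) p(z^(i)=j)."""
    # densities[i, j] = N(x^(i); mu_j, sigma_j^2)
    diff = x[:, None] - mu[None, :]
    densities = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Marginalize out z (inner sum over j), then sum the logs over i.
    return np.sum(np.log(densities @ phi))

print(log_likelihood(x, phi, mu, sigma))
```

Note that the latent `z` is used only to *generate* the data; the likelihood itself sums it out, which is exactly why ℓ has no closed-form maximizer and motivates EM.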