CS229 Lecture notes
Andrew Ng

Part X
Factor analysis

When we have data $x^{(i)} \in \mathbb{R}^n$ that comes from a mixture of several Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually imagine problems where we have sufficient data to be able to discern the multiple-Gaussian structure in the data. For instance, this would be the case if our training set size $m$ was significantly larger than the dimension $n$ of the data.

Now, consider a setting in which $n \gg m$. In such a problem, it might be difficult to model the data even with a single Gaussian, much less a mixture of Gaussians. Specifically, since the $m$ data points span only a low-dimensional subspace of $\mathbb{R}^n$, if we model the data as Gaussian, and estimate the mean and covariance using the usual maximum likelihood estimators,

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)} - \mu)(x^{(i)} - \mu)^T,$$

we would find that the matrix $\Sigma$ is singular. This means that $\Sigma^{-1}$ does not exist, and $1/|\Sigma|^{1/2} = 1/0$. But both of these terms are needed in computing the usual density of a multivariate Gaussian distribution. Another way of stating this difficulty is that maximum likelihood estimates of the parameters result in a Gaussian that places all of its probability in the affine space spanned by the data,[1] and this corresponds to a singular covariance matrix.

[1] This is the set of points $x$ satisfying $x = \sum_{i=1}^{m} \alpha_i x^{(i)}$, for some $\alpha_i$'s so that $\sum_{i=1}^{m} \alpha_i = 1$.

More generally, unless $m$ exceeds $n$ by some reasonable amount, the maximum likelihood estimates of the mean and covariance may be quite poor. Nonetheless, we would still like to be able to fit a reasonable Gaussian model to the data, and perhaps capture some interesting covariance structure in the data. How can we do this?

In the next section, we begin by reviewing two possible restrictions on $\Sigma$, ones that allow us to fit $\Sigma$ with small amounts of data but neither of which will give a satisfactory solution to our problem.
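The singularity of the maximum likelihood covariance when the dimension exceeds the number of points can be checked numerically. The following is a minimal sketch, not part of the original notes: the toy data set and the helper names `ml_mean_and_covariance` and `rank` are assumptions made for illustration, and exact rational arithmetic is used so that the rank is not blurred by floating-point error.

```python
from fractions import Fraction

def ml_mean_and_covariance(points):
    """Maximum likelihood mean and covariance (1/m normalization)."""
    m, n = len(points), len(points[0])
    mu = [Fraction(sum(x[j] for x in points), m) for j in range(n)]
    sigma = [[sum((x[j] - mu[j]) * (x[k] - mu[k]) for x in points) / m
              for k in range(n)]
             for j in range(n)]
    return mu, sigma

def rank(matrix):
    """Rank of a matrix of Fractions via exact Gaussian elimination."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical toy data: m = 3 points in R^4, so n > m.
points = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1]]
m, n = len(points), len(points[0])

mu, sigma = ml_mean_and_covariance(points)
sigma_rank = rank(sigma)

# The centered points sum to zero, so the rank is at most m - 1 < n
# and the n-by-n covariance matrix cannot be inverted.
print("rank of Sigma:", sigma_rank, "dimension n:", n)
```

Because the rank is at most $m - 1$, the estimated $\Sigma$ is singular whenever $n \geq m$, whatever the data values are.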
We next discuss some properties of Gaussians that will be needed later; specifically, how to find marginal and conditional distributions of Gaussians. Finally, we present the factor analysis model, and EM for it.

1 Restrictions of $\Sigma$

If we do not have sufficient data to fit a full covariance matrix, we may place some restrictions on the space of matrices $\Sigma$ that we will consider. For instance, we may choose to fit a covariance matrix $\Sigma$ that is diagonal. In this setting, the reader may easily verify that the maximum likelihood estimate of the covariance matrix is given by the diagonal matrix $\Sigma$ satisfying

$$\Sigma_{jj} = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)} - \mu_j)^2.$$

Thus, $\Sigma_{jj}$ is just the empirical estimate of the variance of the $j$-th coordinate of the data....
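The diagonal estimate can be sketched in a few lines. This is an illustrative example under assumptions, not code from the notes: the toy data and the function name `diagonal_ml_covariance` are made up, and exact fractions are used only to keep the arithmetic easy to follow.

```python
from fractions import Fraction

def diagonal_ml_covariance(points):
    """Per-coordinate ML mean and variance (1/m normalization).

    The returned variances are the diagonal entries Sigma_jj; all
    off-diagonal entries of the restricted covariance are zero.
    """
    m, n = len(points), len(points[0])
    mu = [Fraction(sum(x[j] for x in points), m) for j in range(n)]
    variances = [sum((x[j] - mu[j]) ** 2 for x in points) / m
                 for j in range(n)]
    return mu, variances

# Hypothetical toy data: m = 3 points in R^4, fewer points than dimensions.
points = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1]]
mu, variances = diagonal_ml_covariance(points)

# The determinant of a diagonal matrix is the product of its diagonal entries.
det = Fraction(1)
for v in variances:
    det *= v

print("diagonal of Sigma:", [str(v) for v in variances])
print("determinant:", det)
```

On this toy data every coordinate takes at least two distinct values, so each $\Sigma_{jj}$ is positive and the diagonal matrix is invertible even though there are fewer points than dimensions.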