lecture15-annotated - Machine Learning 10-701/15-781, Fall 2008

Machine Learning 10-701/15-781, Fall 2008
Expectation Maximization
Eric Xing
Lecture 15, October 29, 2008
Reading: Chap. 9, C. Bishop's book
© Eric Xing @ CMU, 2006-2008

Clustering

Unobserved Variables

A variable can be unobserved (latent) because:
- it is an imaginary quantity meant to provide a simplified and abstract view of the data-generation process
  - e.g., speech recognition models, mixture models ...
- it is a real-world object and/or phenomenon, but difficult or impossible to measure
  - e.g., the temperature of a star, causes of a disease, evolutionary ancestors ...
- it is a real-world object and/or phenomenon, but sometimes wasn't measured, because of faulty sensors, or was measured over a noisy channel, etc.
  - e.g., traffic radio, aircraft signal on a radar screen, ...

Discrete latent variables can be used to partition/cluster data into sub-groups (mixture models, forthcoming).
Continuous latent variables (factors) can be used for dimensionality reduction (factor analysis, etc., later lectures).

Uni-modal and multi-modal distributions
Mixture Models

Mixture Models, cont'd
- A density model p(x) may be multi-modal.
- We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians), as sketched in the example below.
- Each mode may correspond to a different sub-population (e.g., male and female).
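To make the multi-modality concrete, here is a minimal Python sketch (my own illustration, not from the lecture; the mixing weights, means, and standard deviations are made-up numbers) that plots a two-component 1-D Gaussian mixture whose two modes could correspond to two sub-populations:

    import numpy as np
    from scipy.stats import norm
    import matplotlib.pyplot as plt

    # Hypothetical parameters for a two-component 1-D Gaussian mixture
    pis = np.array([0.4, 0.6])       # mixture proportions (sum to 1)
    mus = np.array([-2.0, 3.0])      # component means
    sigmas = np.array([1.0, 1.5])    # component standard deviations

    # Mixture density: p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    xs = np.linspace(-7.0, 9.0, 500)
    p = sum(pi * norm.pdf(xs, mu, sigma)
            for pi, mu, sigma in zip(pis, mus, sigmas))

    plt.plot(xs, p)
    plt.xlabel("x")
    plt.ylabel("p(x)")
    plt.title("A two-component Gaussian mixture is multi-modal")
    plt.show()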
Gaussian Mixture Models (GMMs)
- Consider a mixture of K Gaussian components.
- Z is a latent class indicator vector:

  $p(z_n) = \mathrm{multi}(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}$

- X is a conditional Gaussian variable with a class-specific mean/covariance:

  $p(x_n \mid z_n^k = 1, \mu, \Sigma) = \dfrac{1}{(2\pi)^{m/2} |\Sigma_k|^{1/2}} \exp\!\left\{ -\tfrac{1}{2} (x_n - \mu_k)^{T} \Sigma_k^{-1} (x_n - \mu_k) \right\}$

- The likelihood of a sample:

  $p(x_n \mid \mu, \Sigma) = \sum_k p(z_n^k = 1 \mid \pi)\, p(x_n \mid z_n^k = 1, \mu, \Sigma) = \sum_k \underbrace{\pi_k}_{\text{mixture proportion}} \underbrace{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}_{\text{mixture component}}$

Gaussian Mixture Models (GMMs), cont'd
- Consider a mixture of K Gaussian components:

  $p(x_n \mid \mu, \Sigma) = \sum_k \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$

- This model can be used for unsupervised clustering.
- This model (fit by AutoClass) has been used to discover new kinds of stars in astronomical data, etc.
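As a numerical sanity check on the likelihood above, here is a small Python sketch (my own, not from the lecture; the 2-D, K = 2 parameter values are arbitrary) that evaluates the per-sample likelihood $\sum_k \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$ and the posterior responsibilities $p(z_n^k = 1 \mid x_n)$ used for clustering:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical 2-D mixture with K = 2 components
    pis = np.array([0.3, 0.7])                           # mixture proportions
    mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]   # class-specific means
    Sigmas = [np.eye(2), 2.0 * np.eye(2)]                 # class-specific covariances

    def likelihood_and_responsibilities(x):
        # Component densities N(x | mu_k, Sigma_k)
        comps = np.array([multivariate_normal.pdf(x, mean=m, cov=S)
                          for m, S in zip(mus, Sigmas)])
        weighted = pis * comps            # pi_k * N(x | mu_k, Sigma_k)
        likelihood = weighted.sum()       # p(x | mu, Sigma)
        resp = weighted / likelihood      # p(z^k = 1 | x), sums to 1 over k
        return likelihood, resp

    lik, resp = likelihood_and_responsibilities(np.array([3.5, 3.0]))
    print(lik, resp)

For actually fitting the parameters to data for unsupervised clustering, scikit-learn's sklearn.mixture.GaussianMixture implements EM for this model and is one reasonable off-the-shelf choice.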
Learning mixture models

Why is Learning Harder?
- In fully observed iid settings, the log-likelihood decomposes into a sum of local terms:

  $\ell_c(\theta; D) = \log p(x, z \mid \theta) = \log p(z \mid \theta_z) + \log p(x \mid z, \theta_x)$

- With latent variables, all the parameters become coupled together via marginalization:

  $\ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z)\, p(x \mid z, \theta_x)$
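The coupling is easy to see numerically. Below is a short Python sketch (my own, with made-up 1-D, K = 2 parameters) contrasting the complete-data log-likelihood, which splits into a term for z and a term for x given z, with the marginal log-likelihood, where the sum sits inside the log and no such split exists:

    import numpy as np
    from scipy.stats import norm
    from scipy.special import logsumexp

    # Hypothetical 1-D mixture with K = 2 components
    log_pi = np.log(np.array([0.4, 0.6]))      # log p(z^k = 1 | theta_z)
    mus = np.array([-2.0, 3.0])
    sigmas = np.array([1.0, 1.5])

    x, z = 2.5, 1   # one observation; pretend we also observed its component z = 1

    # Fully observed: log p(x, z | theta) = log p(z | theta_z) + log p(x | z, theta_x)
    complete_data_ll = log_pi[z] + norm.logpdf(x, mus[z], sigmas[z])

    # Latent z: log p(x | theta) = log sum_k p(z^k = 1 | theta_z) p(x | z^k = 1, theta_x)
    marginal_ll = logsumexp(log_pi + norm.logpdf(x, mus, sigmas))

    print(complete_data_ll, marginal_ll)   # the log of a sum no longer decomposes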
Gradient Learning for mixture models
- We can learn mixture densities using gradient descent on the log-likelihood. The gradients are quite interesting (a derivation is sketched below).
- In other words, the gradient is the responsibility-weighted sum of the individual log-likelihood gradients.
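The gradient expression itself is not in the extracted text; the following is a standard derivation consistent with the statement above (writing $r_k$ for the responsibility of component $k$ and $\ell_k$ for its log-likelihood, notation that is mine rather than the slide's):

$\ell(\theta) = \log p(x \mid \theta) = \log \sum_k \pi_k\, p_k(x \mid \theta_k)$

$\dfrac{\partial \ell}{\partial \theta_k} = \dfrac{\pi_k\, \partial p_k(x \mid \theta_k)/\partial \theta_k}{\sum_j \pi_j\, p_j(x \mid \theta_j)} = \underbrace{\dfrac{\pi_k\, p_k(x \mid \theta_k)}{p(x \mid \theta)}}_{r_k}\, \dfrac{\partial \log p_k(x \mid \theta_k)}{\partial \theta_k} = r_k\, \dfrac{\partial \ell_k}{\partial \theta_k}$

Stacking these over $k = 1, \dots, K$ gives the full gradient, i.e., the responsibility-weighted collection of the individual component log-likelihood gradients.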
