Unsupervised Learning Techniques
9.520 Class 07, 1 March 2006
Andrea Caponnetto

About this class

Goal: to introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian Eigenmaps.

Unsupervised learning

We are given only u i.i.d. samples {x_1, x_2, ..., x_u} drawn on X from the unknown marginal distribution p(x). The goal is to infer properties of this probability density. In low dimensions, many nonparametric methods allow direct estimation of p(x) itself. Owing to the curse of dimensionality, these methods fail in high dimensions, and one must settle for estimating crude global models.

Unsupervised learning (cont.)

Different types of simple descriptive statistics characterize aspects of p(x):

• mixture modelling: represent p(x) by a mixture of simple densities, each component representing a different type or class of observations [e.g. Gaussian mixtures]
• combinatorial clustering: attempt to find multiple regions of X that contain the modes of p(x) [e.g. K-Means]
• dimensionality reduction: attempt to identify low-dimensional manifolds in X that represent high data density [e.g. ISOMAP, HLLE, Laplacian Eigenmaps]
• manifold learning: attempt to determine very specific geometrical or topological invariants of p(x) [e.g. homology learning]

Limited formalization

With supervised and semi-supervised learning there is a clear measure of the effectiveness of different methods: the expected loss of the various estimators, I[f_S], can be estimated on a validation set. In the context of unsupervised learning, it is difficult to find such a direct measure of success. This situation has led to a proliferation of proposed methods.

Mixture Modelling

We assume the data is sampled i.i.d. from some probability distribution p(x). We model p(x) as a mixture of component density functions, each component corresponding to a cluster or mode. The free parameters of the model are fit to the data by maximum likelihood.
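The combinatorial clustering approach listed above (K-Means) alternates between assigning each sample to its nearest centroid and recomputing each centroid as the mean of its assigned samples. A minimal sketch of this procedure (Lloyd's algorithm) in Python; the function name, initialization scheme, and defaults are illustrative choices, not from the lecture:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-Means clustering.

    X: (n_samples, n_features) array; k: number of clusters.
    Returns final centroids and the cluster label of each sample.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct samples (one common heuristic).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each sample.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its samples
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged: assignments can no longer change
        centroids = new
    return centroids, labels
```

Each iteration cannot increase the within-cluster sum of squares, so the procedure converges to a local optimum that depends on the initialization.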
Gaussian Mixtures

We first choose a parametric model P_θ for the unknown density p(x), then maximize the likelihood of our data with respect to the parameters θ.

Example: a two-component Gaussian mixture model with parameters θ = (π, μ_1, Σ_1, μ_2, Σ_2). The model:

    P_θ(x) = (1 − π) G_{Σ_1}(x − μ_1) + π G_{Σ_2}(x − μ_2)

We maximize the log-likelihood

    ℓ(θ; {x_1, ..., x_u}) = Σ_{i=1}^{u} log P_θ(x_i)

The EM algorithm

Maximizing ℓ(θ; {x_1, ..., x_u}) directly is a difficult problem. Iterative maximization strategies, such as the EM algorithm, can be used in practice to find local maxima ....
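The EM iteration for the two-component mixture above can be sketched for the one-dimensional case, where each G is a scalar Gaussian with variance σ². This is a minimal illustration, not the lecture's implementation: the function names, the min/max initialization, and the small variance floor are all assumptions made here for concreteness.

```python
import numpy as np

def gaussian(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm2(x, n_iter=200):
    """EM for the two-component 1-D mixture
    P_theta(x) = (1 - pi) * N(mu1, var1) + pi * N(mu2, var2)."""
    x = np.asarray(x, dtype=float)
    # Crude initialization from the data (an assumption of this sketch).
    mu1, mu2 = x.min(), x.max()
    var1 = var2 = x.var() + 1e-6
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 2 for each sample.
        p1 = (1 - pi) * gaussian(x, mu1, var1)
        p2 = pi * gaussian(x, mu2, var2)
        gamma = p2 / (p1 + p2)
        # M-step: responsibility-weighted maximum-likelihood updates.
        mu1 = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * x) / np.sum(gamma)
        var1 = np.sum((1 - gamma) * (x - mu1) ** 2) / np.sum(1 - gamma) + 1e-12
        var2 = np.sum(gamma * (x - mu2) ** 2) / np.sum(gamma) + 1e-12
        pi = gamma.mean()
    return pi, mu1, var1, mu2, var2
```

Each EM iteration cannot decrease the log-likelihood ℓ(θ; {x_1, ..., x_u}), which is why the procedure converges to a local maximum rather than the global one.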
This note was uploaded on 11/11/2011 for the course BIO 9.07 taught by Professor Ruth Rosenholtz during the Spring '04 term at MIT.