# Mixture Models

Jia Li
Department of Statistics, The Pennsylvania State University
Email: [email protected]
http://www.stat.psu.edu/~jiali

## Clustering by Mixture Models

- General background on clustering
- Example method: k-means
- Mixture model based clustering
- Model estimation

## Clustering

- A basic tool in data mining and pattern recognition.
- Divide a set of data into groups.
- Samples in one cluster are close to each other, and clusters are far apart.

Motivations:

- Discover classes of data in an unsupervised way (unsupervised learning).
- Efficient representation of data: fast retrieval, data complexity reduction.
- Various engineering purposes: tightly linked with pattern recognition.

## Approaches to Clustering

- Represent samples by feature vectors.
- Define a distance measure to assess the closeness between data.
- "Closeness" can be measured in many ways:
  - Define distance based on various norms.
  - For gene expression levels in a set of microarray data, "closeness" between genes may be measured by the Euclidean distance between the gene profile vectors, or by correlation.
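
The two notions of closeness mentioned for gene profiles can disagree, which a small sketch makes visible. The profile values below are made up for illustration, and turning correlation into a distance as one minus the Pearson correlation is an assumption (a common choice); the slides only say "by correlation".

```python
import math

def euclidean_distance(x, y):
    """Euclidean (L2-norm) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Assumed convention: 0 for perfectly correlated profiles, 2 for perfectly anti-correlated."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical gene profile vectors (not from the slides).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times the scale
gene_c = [4.0, 3.0, 2.0, 1.0]      # reversed trend, same scale as gene_a

print(euclidean_distance(gene_a, gene_b), correlation_distance(gene_a, gene_b))
print(euclidean_distance(gene_a, gene_c), correlation_distance(gene_a, gene_c))
```

Under the Euclidean distance `gene_a` is much closer to `gene_c` than to `gene_b`, while under the correlation-based distance the ordering is reversed, because correlation ignores scale and looks only at the shape of the profile.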

Approaches:

- Define an objective function to assess the quality of clustering and optimize the objective function (purely computational).
- Clustering can be performed based merely on pairwise distances. How each sample is represented does not come into the picture.
- Statistical model based clustering.

## K-means

- Assume there are $M$ clusters with centroids $Z = \{z_1, z_2, \ldots, z_M\}$.
- Each training sample is assigned to one of the clusters. Denote the assignment function by $\eta(\cdot)$. Then $\eta(i) = j$ means the $i$th training sample is assigned to the $j$th cluster.
- Goal: minimize the total mean squared error between the training samples and their representative cluster centroids, that is, the trace of the pooled within-cluster covariance matrix.

$$\arg\min_{Z,\,\eta} \sum_{i=1}^{N} \| x_i - z_{\eta(i)} \|^2$$

Denote the objective function by

$$L(Z, \eta) = \sum_{i=1}^{N} \| x_i - z_{\eta(i)} \|^2 .$$

Intuition: training samples are tightly clustered around the centroids. Hence, the centroids serve as a compact representation of the training data.
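
As a rough illustration of the objective function, the sketch below evaluates the sum of squared distances between each sample and its assigned centroid for two different assignments. The sample points, centroids, and function names are hypothetical, and clusters are indexed from 0 rather than from 1 as on the slides.

```python
def squared_distance(x, z):
    """Squared Euclidean distance between sample x and centroid z."""
    return sum((a - b) ** 2 for a, b in zip(x, z))

def kmeans_objective(samples, centroids, assignment):
    """Sum over all samples of the squared distance to the assigned centroid.

    assignment[i] plays the role of eta(i), using 0-based cluster indices.
    """
    return sum(
        squared_distance(x, centroids[assignment[i]])
        for i, x in enumerate(samples)
    )

# Hypothetical data: two well-separated pairs of points.
samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 1.0], [10.0, 1.0]]

good_assignment = [0, 0, 1, 1]  # each sample goes to its nearby centroid
bad_assignment = [0, 1, 0, 1]   # two samples are sent to the far centroid

print(kmeans_objective(samples, centroids, good_assignment))  # 4.0
print(kmeans_objective(samples, centroids, bad_assignment))   # 204.0
```

The tight assignment gives a far smaller value, which matches the intuition that a good clustering places samples close to their centroids.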

## Necessary Conditions

- If $Z$ is fixed, the optimal assignment function $\eta(\cdot)$ should follow the nearest neighbor rule, that is,

  $$\eta(i) = \arg\min_{j \in \{1, 2, \ldots, M\}} \| x_i - z_j \| .$$

- If $\eta(\cdot)$ is fixed, the cluster centroid $z_j$ should be the average of all the samples assigned to the $j$th cluster:

  $$z_j = \frac{\sum_{i:\, \eta(i) = j} x_i}{N_j} .$$
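
One natural way to use the two necessary conditions is to alternate them until the assignment stops changing, a scheme commonly known as Lloyd's algorithm. The sketch below is a minimal version under several assumptions that do not come from the slides: the initial centroids are supplied by the caller, an empty cluster keeps its previous centroid, the iteration cap `max_iter` is arbitrary, and clusters are indexed from 0.

```python
def squared_distance(x, z):
    """Squared Euclidean distance; its minimizer over j is the same as for the plain distance."""
    return sum((a - b) ** 2 for a, b in zip(x, z))

def assign_nearest(samples, centroids):
    """Nearest neighbor rule: eta(i) is the index of the closest centroid to sample i."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(x, centroids[j]))
        for x in samples
    ]

def update_centroids(samples, assignment, centroids):
    """Centroid rule: z_j is the average of the N_j samples assigned to cluster j."""
    updated = []
    for j, old in enumerate(centroids):
        members = [x for x, a in zip(samples, assignment) if a == j]
        if not members:
            updated.append(list(old))  # assumption: leave an empty cluster's centroid unchanged
            continue
        n_j = len(members)
        updated.append([sum(x[d] for x in members) / n_j for d in range(len(old))])
    return updated

def kmeans(samples, initial_centroids, max_iter=100):
    """Alternate the two conditions until the assignment no longer changes."""
    centroids = [list(z) for z in initial_centroids]
    assignment = assign_nearest(samples, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(samples, assignment, centroids)
        new_assignment = assign_nearest(samples, centroids)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return centroids, assignment

# Hypothetical data and starting centroids.
samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids, assignment = kmeans(samples, initial_centroids=[[0.0, 0.0], [1.0, 0.0]])
print(centroids)   # [[0.0, 1.0], [10.0, 1.0]]
print(assignment)  # [0, 0, 1, 1]
```

Because each step can only lower or keep the objective value, the alternation settles at a point satisfying both conditions, though this may be a local rather than a global minimum and depends on the starting centroids.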
