Stat841f09 - Wiki Course Notes

# 2 compute the mean for each cluster and make it as


[…] the number of hidden units is equal to the number of clusters, $K$. […] we set it equal for all clusters.

The basic details of K-means clustering are as follows. The $K$ initial centers are randomly chosen from the training data. Then the following two steps are iterated alternately until convergence.

1. For each existing center, re-identify its cluster (every point in this cluster should be closer to this center than to any other).
2. Compute the mean of each cluster and make it the new center of that cluster.

K-means clustering algorithm:

1. For a given cluster assignment $C$, the total cluster variance $\sum_{k=1}^{K} \sum_{C(i)=k} \|x_i - m_k\|^2$ is minimized with respect to $\{m_1, \dots, m_K\}$, yielding the means of the currently assigned clusters.
2. Given a current set of means $\{m_1, \dots, m_K\}$, the total cluster variance is minimized by assigning each observation to the closest (current) cluster mean. That is, $C(i) = \arg\min_{1 \le k \le K} \|x_i - m_k\|^2$.
3. Steps 1 and 2 are iterated until the assignments do not change.

Example: Partition data into 2 clusters (2 hidden values)

    >> X=rand(30,80);
    >> [IDX,C,sumD,D]=kmeans(X,2);
    >> size(IDX)
        30     1
    >> size(C)
         2    80
    >> size(sumD)
         2     1
    >> c1=sum(IDX==1)
        14
    >> c2=sum(IDX==2)
        16
    >> sumD
       85.6643
      101.0419
    >> v1=sumD(1,1)/c1
        6.1189
    >> v2=sumD(2,1)/c2
        6.3151

Comments: We create X randomly as a training set with 30 data points in 80 dimensions (kmeans treats each row of X as one observation), and then apply the "kmeans" method to separate X into 2 clusters. IDX is a vector containing 1 or 2, which indicates the cluster of each data point; its size is 30*1. C holds the center (mean) of each cluster, with size 2*80; sumD is the sum of the squared distances between the data points and the center of their cluster. c1 and c2 indicate the number of data points in clusters 1 and 2. v1 is the variance of the first cluster; v2 is the variance of the second cluster.
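The two alternating steps can also be written out by hand. The following is a minimal sketch in Python using only the standard library; it is not the implementation behind MATLAB's `kmeans`. The names `sq_dist`, `kmeans`, `max_iter` and `seed` are assumptions introduced here for illustration. The function returns the assignments, the centers and the per-cluster sums of squared distances, which play the roles of IDX, C and sumD in the example (with cluster labels starting at 0 rather than 1).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain two-step K-means; returns (assignments, centers, sum_d)."""
    rng = random.Random(seed)
    # The K initial centers are randomly chosen from the training data.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest current center.
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assign == assign:
            break  # assignments did not change, so the iteration has converged
        assign = new_assign
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Per-cluster sum of squared distances to the cluster center.
    sum_d = [
        sum(sq_dist(p, centers[j]) for p, a in zip(points, assign) if a == j)
        for j in range(k)
    ]
    return assign, centers, sum_d


# Usage mirroring the example: 30 random points in 80 dimensions, 2 clusters.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(80)] for _ in range(30)]
idx, C, sum_d = kmeans(X, 2)
counts = [idx.count(j) for j in range(2)]  # cluster sizes, like c1 and c2
```

Dividing each entry of `sum_d` by the matching entry of `counts` gives the per-cluster variances in the same way as v1 and v2, provided neither cluster is empty. The numbers will differ from the MATLAB session because the data and the initial centers are random.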
Now we can get , , hat matrix and by following equations. Finally...

## This document was uploaded on 03/07/2014.
