The centroids are recomputed from the data points in each cluster, and the points are reassigned to their nearest centroid, until the centroids finally stabilize in their locations. Figure 8.9 shows the three clusters computed by this algorithm: a 3-data-point cluster with centroid (6.5, 4.5), a 2-data-point cluster with centroid (4.5, 3), and a 5-data-point cluster with centroid (3.5, 3). These cluster definitions are different from the ones derived visually. This is a function of the random starting centroid values: the centroid points used earlier in the visual exercise were different from those chosen by the K-means clustering algorithm. The K-means clustering exercise should, therefore, be run again with this data, but with new random starting centroids. With many runs, the cluster definitions are likely to stabilize. If the cluster definitions do not stabilize, that may be a sign that the number of clusters chosen is too high or too low. The algorithm should also be run with different values of K.
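The advice to rerun K-means from new random starting centroids can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's implementation: the function names (`k_means`, `total_within_ss`, `best_of_restarts`), the sample points, and the choice to compare runs by total within-cluster squared distance are all introduced here for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, rng, max_iter=100):
    """One K-means run from random starting centroids (hypothetical helper)."""
    # Choose K random data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Allocate each point to the nearest of the K centroids.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster's points.
        # An empty cluster keeps its previous centroid.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def total_within_ss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, c)
               for c, members in zip(centroids, clusters)
               for p in members)

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means several times and keep the tightest clustering."""
    rng = random.Random(seed)
    runs = [k_means(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: total_within_ss(*run))

# Illustrative points only; these are not the data from the chapter's figures.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
centroids, clusters = best_of_restarts(points, k=3)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Calling `best_of_restarts` with several values of `k` and comparing the resulting cluster definitions is one way to act on the suggestion to try different values of K.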
Figure 8.8 Assigning data points to Recomputed centroids

Figure 8.9 Recomputing centroids for each cluster till clusters stabilize

Here is the pseudocode for implementing a K-means algorithm.

Algorithm K-Means (K number of clusters, D list of data points)
1. Choose K number of random data points as initial centroids (cluster centers).
2. Repeat till cluster centers stabilize:
   a. Allocate each point in D to the nearest of K centroids.
   b. Compute centroid for the cluster using all points in the cluster.

Selecting the Number of Clusters