taxonomies to classification of the species into subspecies. The subspecies can in turn be divided into further subgroups. Clustering analysis is also widely used in the information, policy, and decision sciences. Its applications to documents include votes on political issues, surveys of markets, surveys of products, surveys of sales programs, and R&D.
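A Clustering Example
[Figure: four clusters, labeled Cluster 1 to Cluster 4, of records described by income, number of children, and car type. The profiles shown are:
  Income: High, Children: 1, Car: Luxury
  Income: Low, Children: 0, Car: Compact
  Income: Medium, Children: 3, Car: Sedan
  Income: Medium, Children: 2, Car: Truck]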
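Different ways of representing clusters
[Figure: four panels, (a) to (d), showing the instances a to k grouped in different ways. Panel (c) is a table of values per instance and cluster:
        1     2     3
  a   0.4   0.1   0.5
  b   0.1   0.8   0.1
  c   0.3   0.3   0.4
  ...
Panel (d) arranges the instances in the order g a c i e d k b j f h.]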
K Means Clustering (Iterative distance-based clustering) K means clustering is an effective algorithm to extract a given number of clusters of patterns from a training set. Once done, the cluster locations can be used to classify patterns into distinct classes.
K means clustering (Cont.)
1. Select the k cluster centers randomly. Store the k cluster centers.
2. Classify the entire training set. For each pattern $X_i$ in the training set, find the nearest cluster center $C$ and classify $X_i$ as a member of $C$.
3. For each cluster, recompute its center by finding the mean of the cluster: $M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}$, where $M_k$ is the new mean, $N_k$ is the number of training patterns in cluster $k$, and $X_{jk}$ is the $j$-th pattern belonging to cluster $k$.
4. Loop until the change in cluster means is less than the amount specified by the user.
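The steps on this slide can be sketched in plain Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions that the slide does not state: distance is Euclidean, the random initial centers are drawn from the training patterns, an empty cluster keeps its previous center, and the names `k_means`, `nearest_center`, `tol`, `max_iter`, `seed`, and `init` are hypothetical.

```python
import math
import random


def nearest_center(pattern, centers):
    """Return the index of the center closest to `pattern` (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda k: math.dist(pattern, centers[k]))


def mean(patterns):
    """Component-wise mean M_k = (1/N_k) * sum_j X_jk of a non-empty list of patterns."""
    n = len(patterns)
    return tuple(sum(coords) / n for coords in zip(*patterns))


def k_means(patterns, k, tol=1e-6, max_iter=100, seed=0, init=None):
    """Alternate classify and recompute steps until no center moves by `tol` or more."""
    rng = random.Random(seed)
    # Assumption: random initial centers are k training patterns drawn without
    # replacement; `init` (hypothetical) lets the caller supply centers instead.
    centers = list(init) if init is not None else rng.sample(patterns, k)
    for _ in range(max_iter):
        # Classify the entire training set by nearest center.
        clusters = [[] for _ in range(k)]
        for x in patterns:
            clusters[nearest_center(x, centers)].append(x)
        # Recompute each center as the mean of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        # Change in cluster means: the largest distance any center moved.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Two visibly separated groups of 2-D patterns.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers = k_means(data, k=2)
labels = [nearest_center(x, centers) for x in data]
print(centers)
print(labels)
```

Once the centers are found, `nearest_center` is also what classifies a new pattern: it assigns the pattern to the cluster whose stored center is closest.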
The drawbacks of K-means clustering The final clusters represent only a local optimum, not a global one, and completely different final clusters can arise from differences in the initial randomly chosen cluster centers. (fig. 1) We have to know in advance how many clusters there will be.
Drawback of K-means clustering (Cont.) Figure 1
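The dependence on the initial centers can be reproduced on a very small data set. The sketch below is illustrative only: the four-point data set, the helper names `k_means_from` and `sse`, and the use of the sum of squared distances as a quality measure are assumptions, not taken from the slides. Two different choices of starting centers, each a pair of training patterns, lead to two different stable results.

```python
import math


def k_means_from(patterns, centers, max_iter=100):
    """Run classify and recompute steps from the given centers until they stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in patterns:
            nearest = min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            clusters[nearest].append(x)
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(patterns, centers):
    """Sum of squared distances from each pattern to its nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x in patterns
    )


# Four corners of a 4-by-1 rectangle, clustered with k = 2.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = k_means_from(data, [(0.0, 0.0), (4.0, 0.0)])  # ends with left/right clusters
bad = k_means_from(data, [(0.0, 0.0), (0.0, 1.0)])   # ends with bottom/top clusters

print(good, sse(data, good))  # [(0.0, 0.5), (4.0, 0.5)] 1.0
print(bad, sse(data, bad))    # [(2.0, 0.0), (2.0, 1.0)] 16.0
```

Both results are stable, since no pattern changes cluster on a further pass, yet the second has sixteen times the squared error of the first. A common mitigation is to run the algorithm from several initializations and keep the result with the lowest error.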

