This preview has intentionally blurred sections. Sign up to view the full version.
View Full Document
Unformatted text preview: 29.2 Clustering Algorithms 541 Algorithm 29.1 : C new ←− kMeansModify k ( C ) Input : [implicit] k : the number of clusters wanted, k ≤  A  Input : [implicit] dist: the distance measure between clusters to be used Input : [implicit] dist 2 : the distance measure between elements to be used Input : C : the list of clusters c to be modified Data : m : the index of the cluster C [ m ] with the lowest error Data : n : the index of the cluster C [ n ] nearest to C [ m ] Data : s : index of the cluster C [ s ] with the highest error Output : C new : the modified list of closters begin 1 m ←− m : error c ( C [ m ] ) = min { C [ i ] ∀ i ∈ [0 ,k − 1] } 2 n ←− n : dist( C [ m ] ,C [ n ] ) = min { dist( C [ m ] ,C [ i ] ) ∀ i ∈ [0 ,k − 1] \ { m }} 3 s ←− s : error c ( C [ s ] ) = max { error c ( C [ i ] ) ∀ i ∈ [0 ,k − 1] \ { m,n }} 4 C [ m ] ←− C [ m ] ∪ C [ n ] 5 a ←− a ∈ C [ s ] : dist 2 ( a, centroid( C [ s ] )) ≥ dist 2 ( b, centroid( C [ s ] )) ∀ b ∈ C [ s ] 6 C [ n ] ←− { a } 7 C [ s ] ←− C [ s ] \ { a } 8 return B 9 end 10 29.2.3 n th Nearest Neighbor Clustering The n th nearest neighbor clustering algorithm is defined in the context of this book only. It creates at most k clusters where the first k − 1 clusters contain exactly one element. The remaining elements are all together included in the last cluster. The elements of the single element clusters are those which have the longest distance to their n thnearest neighbor. This clustering algorithm is suitable for reducing a large set to a smaller one which contains still the most interesting elements (those in the singleelement clusters). It has relatively low complexity and thus runs fast, but on the other hand has the setback that dense aggregations of ≥ n elements will be put into the “rest elements”cluster. For n , normally a value of n = √ k is used. n th nearest neighbor clustering uses the k th nearest neighbor distance function dist ρ nn,k introduced in Definition 28.63 on page 506 with its parameter k set to n . Do not mix this parameter up with the parameter k of this clustering method – although they have the same name, they are not the same. I know, I know, this is not pretty. Notice that Algorithm 29.3 should only be applied if all the elements a ∈ A are unique, i.e., there exists no two equal elements in A ) which is, per definition, true for all sets. In a real implementation, a preprocessing step should remove are duplicates from A before clustering is performed. Especially our homemade nearest neighbor clustering variant is unsuitable to process lists containing the same elements multiple times. Since all equal elements have the same distance to their n th neighbor, it is likely that the result of the clustering is very unsatisfying since one element may occur multiple times whereas a variety of different other elements is ignored. Therefore, the aforementioned preprocessing should be applied, whichelements is ignored....
View
Full
Document
This document was uploaded on 08/10/2011.
 Spring '11
 Algorithms

Click to edit the document details