… of the k centroids (also called cluster prototypes). Step
four recalculates the centroids on the basis of the new assignments. We repeat
these two steps in a loop (step five) until the cluster centroids no longer
change. Algorithm 1 corresponds to a simple hill-climbing procedure, which
typically gets stuck in a local optimum (finding the global optimum is
NP-hard). Apart from a suitable method to determine the
starting solution (step one), we require a measure for calculating the distance or …

Band 20 – 2005 41 Hotho, Nürnberger, and Paaß
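The passage breaks off before naming the measure. Purely as a hedged illustration, the sketch below shows two common choices for documents represented as term-frequency vectors over a shared vocabulary: Euclidean distance and cosine distance (1 minus cosine similarity). Neither is claimed to be the measure the authors go on to use; the function names and the convention for all-zero vectors are assumptions.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two term vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; depends on direction only, not on vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an empty document is maximally distant from everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Two term-frequency vectors; doc2 is doc1 with every count doubled.
doc1 = [1.0, 2.0, 0.0]
doc2 = [2.0, 4.0, 0.0]
print(euclidean_distance(doc1, doc2))  # sqrt(5), about 2.236
print(cosine_distance(doc1, doc2))     # close to 0.0: same direction
```

Because doc2 only differs from doc1 in length, the Euclidean distance is nonzero while the cosine distance is zero up to rounding, which is one reason length-insensitive measures are often chosen for text.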
Algorithm 1: The k-means algorithm
Input: set D, distance measure dist, number k of clusters
Output: A partitioning $\mathcal{P}$ of the set D of documents (i.e., a set $\mathcal{P}$ of k disjoint
subsets of D with $\bigcup_{P \in \mathcal{P}} P = D$).
1: Choose k data points from D at random as starting centroids $t_{P_1}, \dots, t_{P_k}$.
2: repeat
3:   Assign each point of D to the closest centroid with respect to dist.
4:   (Re-)calculate the cluster centroids $t_{P_1}, \dots, t_{P_k}$ of clusters $P_1, \dots, P_k$.
5: until cluster …
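A minimal Python sketch of Algorithm 1, assuming documents are already dense term vectors and using Euclidean distance as the default `dist`. The stopping test in step 5 is cut off in the source; the sketch assumes it means "until the centroids no longer change", as the prose says. The names `k_means`, `sse` and `best_of_restarts`, the `max_iter` safeguard, and the rule of keeping an empty cluster's previous centroid are illustrative assumptions, not part of the original algorithm. Because the procedure is hill climbing, the sketch also runs several random starts and keeps the result with the smallest sum of squared distances, a common mitigation that the source does not prescribe.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(data: Sequence[Sequence[float]], k: int, dist: Distance = euclidean,
            max_iter: int = 100, seed: int = 0) -> Tuple[List[List[int]], List[Vector]]:
    """Return (clusters as lists of indices into data, centroids)."""
    rng = random.Random(seed)
    # Step 1: choose k distinct data points at random as starting centroids.
    centroids = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    clusters: List[List[int]] = [[] for _ in range(k)]
    for _ in range(max_iter):  # Step 2: repeat (capped by max_iter as a safeguard)
        # Step 3: assign each point of D to its closest centroid w.r.t. dist.
        clusters = [[] for _ in range(k)]
        for i, x in enumerate(data):
            nearest = min(range(k), key=lambda j: dist(x, centroids[j]))
            clusters[nearest].append(i)
        # Step 4: recalculate centroids; an emptied cluster keeps its old centroid.
        new_centroids = [mean([data[i] for i in members]) if members else centroids[j]
                         for j, members in enumerate(clusters)]
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


def sse(data, clusters, centroids, dist: Distance = euclidean) -> float:
    """Sum of squared distances of each point to its cluster centroid."""
    return sum(dist(data[i], centroids[j]) ** 2
               for j, members in enumerate(clusters) for i in members)


def best_of_restarts(data, k: int, restarts: int = 10, dist: Distance = euclidean):
    """Hill climbing finds only a local optimum, so keep the best of several starts."""
    runs = [k_means(data, k, dist, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: sse(data, *run, dist=dist))


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters, centroids = best_of_restarts(points, k=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3]]
```

On this toy input every random start converges to the same two groups; on real document collections different starts can end in different local optima, which is what the restart loop guards against.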