


... of the k centroids (also called cluster prototypes). Step four calculates new centroids on the basis of the new allocations. We repeat the two steps in a loop (step five) until the cluster centroids do not change any more. Algorithm 5.1 corresponds to a simple hill climbing procedure which typically gets stuck in a local optimum (finding the global optimum is an NP-complete problem). Apart from a suitable method to determine the starting solution (step one), we require a measure for calculating the distance or ...

Band 20 – 2005 41 Hotho, Nürnberger, and Paaß

Algorithm 1 The KMeans algorithm
Input: set D, distance measure dist, number k of clusters
Output: A partitioning $\mathcal{P}$ of the set D of documents (i.e., a set $\mathcal{P}$ of k disjoint subsets of D with $\bigcup_{P \in \mathcal{P}} P = D$).
1: Choose randomly k data points from D as starting centroids $t_{P_1}, \ldots, t_{P_k}$.
2: repeat
3:   Assign each point of D to the closest centroid with respect to dist.
4:   (Re-)calculate the cluster centroids $t_{P_1}, \ldots, t_{P_k}$ of clusters $P_1, \ldots, P_k$.
5: until cluster...
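The loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the excerpt leaves the measure dist open, so Euclidean distance is used here as a stand-in, and the function names (kmeans, euclidean, mean), the seed and max_iter parameters, and the rule that an empty cluster keeps its previous centroid are all hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; an assumed stand-in for the unspecified measure dist."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(
    data: List[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
    seed: int = 0,
    max_iter: int = 100,
) -> Tuple[List[List[Point]], List[List[float]]]:
    rng = random.Random(seed)
    # Step 1: choose k random data points from D as starting centroids.
    centroids = [list(p) for p in rng.sample(list(data), k)]
    clusters: List[List[Point]] = []
    # Step 2: repeat (max_iter is an assumed safety cap, not part of the algorithm).
    for _ in range(max_iter):
        # Step 3: assign each point to the closest centroid with respect to dist.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recalculate the centroids from the new allocation.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = [
            mean(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Step 5: stop once the centroids do not change any more.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


if __name__ == "__main__":
    docs = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    found_clusters, found_centroids = kmeans(docs, k=2)
    print(found_clusters)
    print(found_centroids)
```

Because the procedure is a hill climber, a different seed can lead to a different local optimum on less clearly separated data; running it from several starting solutions and keeping the best partitioning is a common workaround.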