in Duda & Hart (1973).
By means of so-called dendrograms one can represent the hierarchy of the
clusters obtained as a result of the repeated merging of clusters as described
above. Dendrograms make it possible to estimate the number of clusters based
on the distances of the merged clusters. Unfortunately, the selection of the
appropriate linkage method depends on the desired cluster structure, which
is usually unknown in advance. For example, single linkage tends to follow
chain-like clusters in the data, while complete linkage tends to create ellipsoid
clusters. Thus prior knowledge about the expected distribution and cluster
form is usually necessary for the selection of the appropriate method (see
also Duda & Hart (1973)). However, substantially more problematic for the
use of the algorithm for large data sets is the memory required to store the
similarity matrix, which consists of n(n − 1)/2 elements where n is the number
of documents. Moreover, the runtime of O(n²) is worse than the
linear behavior of k-means, which is discussed in the following.
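
The points made above about linkage choice, dendrogram cuts, and memory can be illustrated with a minimal sketch. It is not the implementation of any particular library or of the cited authors: the function names (pairwise_distances, agglomerate, cut) and the toy one-dimensional data are assumptions chosen for illustration, and the naive merge search is deliberately simple and slower than optimized algorithms.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Return all n(n - 1)/2 Euclidean distances, keyed by (i, j) with i < j."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering (illustrative only).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    i.e. the information a dendrogram displays.
    """
    d = pairwise_distances(points)
    # Single linkage uses the closest pair of members, complete the farthest.
    combine = min if linkage == "single" else max
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            link = combine(d[min(i, j), max(i, j)] for i in a for j in b)
            if best is None or link < best[2]:
                best = (a, b, link)
        a, b, link = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, link))
    return merges


def cut(merges, n_points, threshold):
    """Clusters obtained by applying only merges at distance <= threshold."""
    clusters = {(i,) for i in range(n_points)}
    for a, b, link in merges:
        if link > threshold:
            break  # merge distances are non-decreasing for these linkages
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Hypothetical toy data: a chain of four points and a separate pair.
points = [(0,), (1,), (2,), (3,), (10,), (11,)]

single = agglomerate(points, "single")
complete = agglomerate(points, "complete")

print(len(pairwise_distances(points)))   # number of stored distances
print([m[2] for m in single])            # merge heights, single linkage
print([m[2] for m in complete])          # merge heights, complete linkage
print(cut(single, len(points), 2))       # clusters when cutting at distance 2
print(cut(complete, len(points), 2))
```

On this toy data, single linkage joins the whole chain before the cut at distance 2, whereas complete linkage keeps it split into two compact pairs. A large jump in the merge heights suggests where the dendrogram could be cut. The dictionary of pairwise distances makes the quadratic memory requirement explicit.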
k-means is one of the...