

in Duda & Hart (1973). So-called dendrograms represent the hierarchy of clusters that results from the repeated merging of clusters described above. A dendrogram makes it possible to estimate the number of clusters from the distances between the merged clusters. Unfortunately, the choice of an appropriate linkage method depends on the desired cluster structure, which is usually unknown in advance. For example, single linkage tends to follow chain-like clusters in the data, while complete linkage tends to create ellipsoid clusters. Prior knowledge about the expected distribution and cluster form is therefore usually necessary to select the appropriate method (see also Duda & Hart (1973)). Substantially more problematic for large data sets, however, is the memory required to store the similarity matrix, which consists of n(n − 1)/2 elements, where n is the number of documents. The runtime, which is O(n²), also compares poorly with the linear behavior of k-means, discussed in the following. k-means is one of the...
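The memory cost and the effect of the linkage choice described in this passage can be illustrated with a minimal sketch. It rests on several assumptions that are not in the original text: the function names pair_count and agglomerate are hypothetical, one-dimensional integer points stand in for documents, and the absolute difference stands in for the document distance. The merging loop is deliberately naive and recomputes all cluster distances in every step, so it is meant to show the behavior of the two linkage methods, not an efficient implementation.

```python
from itertools import combinations


def pair_count(n):
    """Entries in the upper triangle of an n x n similarity matrix: n(n - 1)/2."""
    return n * (n - 1) // 2


def agglomerate(points, linkage, num_clusters):
    """Naive agglomerative clustering of 1-D points (illustrative only).

    linkage is the built-in min (single linkage) or max (complete linkage),
    applied to all pairwise distances between two clusters.
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical data: a chain of points with slowly growing gaps.
points = [0, 10, 21, 33, 46, 70]

single = agglomerate(points, min, 2)    # [[0, 10, 21, 33, 46], [70]]
complete = agglomerate(points, max, 2)  # [[0, 10, 21, 33], [46, 70]]

print(pair_count(1000))  # 499500 stored similarities for 1000 documents
print(single)
print(complete)
```

On this toy data, single linkage follows the chain and leaves only the last point on its own, while complete linkage splits the data into two more compact groups. The pair count also shows how quickly the similarity matrix grows: 1000 documents already require 499500 stored values.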