square error for a cluster P is given by:

$$\mathrm{MSE}(P) = \sum_{d \in P} \mathrm{dist}(d, \mu_P)^2, \tag{10}$$

where $\mu_P = \frac{1}{|P|} \sum_{d \in P} t_d$ is the centroid of the cluster $P$ and dist is a distance measure.

One clustering measure that is independent of the number of clusters is the silhouette coefficient SC(P) (cf. Kaufman & Rousseeuw (1990)). The main idea of the coefficient is to determine where a document lies in the space relative to its own cluster and to the next most similar cluster. In a good clustering, a document lies close to its own cluster, whereas in a bad clustering it lies closer to the next cluster. The silhouette coefficient can therefore be used to judge the quality of a single cluster or of the entire clustering (details can be found in Kaufman & Rousseeuw (1990)). Kaufman & Rousseeuw (1990) give characteristic values of the silhouette coefficient for evaluating cluster quality. A value of SC(P) between 0.7 and 1.0 signals excellent separation between the found clusters, i.e. the objects within a cluster...
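The following is a minimal sketch, using only the Python standard library, of how Eq. (10) and the silhouette coefficient might be computed. The text leaves `dist` generic; using Euclidean distance (`math.dist`) is our assumption, and all function names are hypothetical. For the silhouette we assume the standard Kaufman & Rousseeuw definition: for a document $i$, $a(i)$ is its mean distance to the other members of its own cluster, $b(i)$ is its mean distance to the members of the nearest other cluster, and $s(i) = (b(i) - a(i)) / \max\{a(i), b(i)\}$, with $s(i) = 0$ for a singleton cluster. The SC of a cluster or of the whole clustering is then the mean of $s(i)$ over the documents involved.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def centroid(cluster: List[Vector]) -> List[float]:
    """mu_P = (1/|P|) * sum of the document vectors t_d in cluster P."""
    n = len(cluster)
    return [sum(t[k] for t in cluster) / n for k in range(len(cluster[0]))]


def mse(cluster: List[Vector], dist: Distance = math.dist) -> float:
    """Eq. (10): sum of squared distances from each document to the centroid."""
    mu = centroid(cluster)
    return sum(dist(t, mu) ** 2 for t in cluster)


def silhouette_values(clusters: List[List[Vector]],
                      dist: Distance = math.dist) -> List[List[float]]:
    """Per-document silhouette values s(i), grouped like the input clusters."""
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    result = []
    for own_idx, own in enumerate(clusters):
        values = []
        for i, doc in enumerate(own):
            if len(own) == 1:
                values.append(0.0)  # convention for singleton clusters
                continue
            # a(i): mean distance to the other members of the document's own cluster
            a = sum(dist(doc, o) for j, o in enumerate(own) if j != i) / (len(own) - 1)
            # b(i): mean distance to the members of the nearest other cluster
            b = min(sum(dist(doc, o) for o in other) / len(other)
                    for idx, other in enumerate(clusters) if idx != own_idx)
            m = max(a, b)
            values.append((b - a) / m if m > 0 else 0.0)
        result.append(values)
    return result


def cluster_silhouette(clusters: List[List[Vector]], index: int,
                       dist: Distance = math.dist) -> float:
    """SC of a single cluster: mean s(i) over the documents of clusters[index]."""
    return mean(silhouette_values(clusters, dist)[index])


def silhouette_coefficient(clusters: List[List[Vector]],
                           dist: Distance = math.dist) -> float:
    """SC of the whole clustering: mean s(i) over all documents."""
    return mean(v for values in silhouette_values(clusters, dist) for v in values)


# Two well-separated clusters of toy 2-D "document vectors".
A = [(0, 0), (0, 1)]
B = [(10, 0), (10, 1)]
print(mse(A))                                     # prints 0.5
print(round(silhouette_coefficient([A, B]), 3))   # prints 0.9
```

On this toy data the well-separated clustering scores about 0.9, inside the 0.7 to 1.0 band described above, whereas regrouping the same points as `[[(0, 0), (10, 0)], [(0, 1), (10, 1)]]` yields a negative SC, because each document is then closer on average to the other cluster than to its own.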
