The mean square error for a cluster P is given by:

\[ \mathrm{MSE}(P) = \frac{1}{|P|} \sum_{d \in P} \mathrm{dist}(d, \mu_P)^2, \tag{10} \]

and

\[ \mu_P = \frac{1}{|P|} \sum_{d \in P} t_d \]

is the centroid of the cluster P and dist is a distance measure. One clustering measure that is independent of the
number of clusters is the silhouette coefficient SC(P) (cf. Kaufman & Rousseeuw
(1990)). The main idea of the coefficient is to determine where a document lies
in the space relative to its own cluster and the next most similar
cluster. In a good clustering the document lies close to its own cluster,
whereas in a bad clustering it lies closer to the neighboring cluster. With the
help of the silhouette coefficient one can judge the quality of a single cluster or
of the entire clustering (details can be found in Kaufman & Rousseeuw (1990)).
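As a rough illustration of the two measures, the following Python sketch computes the centroid and mean square error of Eq. (10) for one cluster and a silhouette value for a whole clustering. Several things here are assumptions rather than part of the text: the function names (`centroid`, `mse`, `silhouette`) are hypothetical, documents are taken to be plain lists of term weights, dist is taken to be the Euclidean distance, and the silhouette of a single document is computed with the standard definition s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. Averaging s over all documents to obtain one value for the clustering is likewise an assumption.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two term-weight vectors (assumed choice of dist)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster):
    """Component-wise mean of all document vectors in the cluster."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def mse(cluster, dist=euclidean):
    """Mean square error of one cluster: mean squared distance to its centroid."""
    mu = centroid(cluster)
    return sum(dist(d, mu) ** 2 for d in cluster) / len(cluster)


def silhouette(clusters, dist=euclidean):
    """Mean silhouette value over all documents; needs at least two clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for j, d in enumerate(own):
            if len(own) == 1:
                # Convention: a document alone in its cluster scores 0.
                scores.append(0.0)
                continue
            # a: mean distance to the other documents of the same cluster.
            a = sum(dist(d, o) for k, o in enumerate(own) if k != j) / (len(own) - 1)
            # b: mean distance to the documents of the nearest other cluster.
            b = min(
                sum(dist(d, o) for o in other) / len(other)
                for m, other in enumerate(clusters)
                if m != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (invented for illustration): two tight, well separated clusters.
toy_clusters = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[10.0, 0.0], [10.0, 1.0]],
]

print(mse(toy_clusters[0]))      # 0.25
print(silhouette(toy_clusters))
```

The pairwise loops cost quadratic time in the number of documents, which is fine for a small illustration but would call for a vectorized or library implementation on a real collection.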
Kaufman & Rousseeuw (1990) give characteristic values of the silhouette coefficient for evaluating cluster quality. A value of SC(P) between 0.7
and 1.0 signals excellent separation between the found clusters, i.e. the objects
within a cluster...