are very close to each other and are far away from other clusters.
The structure was identified very well by the clustering algorithm. For values in
the range from 0.5 to 0.7, the objects are clearly assigned to the appropriate
clusters. A silhouette coefficient in the range of 0.25 to 0.5 indicates a higher
level of noise in the data set, although clusters are still identifiable; in this
case the clustering algorithm could not assign many objects clearly to one cluster.
At values below 0.25 it is practically impossible to identify a cluster structure
and to calculate cluster centers that are meaningful from the application's point
of view. The clustering algorithm more or less "guessed" the clustering.
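The ranges described above can be turned into a small check. The following is a minimal sketch in Python, assuming the common pairwise-distance definition of the silhouette coefficient (which may differ in detail from the definition used in this survey) and Euclidean distance. The function names are illustrative, and the label for values above 0.7 is an assumption inferred from the sentence that precedes the range 0.5 to 0.7.

```python
from math import dist


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (pairwise-distance definition)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette value of 0
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) is 0, so including p in the sum does not change it)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:  # identical points everywhere would give 0/0
            total += (b - a) / denom
    return total / len(points)


def interpret(sc):
    """Map a silhouette coefficient to the ranges discussed in the text."""
    if sc > 0.7:
        return "strong structure"  # assumption: range implied by the text
    if sc >= 0.5:
        return "clear assignment"
    if sc >= 0.25:
        return "noisy but identifiable"
    return "no identifiable structure"


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
print(interpret(good), "/", interpret(bad))
```

With the toy data, the labelling that follows the two visible groups should land in the top range, while the labelling that mixes them should fall below 0.25.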
LDV-FORUM, A Brief Survey of Text Mining

Comparative Measures
The purity measure is based on the well-known precision measure for information
retrieval (cf. Pantel & Lin (2002)). Each resulting cluster P from a partitioning P
of the overall document set D is treated as if it were the result of a query. Each
set L of documents of a partitioning L, which is
obtained by manual labe...
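The idea of treating each cluster as a query result can be sketched in code. Because the source text is cut off before the formula, the sketch below assumes the commonly used definitions rather than the survey's exact ones: the precision of a cluster P with respect to a labelled set L is |P ∩ L| / |P|, and purity is the size-weighted average of each cluster's best precision. All names and the toy data are hypothetical.

```python
def precision(cluster, label_set):
    """Fraction of the documents in `cluster` that also belong to `label_set`."""
    cluster, label_set = set(cluster), set(label_set)
    return len(cluster & label_set) / len(cluster)  # assumes a non-empty cluster


def purity(clusters, label_sets):
    """Size-weighted mean of each cluster's best precision over all label sets."""
    n_docs = sum(len(c) for c in clusters)
    return sum(
        len(c) / n_docs * max(precision(c, l) for l in label_sets)
        for c in clusters
    )


# Hypothetical document ids: two clusters and two manually labelled sets.
clusters = [{1, 2, 3}, {4, 5}]
label_sets = [{1, 2, 4}, {3, 5}]
score = purity(clusters, label_sets)
print(round(score, 3))
```

A purity of 1.0 would mean that every cluster lies entirely inside one manually labelled set.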