lecture12-clustering-handout-6-per

# 171 hierarchical agglomerative clustering (HAC) starts

Introduction to Information Retrieval

K not specified in advance

- compressed summary of search results list.
- Tradeoff between having more clusters (better focus within each cluster) and having too many clusters.
- Given a clustering, define the Benefit for a doc to be the cosine similarity to its centroid.
- Define the Total Benefit to be the sum of the individual doc Benefits.
- Why is there always a clustering of Total Benefit n?

Penalize lots of clusters

- For each cluster, we have a Cost C.
- Thus for a clustering with K clusters, the Total Cost is KC.
- Define the Value of a clustering to be Total Benefit - Total Cost.
- Find the clustering of highest Value, over all choices of K.
- Total Benefit increases with increasing K, but we can stop when it doesn't increase by much. The Cost term enforces this.

Ch. 17 ...
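The Benefit, Cost, and Value definitions above can be tried out on a toy example. The sketch below is only an illustration under stated assumptions: the four 2-D document vectors, the per-cluster cost `COST = 0.5`, and the three hand-picked candidate clusterings are all made up, and the code compares only those candidates instead of searching over every possible clustering. It also checks the side question from the slide: when every doc sits in its own cluster, each doc is its own centroid, so each Benefit is 1 and the Total Benefit is n.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def total_benefit(docs, labels):
    """Sum over docs of the cosine similarity to the doc's cluster centroid."""
    clusters = defaultdict(list)
    for doc, label in zip(docs, labels):
        clusters[label].append(doc)
    centroids = {label: centroid(members) for label, members in clusters.items()}
    return sum(cosine(doc, centroids[label]) for doc, label in zip(docs, labels))


def clustering_value(docs, labels, cost_per_cluster):
    """Value = Total Benefit - K * C, where K is the number of clusters used."""
    k = len(set(labels))
    return total_benefit(docs, labels) - k * cost_per_cluster


# Hypothetical toy data: two docs near the x-axis, two near the y-axis.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Hypothetical candidate clusterings, given as one cluster label per doc.
candidates = {
    "K=1": [0, 0, 0, 0],
    "K=2": [0, 0, 1, 1],
    "K=4": [0, 1, 2, 3],
}

COST = 0.5  # assumed cost C per cluster; application dependent

benefits = {name: total_benefit(docs, labels) for name, labels in candidates.items()}
values = {name: clustering_value(docs, labels, COST) for name, labels in candidates.items()}
best = max(values, key=values.get)

for name in candidates:
    print(f"{name}: total benefit = {benefits[name]:.4f}, value = {values[name]:.4f}")
print("highest value:", best)
```

With these made-up numbers, Total Benefit rises from K=1 to K=2 to K=4, but the gain from K=2 to K=4 is smaller than the extra cost of two more clusters, so K=2 has the highest Value. A larger `COST` pushes the choice toward fewer clusters, a smaller one toward more.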
