171 hierarchical agglomerative clustering HAC starts


compressed summary of search results list.
Tradeoff between having more clusters (better focus within each cluster) and having too many clusters

Introduction to Information Retrieval

K not specified in advance
Penalize lots of clusters
Given a clustering, define the Benefit for a doc to be the cosine similarity to its centroid.
Define the Total Benefit to be the sum of the individual doc Benefits.
Why is there always a clustering of Total Benefit n?
For each cluster, we have a Cost C.
Thus for a clustering with K clusters, the Total Cost is KC.
Define the Value of a clustering to be Total Benefit - Total Cost.
Find the clustering of highest Value, over all choices of K.
Total Benefit increases with increasing K. But can stop when it doesn't increase by much. The Cost term enforces this.
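The Value criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names `total_benefit` and `clustering_value`, the toy document vectors, the hand-picked candidate clusterings, and the Cost C = 0.5 are all assumptions made for the example. It also assumes no document vector or centroid is the zero vector. The singleton clustering shows one answer to the question on the slide: when every doc is its own cluster, each doc coincides with its centroid, so each Benefit is 1 and the Total Benefit is n.

```python
import numpy as np

def total_benefit(docs, labels):
    """Sum, over all docs, of the cosine similarity between a doc and its cluster centroid."""
    docs = np.asarray(docs, dtype=float)
    labels = np.asarray(labels)
    benefit = 0.0
    for k in np.unique(labels):
        members = docs[labels == k]
        centroid = members.mean(axis=0)
        # Cosine similarity of every member of cluster k to the centroid
        # (assumes nonzero vectors, so the norms are never zero).
        sims = members @ centroid / (
            np.linalg.norm(members, axis=1) * np.linalg.norm(centroid)
        )
        benefit += sims.sum()
    return benefit

def clustering_value(docs, labels, cost_per_cluster):
    """Value = Total Benefit - K * C, where K is the number of clusters used."""
    k = len(np.unique(labels))
    return total_benefit(docs, labels) - k * cost_per_cluster

# Hypothetical toy data: two docs near the x-axis, two near the y-axis.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

# Hand-picked candidate clusterings, keyed by K (illustrative only).
candidates = {
    1: [0, 0, 0, 0],   # everything in one cluster
    2: [0, 0, 1, 1],   # one cluster per axis
    4: [0, 1, 2, 3],   # every doc is its own cluster
}

C = 0.5  # assumed Cost per cluster
values = {k: clustering_value(docs, labels, C) for k, labels in candidates.items()}
best_k = max(values, key=values.get)
```

With these assumed numbers, Total Benefit rises with K, but going from K = 2 to K = 4 gains very little Benefit while adding 2C in Cost, so the highest Value is at K = 2. In practice the candidate clusterings for each K would come from a clustering algorithm such as K-means rather than being written by hand.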

This document was uploaded on 02/26/2014.
