For each cluster we have a cost c thus for a

{C,F}
Disjoint and exhaustive
Doesn't have a notion of outliers by default
But can add outlier filtering
Dhillon et al. ICDM 2002 – variation to fix some issues with small document clusters

Introduction to Information Retrieval
How Many Clusters?
Number of clusters K is given
  Partition n docs into predetermined number of clusters
Finding the right number of clusters is part of the problem
  Given docs, partition into an appropriate number of subsets.
  E.g., for query results – ideal value of K not known up front – though UI may impose limits.
Can usually take an algorithm for one flavor and convert to the other.

K not specified in advance
Say, the results of a query.
Solve an optimization problem: penalize having lots of clusters
  application dependent, e.g.,...
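The idea of choosing K by penalizing the number of clusters can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which objective to use, so the sketch assumes residual sum of squares (RSS) plus a fixed cost per cluster, runs a plain Lloyd-style K-means on 1-D points, and uses a deterministic initialisation. The names `kmeans_1d`, `choose_k`, and `cost_per_cluster` are hypothetical and not taken from the slides.

```python
def kmeans_1d(points, k, iters=50):
    """Plain Lloyd-style K-means on 1-D data; returns (centroids, rss).

    Assumes 1 <= k <= len(points).
    """
    pts = sorted(points)
    # Deterministic initialisation (an assumption of this sketch):
    # k evenly spaced points taken from the sorted data.
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Residual sum of squares: squared distance to the nearest centroid.
    rss = sum(min((p - c) ** 2 for c in centroids) for p in pts)
    return centroids, rss


def choose_k(points, cost_per_cluster, max_k):
    """Pick the K in 1..max_k that minimises RSS + cost_per_cluster * K."""
    best_k, best_total = None, float("inf")
    for k in range(1, max_k + 1):
        _, rss = kmeans_1d(points, k)
        total = rss + cost_per_cluster * k  # penalise having lots of clusters
        if total < best_total:
            best_k, best_total = k, total
    return best_k


# Three well-separated groups of three points each.
docs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
k_moderate = choose_k(docs, cost_per_cluster=1.0, max_k=5)
k_expensive = choose_k(docs, cost_per_cluster=100.0, max_k=5)
```

With a moderate cost per cluster the penalty stops K from growing past the three natural groups, while a very large cost pushes the choice toward a single cluster. How to set the cost is application dependent.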
