... clusters = average similarity of all pairs within the merged cluster.

$$
\mathrm{sim}(c_i, c_j) = \frac{1}{|c_i \cup c_j|\,(|c_i \cup c_j| - 1)} \sum_{\vec{x} \in (c_i \cup c_j)} \; \sum_{\vec{y} \in (c_i \cup c_j):\, \vec{y} \neq \vec{x}} \mathrm{sim}(\vec{x}, \vec{y})
$$

  - Compromise between single and complete link.
  - Two options:
      - Averaged across all ordered pairs in the merged cluster
      - Averaged over all pairs between the two original clusters
  - No clear difference in efficacy
  - Often O(N^3) if done naively or O(N^2 log N) if done more cleverly

Introduction to Information Retrieval    Sec. 17.3
Computing Group Average Similarity

  - Always maintain the sum of vectors in each cluster.

$$
\vec{s}(c_j) = \sum_{\vec{x} \in c_j} \vec{x}
$$

  - Compute similarity of clusters in constant time:

$$
\mathrm{sim}(c_i, c_j) = \frac{(\vec{s}(c_i) + \vec{s}(c_j)) \cdot (\vec{s}(c_i) + \vec{s}(c_j)) - (|c_i| + |c_j|)}{(|c_i| + |c_j|)\,(|c_i| + |c_j| - 1)}
$$

Sec. 16.3
What Is A Good Clustering?

  - Internal criterion: A good clustering will produce high quality clusters in which:
      - the intra-class (that is, intra-cluster) similarity is high
      - the inter-class similarity is low
  - The measured quality of a clustering depends on both the document representation a...
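Returning to the group-average computation: the sketch below illustrates why keeping a per-cluster vector sum is enough. It assumes document vectors are length-normalized and that similarity is the dot product, so each document's self-similarity is 1, which is where the subtracted |c_i| + |c_j| term comes from. The helper names (`normalize`, `vector_sum`, `group_average_naive`, `group_average_fast`) and the toy vectors are illustrative only and do not come from the slides.

```python
import math
from itertools import permutations

def normalize(v):
    """Scale a vector to unit length (assumed by the fast formula)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vector_sum(cluster):
    """Component-wise sum of all vectors in a cluster: s(c)."""
    return [sum(column) for column in zip(*cluster)]

def group_average_naive(ci, cj):
    """Average dot product over all ordered pairs (x, y), x != y, in the merged cluster."""
    merged = ci + cj
    n = len(merged)
    total = sum(dot(merged[a], merged[b]) for a, b in permutations(range(n), 2))
    return total / (n * (n - 1))

def group_average_fast(s_i, n_i, s_j, n_j):
    """Same quantity from the maintained sums s(ci), s(cj) and the cluster sizes."""
    s = [a + b for a, b in zip(s_i, s_j)]
    n = n_i + n_j
    # s . s counts every ordered pair plus n self-similarities, each equal to 1.
    return (dot(s, s) - n) / (n * (n - 1))

# Hypothetical toy clusters of unit-length 2-D vectors.
ci = [normalize([1.0, 0.0]), normalize([1.0, 1.0])]
cj = [normalize([0.0, 1.0])]

naive = group_average_naive(ci, cj)
fast = group_average_fast(vector_sum(ci), len(ci), vector_sum(cj), len(cj))
print(naive, fast)  # the two values should agree up to rounding
```

"Constant time" here means independent of the number of documents in the clusters; the work still grows linearly with the vector dimension. After a merge, the new cluster's sum is simply the sum of the two old sums, so no pass over the documents is needed.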
