

...clusters = average similarity of all pairs within the merged cluster.

$$
\mathrm{sim}(c_i, c_j) = \frac{1}{|c_i \cup c_j| \, (|c_i \cup c_j| - 1)} \sum_{x \in (c_i \cup c_j)} \; \sum_{y \in (c_i \cup c_j):\, y \neq x} \mathrm{sim}(x, y)
$$

- Compromise between single and complete link.
- Two options:
  - Averaged across all ordered pairs in the merged cluster
  - Averaged over all pairs between the two original clusters
- No clear difference in efficacy.
- Often $O(N^3)$ if done naively, or $O(N^2 \log N)$ if done more cleverly.

Introduction to Information Retrieval (Sec. 17.3)

Computing Group Average Similarity

- Always maintain the sum of vectors in each cluster:

$$
s(c_j) = \sum_{x \in c_j} x
$$

- Compute the similarity of clusters in constant time:

$$
\mathrm{sim}(c_i, c_j) = \frac{(s(c_i) + s(c_j)) \cdot (s(c_i) + s(c_j)) - (|c_i| + |c_j|)}{(|c_i| + |c_j|) \, (|c_i| + |c_j| - 1)}
$$

What Is A Good Clustering? (Sec. 16.3)

- Internal criterion: a good clustering will produce high-quality clusters in which:
  - the intra-class (that is, intra-cluster) similarity is high
  - the inter-class similarity is low
- The measured quality of a clustering depends on both the document representation a...
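The constant-time formula on the "Computing Group Average Similarity" slide can be checked against the pairwise definition with a small sketch. It assumes something the slide does not state outright: every document vector is length-normalized and similarity is the dot product, so each self-similarity equals 1 (that is what the subtracted $|c_i| + |c_j|$ term removes). The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from itertools import permutations

def normalize(v):
    """Scale a vector to unit length (assumption: dot product = cosine similarity)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vector_sum(cluster):
    """s(c): component-wise sum of all vectors in the cluster."""
    return [sum(col) for col in zip(*cluster)]

def group_average_naive(ci, cj):
    """Average similarity over all ordered pairs x != y in the merged cluster."""
    merged = ci + cj
    n = len(merged)
    total = sum(dot(x, y) for x, y in permutations(merged, 2))
    return total / (n * (n - 1))

def group_average_fast(s_i, n_i, s_j, n_j):
    """Same quantity computed from the maintained sums and cluster sizes only."""
    s = [a + b for a, b in zip(s_i, s_j)]
    n = n_i + n_j
    # dot(s, s) sums every ordered pair plus n self-similarities, each equal to 1.
    return (dot(s, s) - n) / (n * (n - 1))

# Toy clusters (made-up data), normalized to unit length.
ci = [normalize(v) for v in ([1.0, 0.0, 1.0], [1.0, 1.0, 0.0])]
cj = [normalize(v) for v in ([0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0])]

naive = group_average_naive(ci, cj)
fast = group_average_fast(vector_sum(ci), len(ci), vector_sum(cj), len(cj))
print(naive, fast)  # the two values should agree up to floating-point error
```

"Constant time" here means independent of the number of documents in the clusters; the fast version still costs one dot product over the vector dimension, while the naive version grows quadratically with the merged cluster size.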

This document was uploaded on 02/26/2014.
