lecture12-clustering-handout-6-per

Introduction to Information Retrieval, Sec. 16.4

# Convergence of K-Means

- G = Σ_k G_k, where G_k is the residual sum of squares of cluster k.
- Reassignment monotonically decreases G, since each vector is assigned to its closest centroid.
- Recomputation monotonically decreases each G_k, since (with m_k the number of members in cluster k) Σ_i (d_i − a)² reaches its minimum when:
  - Σ_i −2(d_i − a) = 0
  - Σ_i d_i = Σ_i a = m_k a
  - a = (1/m_k) Σ_i d_i = c_k
- K-means is a special case of a general procedure known as the Expectation Maximization (EM) algorithm.
- EM is known to converge.
- The number of iterations could be large, but in practice it usually isn't: K-means typically converges quickly.

# Time Complexity

- Computing the distance between two docs is O(M), where M is the dimensionality of the vectors.
- Reassigning clusters: O(KN) distance computations, i.e. O(KNM).
- Computing centroids: each doc gets added once to some centroid: O(NM).
- Assume these two steps are each done once for I iterations: O(IKNM).
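To make the convergence argument concrete, here is a minimal pure-Python K-means sketch (not from the handout; the data points, starting centroids, and iteration count are made up for illustration). It records G after each reassignment and each recomputation, showing that neither step increases G:

```python
# Minimal K-means sketch tracking G = sum of squared distances to
# assigned centroids. Illustrative only; data and K are arbitrary.

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def rss(docs, centroids, assign):
    # G = sum_k G_k: squared distance of each doc to its cluster's centroid
    return sum(sq_dist(d, centroids[assign[i]]) for i, d in enumerate(docs))

def kmeans(docs, centroids, iters=5):
    history = []
    for _ in range(iters):
        # Reassignment: each vector moves to its closest centroid,
        # so G cannot increase.
        assign = [min(range(len(centroids)),
                      key=lambda k: sq_dist(d, centroids[k]))
                  for d in docs]
        history.append(rss(docs, centroids, assign))
        # Recomputation: the mean minimizes sum_i (d_i - a)^2 per cluster,
        # so each G_k (hence G) cannot increase.
        for k in range(len(centroids)):
            members = [d for i, d in enumerate(docs) if assign[i] == k]
            if members:
                centroids[k] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
        history.append(rss(docs, centroids, assign))
    return centroids, assign, history

docs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
cents, assign, hist = kmeans(docs, [(0.0, 0.0), (1.0, 1.0)])
# hist is non-increasing: G never goes up after either step
assert all(hist[i] >= hist[i + 1] - 1e-12 for i in range(len(hist) - 1))
```

Because both steps are non-increasing in G and there are finitely many assignments, the algorithm must converge, matching the slide's argument.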
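As a sanity check on the O(KNM) reassignment cost, the sketch below (illustrative only; the values of N, K, and M are arbitrary) counts distance computations in a single reassignment step: every doc is compared to every centroid, giving exactly K·N calls, each doing O(M) arithmetic.

```python
import random

random.seed(0)
M, N, K = 8, 30, 3  # arbitrary small sizes for illustration
docs = [[random.random() for _ in range(M)] for _ in range(N)]
centroids = [[random.random() for _ in range(M)] for _ in range(K)]

dist_calls = 0

def sq_dist(u, v):
    # Each call does O(M) arithmetic over the M-dimensional vectors.
    global dist_calls
    dist_calls += 1
    return sum((a - b) ** 2 for a, b in zip(u, v))

# One reassignment step: every doc is compared to every centroid.
assign = [min(range(K), key=lambda k: sq_dist(d, centroids[k])) for d in docs]

assert dist_calls == K * N  # O(KN) computations, hence O(KNM) total work
```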