ICDM 2002 variation to fix some issues with small


O(NM).
Assume these two steps are each done once for I iterations: O(IKNM).

Introduction to Information Retrieval

Seed Choice (Sec. 16.4)
- Results can vary based on random seed selection.
- Some seeds can result in a poor convergence rate, or in convergence to sub-optimal clusterings.
  - Select good seeds using a heuristic (e.g., the doc least similar to any existing mean).
  - Try out multiple starting points.
  - Initialize with the results of another method.

[Figure: Example showing sensitivity to seeds]
In the above, if you start with B and E as centroids, you converge to {A,B,C} and {D,E,F}. If you start with D and F, you converge to {A,B,D,E}...

K-means issues, variations, etc. (Sec. 16.4)
- Recomputing the centroid after every assignment (rather than after all points are re-assigned) can improve the speed of convergence of K-means.
- Assumes clusters are spherical in vector space.
- Sensitive to coordinate changes, weighting, etc.
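The O(IKNM) cost estimate and the seed-choice advice can be made concrete with a short sketch. The Python below is a minimal, hypothetical implementation (the names `kmeans`, `farthest_point_seeds` and `best_of_restarts` are illustrative, not from the slides). It assumes documents are dense vectors compared by squared Euclidean distance. The reassignment step makes K·N distance computations of cost O(M). The recomputation step adds each document to one running sum, which is O(NM). `max_iter` plays the role of I. Seeding follows one possible reading of "doc least similar to any existing mean": pick the document farthest from its nearest chosen seed. Multiple starting points are handled by keeping the run with the lowest residual sum of squares (RSS).

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two M-dimensional vectors: O(M)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(docs: List[Vector], seeds: List[Vector],
           max_iter: int = 100) -> Tuple[List[Vector], List[int], float]:
    """Batch K-means: reassign all N docs, then recompute all K centroids."""
    k, m = len(seeds), len(docs[0])
    centroids = [list(s) for s in seeds]
    assign: List[int] = []
    for _ in range(max_iter):  # at most I = max_iter iterations
        # Reassignment: K * N distance computations, each O(M) -> O(KNM).
        new_assign = [min(range(k), key=lambda c: sq_dist(d, centroids[c]))
                      for d in docs]
        if new_assign == assign:
            break  # converged: no document changed cluster
        assign = new_assign
        # Recomputation: each doc is added to exactly one running sum -> O(NM).
        sums = [[0.0] * m for _ in range(k)]
        counts = [0] * k
        for d, a in zip(docs, assign):
            counts[a] += 1
            for j in range(m):
                sums[a][j] += d[j]
        for c in range(k):
            if counts[c]:  # an empty cluster keeps its previous centroid
                centroids[c] = [s / counts[c] for s in sums[c]]
    rss = sum(sq_dist(d, centroids[a]) for d, a in zip(docs, assign))
    return centroids, assign, rss

def farthest_point_seeds(docs: List[Vector], k: int,
                         rng: random.Random) -> List[Vector]:
    """Hypothetical heuristic: after a random first seed, repeatedly add the
    doc whose distance to its nearest existing seed is largest."""
    seeds = [list(rng.choice(docs))]
    while len(seeds) < k:
        nxt = max(docs, key=lambda d: min(sq_dist(d, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds

def best_of_restarts(docs: List[Vector], k: int, restarts: int = 10,
                     seed: int = 0) -> Tuple[List[Vector], List[int], float]:
    """Try multiple random starting points and keep the run with lowest RSS."""
    if restarts < 1:
        raise ValueError("restarts must be at least 1")
    rng = random.Random(seed)
    best: Optional[Tuple[List[Vector], List[int], float]] = None
    for _ in range(restarts):
        result = kmeans(docs, [list(d) for d in rng.sample(docs, k)])
        if best is None or result[2] < best[2]:
            best = result
    return best
```

Comparing restarts by RSS assumes RSS is the quantity being minimized, as in standard K-means. Other selection criteria are possible.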
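The variation of recomputing the centroid after every assignment can be sketched as follows. This is an illustrative Python version, and the name `incremental_kmeans` is hypothetical. It rests on several assumptions: dense vectors, squared Euclidean distance, and one initial batch assignment so that every document starts in a cluster. It also adds a guard that refuses any move that would empty a cluster, which is a design choice made here and not taken from the slides. When a document d moves from cluster src to cluster dst, only those two centroids change. Each can be updated in O(M) with running-mean formulas instead of recomputing all K means.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two M-dimensional vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def incremental_kmeans(docs: List[Vector], seeds: List[Vector],
                       max_passes: int = 100) -> Tuple[List[Vector], List[int]]:
    """K-means variant that updates the affected centroids immediately after
    each single reassignment, instead of after a full pass over all docs."""
    k, m = len(seeds), len(docs[0])
    centroids = [[float(v) for v in s] for s in seeds]
    # One batch assignment so every doc has a cluster and counts are defined.
    assign = [min(range(k), key=lambda c: sq_dist(d, centroids[c])) for d in docs]
    counts = [0] * k
    sums = [[0.0] * m for _ in range(k)]
    for d, a in zip(docs, assign):
        counts[a] += 1
        for j in range(m):
            sums[a][j] += d[j]
    for c in range(k):
        if counts[c]:  # an empty cluster keeps its seed as its centroid
            centroids[c] = [s / counts[c] for s in sums[c]]

    for _ in range(max_passes):
        moved = False
        for i, d in enumerate(docs):
            src = assign[i]
            dst = min(range(k), key=lambda c: sq_dist(d, centroids[c]))
            if dst == src or counts[src] == 1:
                continue  # no change, or the move would empty a cluster
            # Update only the two affected centroids, O(M) each:
            #   mu_src <- (n_src * mu_src - d) / (n_src - 1)
            #   mu_dst <- (n_dst * mu_dst + d) / (n_dst + 1)
            n_s, n_d = counts[src], counts[dst]
            centroids[src] = [(n_s * c - x) / (n_s - 1)
                              for c, x in zip(centroids[src], d)]
            centroids[dst] = [(n_d * c + x) / (n_d + 1)
                              for c, x in zip(centroids[dst], d)]
            counts[src], counts[dst] = n_s - 1, n_d + 1
            assign[i] = dst
            moved = True
        if not moved:
            break  # a full pass with no reassignment: converged
    return centroids, assign
```

Because a moved document immediately shifts both centroids, later documents in the same pass are compared against the updated means. This is the behavior the slide credits with faster convergence.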

This document was uploaded on 02/26/2014.
