07-clustering

2 after all points are assigned fix the centroids of


ts to their closest centroid. Sometimes moves points between clusters.

11/26/2010 Jure Leskovec, Stanford C246: Mining Massive Datasets

[Figure: reassigned points; clusters after first round]

Try different k, looking at the change in the average distance to centroid, as k increases. Average falls rapidly until right k, then changes little.

[Figure: average distance to centroid versus k, with the best value of k marked]

Too few; many long distances to centroid.

Just right; distances rather short.

Too many; little improvement in average distance.

BFR [Bradley-Fayyad-Reina] is a variant of k-means designed to handle very large (disk-resident) data sets. It assumes that clusters are normally distributed around a centroid in a Euclidean space. Standard deviations in different dimensions may vary.
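The "try different k" procedure from the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the toy data (three tight groups of five points), the function names, and the farthest-first choice of initial centroids are all assumptions made here so that the run is deterministic.

```python
import math

def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def pick_initial(points, k):
    """Assumed initialization: start from the first point, then keep adding
    the point whose nearest chosen centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, rounds=50):
    """Assign points to their closest centroid, recompute centroids, repeat."""
    centroids = pick_initial(points, k)
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # New centroid = mean of the cluster; an empty cluster keeps its old one.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so we have converged
            break
        centroids = new_centroids
    return centroids

def average_distance(points, centroids):
    """Average distance from each point to its closest centroid."""
    return sum(math.dist(p, centroids[closest(p, centroids)]) for p in points) / len(points)

# Hypothetical toy data: three well-separated groups, so the "right" k is 3.
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Average distance to centroid for k = 1..5.
avg = {k: average_distance(points, kmeans(points, k)) for k in range(1, 6)}
for k in sorted(avg):
    print(k, round(avg[k], 3))
```

On this toy data the average distance drops sharply up to k = 3 and changes little afterwards, which is the pattern the slides describe for picking the best value of k.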