07-clustering

# Euclidean case: each cluster has a centroid, the average of its points



11/26/2010 Jure Leskovec, Stanford C246: Mining Massive Datasets

In the Euclidean case, each cluster has a centroid: the average of its points. Measure cluster distances by the distances between centroids.

[Figure: data points (o) at (0,0), (1,2), (2,1), (4,1), (5,0), (5,3), with the centroids (x) of merged clusters at (1.5,1.5), (1,1), (4.5,0.5), (4.7,1.3); panels labeled "Data" and "Dendrogram".]

In the non-Euclidean case, the only “locations” we can talk about are the points themselves; i.e., there is no “average” of two points.

Approach 1: clustroid = the point “closest” to the other points. Treat the clustroid as if it were a centroid when computing intercluster distances.

Possible meanings of “closest”:

1. Smallest maximum distance to the other points.
2. Smallest average distance to the other points.
3. Smallest sum of squares of distances to the other points. For a distance d, the clustroid c of cluster C is the point that attains

$$\min_{c \in C} \sum_{x \in C} d(x, c)^2$$

Approach 2: intercluster distance = minimum of the distances between any two points, one from each cluster.

Approach 3: Pick a notion of “cohesion” of clusters, e.g., maximum distance from the clustroid. Merge the clusters whose union is most cohesive.
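To make the two cases concrete, here is a minimal Python sketch, not the course's own code. `centroid_merges` repeatedly merges the two clusters with the closest centroids (the Euclidean case), and `clustroid` picks a cluster member under each of the three meanings of “closest” listed above (the non-Euclidean case). The function names, the criterion labels `"max"`, `"avg"`, `"sum_sq"`, and the use of Jaccard distance on sets as the example non-Euclidean distance are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Average of a cluster's points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_merges(points):
    """Merge the two clusters with the closest centroids until one remains.

    Returns the centroid of each newly merged cluster, in merge order.
    """
    clusters = [[p] for p in points]
    merged_centroids = []
    while len(clusters) > 1:
        # Pair of cluster indices whose centroids are nearest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merged_centroids.append(centroid(merged))
    return merged_centroids


def jaccard_distance(a, b):
    """Example non-Euclidean distance on non-empty sets (an assumption here)."""
    return 1 - len(a & b) / len(a | b)


def clustroid(cluster, d, criterion="sum_sq"):
    """Return the member of `cluster` that is "closest" to the other members.

    d(c, c) == 0, so including c itself in the max or the sums does not
    change which member wins.
    """
    scores = {
        "max": lambda c: max(d(x, c) for x in cluster),
        "avg": lambda c: sum(d(x, c) for x in cluster) / len(cluster),
        "sum_sq": lambda c: sum(d(x, c) ** 2 for x in cluster),
    }
    return min(cluster, key=scores[criterion])


# The six data points from the figure.
points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]
for c in centroid_merges(points):
    print(tuple(round(v, 1) for v in c))

# A small hypothetical cluster of sets; there is no "average" set to use.
sets = [frozenset("abc"), frozenset("abcd"), frozenset("abd"), frozenset("ae")]
print(sorted(clustroid(sets, jaccard_distance, "sum_sq")))
```

On the six points from the figure, the first four merges produce the centroids marked x there; the fifth merge is the root of the dendrogram. Ties are broken by whichever candidate is encountered first, which is a simplification of this sketch rather than a rule from the slides.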
