**Clustering Algorithms**

- Hierarchical Clustering
- k-Means Algorithms
- CURE Algorithm

**Methods of Clustering**

- Hierarchical (Agglomerative):
  - Initially, each point in cluster by itself.
  - Repeatedly combine the two “nearest” clusters into one.
- Point Assignment:
  - Maintain a set of clusters.
  - Place points into their “nearest” cluster.

**Hierarchical Clustering**

- Two important questions:
  1. How do you determine the “nearness” of clusters?
  2. How do you represent a cluster of more than one point?

**Hierarchical Clustering (2)**

- Key problem: as you build clusters, how do you represent the location of each cluster, to tell which pair of clusters is closest?
- Euclidean case: each cluster has a *centroid* = average of its points.
  - Measure intercluster distances by distances of centroids.

**Example**

- Data points (o): (0,0), (1,2), (2,1), (4,1), (5,0), (5,3)
- Centroids (x): (1.5,1.5), (4.5,0.5), (1,1), (4.7,1.3)

**And in the Non-Euclidean Case?**

- The only “locations” we can talk about are the points themselves.
  - I.e., there is no “average” of two points.
- Approach 1: *clustroid* = point “closest” to other points.
  - Treat clustroid as if it were centroid, when computing intercluster distances.

**“Closest” Point?**

- Possible meanings:
  1. Smallest maximum distance to the other points.
  2. Smallest average distance to other points.
  3. Smallest sum of squares of distances to other points.
  4. Etc., etc.

**Example**

[Figure: points labeled 1 to 6, with two clustroids and the intercluster distance between them marked.]

**Other Approaches to Defining “Nearness” of Clusters**

- Approach 2: intercluster distance = minimum of the distances between any two points, one from each cluster.
- Approach 3: Pick a notion of “cohesion” of clusters, e.g., maximum distance from the clustroid.
  - Merge clusters whose *union* is most cohesive.

**Return to Euclidean Case**

- Approaches 2 and 3 are also used sometimes in Euclidean clustering....
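The agglomerative procedure and the two notions of "nearness" described in these slides can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names (`centroid`, `clustroid`, `agglomerate`, `centroid_distance`, `min_point_distance`) are hypothetical, stopping once `k` clusters remain is an assumed stopping rule the slides do not specify, and the clustroid uses meaning 3 (smallest sum of squared distances). The data are the six points from the Euclidean example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Coordinate-wise average of the points in a cluster (Euclidean case)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def clustroid(cluster, distance=dist):
    """Point with the smallest sum of squared distances to the other points."""
    return min(cluster, key=lambda p: sum(distance(p, q) ** 2 for q in cluster))


def centroid_distance(a, b):
    """Intercluster distance measured between centroids."""
    return dist(centroid(a), centroid(b))


def min_point_distance(a, b):
    """Approach 2: minimum distance between any two points, one per cluster."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, k, cluster_distance=centroid_distance):
    """Start with one cluster per point; merge the nearest pair until k remain.

    The stopping rule (a target count k) is an assumption for illustration.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of cluster indices with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


points = [(0, 0), (1, 2), (2, 1), (4, 1), (5, 0), (5, 3)]

for cluster in agglomerate(points, 2):
    print(sorted(cluster), "centroid:", centroid(cluster),
          "clustroid:", clustroid(cluster))
```

With centroid distances, the intermediate centroids are (1.5, 1.5) and (4.5, 0.5), and the final two clusters have centroids (1, 1) and roughly (4.7, 1.3), which agrees with the values listed in the example. Passing `cluster_distance=min_point_distance` changes the merge order on the same data: the two two-point clusters join each other before either absorbs (0,0) or (5,3), which suggests how much the choice of "nearness" can matter.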
