
University of Florida CISE department Gator Engineering
Clustering Part 3
Dr. Sanjay Ranka
Professor
Computer and Information Science and Engineering
University of Florida, Gainesville

University of Florida CISE department Gator Engineering Data Mining Sanjay Ranka Spring 2011

Hierarchical Clustering
• Two main types:
  – Agglomerative
    • Start with the points as individual clusters
    • Merge clusters until only one is left
  – Divisive
    • Start with all the points as one cluster
    • Split clusters until only singleton clusters remain
  – Agglomerative is more popular
• Traditional hierarchical algorithms use a similarity or distance matrix.
  – Merge or split one cluster at a time

Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
  – Tree-like diagram
  – Records the sequences of merges or splits
• Can ‘cut’ the dendrogram to get a partitional clustering
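To make the idea of cutting a dendrogram concrete, here is a minimal Python sketch. It assumes the dendrogram is stored as a list of merges in the order they happened, each written as `(a, b, height)`, where ids `0..n-1` are the original points and each merge creates the next unused id. That storage format, the function name `cut_dendrogram`, and the example merge record are all illustrative assumptions, not something the slides specify.

```python
def cut_dendrogram(n_points, merges, k):
    """Replay the recorded merges until k clusters remain.

    merges: list of (a, b, height) tuples in merge order (assumed format).
    Returns the clusters as a sorted list of sorted point-index lists.
    """
    if not 1 <= k <= n_points:
        raise ValueError("k must be between 1 and n_points")

    # Every point starts as its own cluster, keyed by its index.
    clusters = {i: [i] for i in range(n_points)}
    next_id = n_points

    # Going from n clusters down to k clusters takes n - k merges.
    for a, b, _height in merges[: n_points - k]:
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1

    return sorted(sorted(members) for members in clusters.values())


# Hypothetical merge record for five points (values are made up).
example_merges = [(0, 1, 1.0), (2, 3, 1.5), (5, 6, 5.0), (4, 7, 15.0)]
three_clusters = cut_dendrogram(5, example_merges, 3)
```

Cutting by a target number of clusters, as done here, corresponds to cutting at a fixed height whenever the merge heights never decrease along the record.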

Basic Agglomerative Clustering Algorithm
• Algorithm is straightforward
  – Compute the proximity matrix, if necessary
  – Let each data point be a cluster
  – Repeat
    • Merge the two closest clusters
    • Update the proximity matrix
  – Until only a single cluster remains
• Key operation is the computation of the proximity of two clusters.
  – Different approaches to defining the distance between clusters distinguish the different algorithms.
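The steps of the basic algorithm can be sketched in plain Python as follows. This is a naive illustration under stated assumptions, not an efficient or definitive implementation: Euclidean distance is assumed as the proximity measure, the proximity matrix is stored as a dictionary keyed by pairs of cluster ids, and the `linkage` parameter (a hypothetical name) selects how the matrix is updated after a merge. Passing `min` gives single link and `max` gives complete link; other cluster distances would need a different update rule.

```python
import math


def agglomerative(points, linkage=min):
    """Naive agglomerative clustering; returns merges as (a, b, height).

    Ids 0..n-1 are the input points; each merge creates the next id.
    """
    n = len(points)

    # Compute the proximity matrix (Euclidean distance is an assumption).
    prox = {
        frozenset((i, j)): math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    # Let each data point be a cluster.
    active = set(range(n))
    merges = []
    next_id = n

    # Repeat until only a single cluster remains.
    while len(active) > 1:
        # Merge the two closest clusters.
        pair = min(prox, key=prox.get)
        a, b = sorted(pair)
        height = prox.pop(pair)
        active -= {a, b}
        merges.append((a, b, height))

        # Update the proximity matrix: the new cluster's distance to each
        # remaining cluster k is derived from the two old distances.
        for k in active:
            d_ak = prox.pop(frozenset((a, k)))
            d_bk = prox.pop(frozenset((b, k)))
            prox[frozenset((next_id, k))] = linkage(d_ak, d_bk)

        active.add(next_id)
        next_id += 1

    return merges


# Made-up 2-D points for illustration.
sample_points = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
single_link_merges = agglomerative(sample_points, linkage=min)
complete_link_merges = agglomerative(sample_points, linkage=max)
```

Swapping `linkage` changes only the update step, which reflects the point that the definition of inter-cluster distance is what separates one agglomerative algorithm from another.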
Agglomerative Hierarchical Clustering: Starting Situation
• For agglomerative hierarchical clustering we start with clusters of individual points and a proximity matrix.
[Figure: points p1, p2, p3, p4, p5, ... as individual clusters, next to a proximity matrix whose rows and columns are labeled p1, p2, p3, p4, p5, ...]
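As a small illustration of this starting situation, the sketch below builds a proximity matrix for five points labeled p1 to p5. The coordinates are invented for the example, Euclidean distance is assumed as the proximity measure, and the helper names `proximity_matrix` and `format_matrix` are hypothetical.

```python
import math


def proximity_matrix(points):
    """Return the full symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]


def format_matrix(labels, matrix):
    """Render the matrix as a text table with row and column labels."""
    header = " " * 6 + " ".join(f"{name:>5}" for name in labels)
    rows = [
        f"{name:>5} " + " ".join(f"{d:5.2f}" for d in row)
        for name, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


# Invented coordinates; each point starts as its own cluster.
coords = {
    "p1": (0.40, 0.53),
    "p2": (0.22, 0.38),
    "p3": (0.35, 0.32),
    "p4": (0.26, 0.19),
    "p5": (0.08, 0.41),
}
labels = list(coords)
matrix = proximity_matrix(list(coords.values()))
print(format_matrix(labels, matrix))
```

The diagonal is zero and the matrix is symmetric, so an implementation only needs to store the entries above the diagonal.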

