lect5

Clustering II

Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram
  - A tree-like diagram that records the sequences of merges or splits
[Figure: dendrogram over points 1 to 6 with merge heights on a 0 to 0.2 axis, alongside the corresponding nested-cluster diagram]
Strengths of Hierarchical Clustering
- No assumptions about the number of clusters
  - Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
- Hierarchical clusterings may correspond to meaningful taxonomies
  - Examples: biological sciences (e.g., phylogeny reconstruction), the web (e.g., product catalogs)
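To make the idea of cutting concrete, here is a minimal sketch under stated assumptions. The dendrogram is represented as a list of merges sorted by height, which is a representation chosen for this example and not one given in the lecture. The function name `cut_dendrogram` and the merge record are hypothetical. Cutting into k clusters then amounts to replaying only the first n - k merges.

```python
def cut_dendrogram(n_points, merges, k):
    """Return cluster labels after cutting a dendrogram into k clusters.

    merges: list of (a, b, height) tuples sorted by increasing height, where
    a and b are point indices inside the two clusters being merged.
    (Hypothetical representation chosen for this sketch.)
    """
    parent = list(range(n_points))

    def find(x):
        # Follow parent links to the representative of x's cluster.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Replaying the first n - k merges is equivalent to cutting the tree
    # just above the height of the (n - k)-th merge.
    for a, b, _height in merges[: n_points - k]:
        parent[find(a)] = find(b)

    # Relabel cluster representatives as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_points)]


# Made-up merge record for six points (indices 0..5), not the lecture's figure.
merges = [(0, 1, 0.05), (2, 3, 0.08), (4, 5, 0.10), (0, 2, 0.15), (0, 4, 0.20)]
labels_3 = cut_dendrogram(6, merges, k=3)
labels_2 = cut_dendrogram(6, merges, k=2)
```

The same merge record yields every clustering from 6 clusters down to 1, which is why no number of clusters has to be fixed in advance.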

Hierarchical Clustering
- Two main types of hierarchical clustering
  - Agglomerative:
    - Start with the points as individual clusters
    - At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
  - Divisive:
    - Start with one, all-inclusive cluster
    - At each step, split a cluster until each cluster contains a single point (or there are k clusters)
- Traditional hierarchical algorithms use a similarity or distance matrix
  - Merge or split one cluster at a time
Complexity of hierarchical clustering
- A distance matrix is used for deciding which clusters to merge/split
- At least quadratic in the number of data points, since the matrix holds one distance per pair of points
- Not usable for large datasets
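A rough back-of-the-envelope calculation shows why the quadratic matrix is the bottleneck. The sketch below assumes that only the n(n-1)/2 distinct pairwise distances are stored and that each takes 8 bytes. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def condensed_matrix_entries(n):
    """Number of distinct pairwise distances among n points: n(n-1)/2."""
    return n * (n - 1) // 2


def condensed_matrix_bytes(n, bytes_per_entry=8):
    """Memory needed for those distances, assuming 8-byte floats."""
    return condensed_matrix_entries(n) * bytes_per_entry


for n in (1_000, 100_000, 1_000_000):
    gib = condensed_matrix_bytes(n) / 2**30
    print(f"n = {n:>9,}: {condensed_matrix_entries(n):,} distances, about {gib:,.1f} GiB")
```

Under these assumptions, 100,000 points already need about 37 GiB for the distances alone, before any merging work is done.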

Agglomerative clustering algorithm
- Most popular hierarchical clustering technique
- Basic algorithm
  1. Compute the distance matrix between the input data points
  2. Let each data point be a cluster
  3. Repeat
     1. Merge the two closest clusters
     2. Update the distance matrix
  4. Until only a single cluster remains
- Key operation is the computation of the distance between two clusters
  - Different definitions of the distance between clusters lead to different algorithms
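The basic algorithm above can be sketched in a few lines of Python. This is a deliberately naive sketch, not the lecture's implementation: Euclidean distance, the function name `agglomerative`, the stopping parameter `k`, and the `linkage` argument are all assumptions made for illustration. Passing `min` corresponds to what is commonly called single link and `max` to complete link, two examples of how the choice of cluster distance changes the algorithm.

```python
import math


def agglomerative(points, k=1, linkage=min):
    """Naive agglomerative clustering; returns (clusters, merges).

    linkage combines two old cluster distances into the distance for the
    merged cluster: min gives single link, max gives complete link.
    """
    # Let each data point be a cluster (clusters are keyed by an integer id).
    clusters = {i: [i] for i in range(len(points))}
    # Compute the distance matrix between the input points (Euclidean here),
    # stored as a dict keyed by (smaller id, larger id).
    dist = {
        (i, j): math.dist(points[i], points[j])
        for i in clusters for j in clusters if i < j
    }
    merges = []

    while len(clusters) > k:
        # Merge the two closest clusters; the merged cluster keeps id a.
        (a, b), height = min(dist.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], height))
        clusters[a] = clusters[a] + clusters.pop(b)

        # Update the distance matrix: remove b and refresh distances to a.
        for c in clusters:
            if c == a:
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(min(a, c), max(a, c))] = linkage(d_ac, d_bc)
        del dist[(a, b)]

    return list(clusters.values()), merges


# Made-up 2-D points for illustration.
points = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
clusters, merges = agglomerative(points, k=2)                # single link
_, merges_complete = agglomerative(points, k=1, linkage=max)  # complete link
```

Each pass scans every remaining pair to find the closest one, so this version is far from efficient. It is only meant to mirror the merge and update steps one to one.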
Start with clusters of individual points and a distance/proximity matrix
[Figure: points p1 to p5 as individual clusters, next to the distance/proximity matrix with rows and columns p1, p2, p3, p4, p5, ...]
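As an illustration of this starting state, the following sketch builds a distance matrix for five points. The coordinates are made up, since the slide gives no values, and Euclidean distance and the helper name `distance_matrix` are assumptions.

```python
import math


def distance_matrix(points):
    """Full symmetric matrix of Euclidean distances (zeros on the diagonal)."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance is symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = math.dist(points[i], points[j])
    return matrix


# Made-up coordinates standing in for p1..p5.
points = {
    "p1": (0.0, 0.0),
    "p2": (3.0, 4.0),
    "p3": (6.0, 8.0),
    "p4": (0.0, 1.0),
    "p5": (1.0, 0.0),
}
names = list(points)
matrix = distance_matrix(list(points.values()))

# Print the matrix with row and column labels.
print("     " + "".join(f"{name:>7}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:<5}" + "".join(f"{d:7.2f}" for d in row))
```

Each point starts as its own cluster, so at this stage the point-to-point matrix is also the cluster-to-cluster matrix.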

This document was uploaded on 10/05/2010.

