Lecture13 - Data Mining Principles and Algorithms Jianyong...

December 30, 2009
Data Mining: Principles and Algorithms
Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University
[email protected]

Chapter 7. Cluster Analysis
- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Methods
- Clustering High-Dimensional Data
- Constraint-Based Clustering
- Outlier Analysis
- Summary
Hierarchical Clustering
- Uses the distance matrix as the clustering criterion.
- Does not require the number of clusters k as an input, but needs a termination condition.
[Figure: objects a, b, c, d, e. Agglomerative (AGNES), Step 0 to Step 4: the objects are merged bottom-up into ab, de, cde, and finally abcde. Divisive (DIANA) runs the same steps in the opposite direction, Step 4 to Step 0.]
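As a minimal sketch of the distance matrix this slide refers to, the snippet below builds one from raw coordinates. The slide does not fix a distance measure; Euclidean distance and the five 2-D points are assumptions chosen only for illustration.

```python
import math

def dissimilarity_matrix(points):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves at once; the diagonal stays 0
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d

# Hypothetical coordinates for the five objects a..e (not from the slides)
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (6.0, 6.5), (6.0, 7.0)]
D = dissimilarity_matrix(points)
```

Any other dissimilarity could be substituted for `math.dist`, since the hierarchical methods that follow only consume the matrix.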

AGNES (Agglomerative Nesting)
- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., Splus
- Uses the single-link method and the dissimilarity matrix
- Merges the nodes that have the least dissimilarity
- Proceeds in a non-descending fashion: each merge happens at a dissimilarity no smaller than the previous one
- Eventually all nodes belong to the same cluster
[Figure: three scatter-plot panels, each with both axes running from 0 to 10]
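The AGNES steps above can be sketched in a few lines of plain Python. This is an illustrative, unoptimized version, not the Splus implementation; the function name `agnes_single_link`, the `stop_at` termination parameter, and the dissimilarity values are all assumptions.

```python
def agnes_single_link(dist, stop_at=1):
    """Merge the two closest clusters until `stop_at` clusters remain.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns (clusters, merges); merges holds (cluster_a, cluster_b, height).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > stop_at:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single link: dissimilarity of the closest pair of members
                h = min(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or h < best[0]:
                    best = (h, p, q)
        h, p, q = best
        merges.append((sorted(clusters[p]), sorted(clusters[q]), h))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters, merges

# Hypothetical dissimilarities for objects a..e (indices 0..4)
D = [
    [0, 1, 6, 9, 10],
    [1, 0, 5, 8, 9],
    [6, 5, 0, 3, 4],
    [9, 8, 3, 0, 2],
    [10, 9, 4, 2, 0],
]
clusters, merges = agnes_single_link(D)
for a, b, h in merges:
    print(a, b, h)
```

With this matrix the merges occur at heights 1, 2, 3, 5, which illustrates the non-descending order. Passing `stop_at=2` is one possible termination condition and stops with two clusters remaining.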
Dendrogram: Shows How the Clusters are Merged
- Decompose data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
- A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
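Cutting a dendrogram can be sketched as follows, assuming the tree is stored as a list of merges with their heights (a representation chosen here for illustration; the slides do not prescribe one). Merges at or below the cut level are applied, and the resulting connected components are the clusters.

```python
def cut_dendrogram(n, merges, height):
    """Return the clusters formed by merges whose height is <= `height`.

    n: number of objects; merges: list of (members_a, members_b, height).
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, h in merges:
        if h <= height:
            parent[find(a[0])] = find(b[0])

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical merge list for five objects a..e (indices 0..4)
merges = [([0], [1], 1), ([3], [4], 2), ([2], [3, 4], 3), ([0, 1], [2, 3, 4], 5)]
print(cut_dendrogram(5, merges, 2.5))  # [[0, 1], [2], [3, 4]]
print(cut_dendrogram(5, merges, 4))    # [[0, 1], [2, 3, 4]]
```

Raising the cut level yields fewer, larger clusters; lowering it yields more, smaller ones.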

DIANA (Divisive Analysis)
- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., Splus
- Works in the inverse order of AGNES: starts with all nodes in one cluster and splits repeatedly
- Eventually each node forms a cluster on its own
[Figure: three scatter-plot panels, each with both axes running from 0 to 10]
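The slide only states that DIANA runs in the inverse order of AGNES. The sketch below fills in one common description of the splitting rule, which is an assumption here rather than something taken from the slides: split the cluster with the largest diameter, seed a splinter group with the object that is farthest on average from the others, and keep moving objects that are closer on average to the splinter group. The function names and the matrix are hypothetical.

```python
def diana_split(cluster, dist):
    """Split one cluster into (remainder, splinter group)."""
    def avg(i, group):
        others = [j for j in group if j != i]
        return sum(dist[i][j] for j in others) / len(others) if others else 0.0

    rest = list(cluster)
    # Seed: the object with the highest average dissimilarity to the others
    seed = max(rest, key=lambda i: avg(i, rest))
    rest.remove(seed)
    splinter = [seed]
    while len(rest) > 1:
        # Best candidate: largest gain from leaving the remainder
        gain, i = max((avg(i, rest) - avg(i, splinter), i) for i in rest)
        if gain <= 0:
            break
        rest.remove(i)
        splinter.append(i)
    return sorted(rest), sorted(splinter)

def diana(dist):
    """Split repeatedly until every object is its own cluster."""
    clusters = [list(range(len(dist)))]
    splits = []
    while any(len(c) > 1 for c in clusters):
        # Choose the cluster with the largest diameter
        target = max(clusters, key=lambda c: max(dist[i][j] for i in c for j in c))
        clusters.remove(target)
        rest, splinter = diana_split(target, dist)
        splits.append((rest, splinter))
        clusters += [rest, splinter]
    return splits

# Hypothetical dissimilarities for objects a..e (indices 0..4)
D = [
    [0, 1, 6, 9, 10],
    [1, 0, 5, 8, 9],
    [6, 5, 0, 3, 4],
    [9, 8, 3, 0, 2],
    [10, 9, 4, 2, 0],
]
for rest, splinter in diana(D):
    print(rest, splinter)
```

On this matrix the first split separates {a, b} from {c, d, e}, which mirrors the last merge a single-link agglomeration would make on the same data.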
Advanced Hierarchical Clustering Methods
Major weaknesses of agglomerative clustering methods
- Do not scale well: time complexity of at least O(n^2), where n is the total number of objects
- Can never undo what was done previously
Integration of hierarchical with distance-based clustering
- BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters (SIGMOD'06 test of time award)
- ROCK (1999): clustering categorical data by neighbor and link analysis
- CHAMELEON (1999): hierarchical clustering using dynamic modeling
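One way to see the quadratic lower bound mentioned above: the dissimilarity matrix alone holds one entry per pair of objects, i.e. n(n-1)/2 values. The short sketch below simply evaluates that count for a few sizes; the sizes are arbitrary examples.

```python
def pairwise_count(n):
    """Number of distinct object pairs (entries below the matrix diagonal)."""
    return n * (n - 1) // 2

for n in (1_000, 10_000, 100_000):
    print(n, pairwise_count(n))
```

Growing n by a factor of 10 grows the number of pairs by roughly a factor of 100, which is why methods working from the full matrix do not scale to large data sets.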

BIRCH (1996)

This note was uploaded on 06/02/2010 for the course COMPUTER DM2009F taught by Professor Wangwei during the Fall '09 term at Tsinghua University.
