# Lecture13 - Data Mining Principles and Algorithms Jianyong...

December 30, 2009

Data Mining: Principles and Algorithms

Jianyong Wang
Database Lab, Institute of Software
Department of Computer Science and Technology
Tsinghua University

## Chapter 7. Cluster Analysis

- What is Cluster Analysis?
- Types of Data in Cluster Analysis
- A Categorization of Major Clustering Methods
- Partitioning Methods
- Hierarchical Methods
- Density-Based Methods
- Grid-Based Methods
- Model-Based Methods
- Clustering High-Dimensional Data
- Constraint-Based Clustering
- Outlier Analysis
- Summary

## Hierarchical Clustering

- Uses a distance matrix as the clustering criterion.
- Does not require the number of clusters k as an input, but needs a termination condition.

[Figure: objects a, b, c, d, e clustered over Steps 0 to 4. Agglomerative clustering (AGNES) runs from Step 0 to Step 4, forming the clusters ab, de, cde, and abcde; divisive clustering (DIANA) runs the same steps in reverse, from Step 4 back to Step 0.]

## AGNES (Agglomerative Nesting)

- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., S-Plus
- Uses the single-link method and the dissimilarity matrix
- Merges the nodes that have the least dissimilarity
- Continues in a non-descending fashion (in order of non-decreasing dissimilarity)
- Eventually all nodes belong to the same cluster

[Figure: three scatter plots, both axes running from 0 to 10, illustrating successive merges.]

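The merge loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the S-Plus implementation: the function name `single_link_agnes`, the list-of-lists dissimilarity matrix, and the five sample points are all assumptions made for the example. It recomputes every pairwise link at each step, so it is far slower than a practical implementation.

```python
from itertools import combinations


def single_link_agnes(dist):
    """Return the merge history as (members_a, members_b, dissimilarity) tuples.

    `dist` is a symmetric n x n dissimilarity matrix (list of lists).
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []

    def link(a, b):
        # Single link: dissimilarity of the closest pair across the two clusters.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the least dissimilarity and merge it.
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical one-dimensional data: five objects on a line.
points = [0.0, 1.0, 5.0, 8.0, 9.5]
dist = [[abs(p - q) for q in points] for p in points]
merges = single_link_agnes(dist)
for left, right, d in merges:
    print(left, right, d)
```

On this sample the merge dissimilarities come out in non-descending order, and the last merge joins all objects into one cluster.
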
## Dendrogram: Shows How the Clusters are Merged

- Decomposes the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
- A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
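Cutting a dendrogram can be illustrated as follows. The sketch assumes the dendrogram is stored as a list of merges, each a hypothetical `(members_a, members_b, height)` tuple; that representation, the name `cut_dendrogram`, and the sample merge list are illustrative choices, not something the slides specify. Every merge at or below the cut height is applied with a small union-find, and the connected components that remain are the clusters.

```python
def cut_dendrogram(n, merges, height):
    """Apply every merge at or below `height`; return the resulting clusters."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the component representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, h in merges:
        if h <= height:
            parent[find(left[0])] = find(right[0])

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical dendrogram over five objects (indices 0..4).
example_merges = [
    ([0], [1], 1.0),
    ([3], [4], 1.5),
    ([2], [3, 4], 3.0),
    ([0, 1], [2, 3, 4], 4.0),
]
three_clusters = cut_dendrogram(5, example_merges, height=2.0)
two_clusters = cut_dendrogram(5, example_merges, height=3.5)
print(three_clusters)
print(two_clusters)
```

Raising the cut level yields fewer, larger clusters; lowering it yields more, smaller ones.
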

## DIANA (Divisive Analysis)

- Introduced in Kaufman and Rousseeuw (1990)
- Implemented in statistical analysis packages, e.g., S-Plus
- Works in the inverse order of AGNES
- Eventually each node forms a cluster on its own

[Figure: three scatter plots, both axes running from 0 to 10, illustrating successive splits.]

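The slide does not say how a cluster is split. The sketch below assumes the splinter-group heuristic usually associated with DIANA: seed a new group with the object that is most dissimilar, on average, to the others, then keep moving objects that are closer on average to the splinter group than to what remains. The function name `diana_split` and the sample data are hypothetical, and only a single split is shown; a full run would repeat it on the resulting clusters until each object stands alone.

```python
def diana_split(cluster, dist):
    """Split one cluster into (remaining, splinter) using average dissimilarities."""

    def avg(i, group):
        # Average dissimilarity from object i to the other members of `group`.
        others = [j for j in group if j != i]
        return sum(dist[i][j] for j in others) / len(others) if others else 0.0

    rest = list(cluster)
    # Seed the splinter group with the object farthest, on average, from the rest.
    seed = max(rest, key=lambda i: avg(i, rest))
    rest.remove(seed)
    splinter = [seed]

    while len(rest) > 1:
        # Positive gain: the object is closer to the splinter group than to the rest.
        gains = {i: avg(i, rest) - avg(i, splinter) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return sorted(rest), sorted(splinter)


# Hypothetical one-dimensional data: five objects on a line.
points = [0.0, 1.0, 5.0, 8.0, 9.5]
dist = [[abs(p - q) for q in points] for p in points]
remaining, splinter = diana_split(range(len(points)), dist)
print(remaining, splinter)
```
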
## Advanced Hierarchical Clustering Methods

Major weaknesses of agglomerative clustering methods:

- Do not scale well: time complexity of at least $O(n^2)$, where $n$ is the total number of objects
- Can never undo what was done previously

Integration of hierarchical with distance-based clustering:

- BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters (SIGMOD’06 Test of Time Award)
- ROCK (1999): clusters categorical data by neighbor and link analysis
- CHAMELEON (1999): hierarchical clustering using dynamic modeling

## BIRCH (1996)

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies (Zhang, Ramakrishnan & Livny, SIGMOD’96)

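The preview ends before the slides define what a CF-tree stores. As a hedged illustration, the sketch below assumes the clustering feature commonly given for BIRCH: a triple (N, LS, SS) holding the number of points, their per-dimension linear sum, and the sum of their squared norms. Two such summaries can be merged by simple addition, which is what makes incremental sub-cluster maintenance cheap. The class name `CF` and its methods are illustrative, not taken from the lecture.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    n: int      # number of points summarized
    ls: tuple   # linear sum of the points, per dimension
    ss: float   # sum of squared norms of the points

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def __add__(self, other):
        # Additivity: the summary of a merged sub-cluster is the component-wise sum.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the summarized points from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


# Hypothetical sample: the four corners of a 2 x 2 square.
corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
cf = CF.of_point(corners[0])
for p in corners[1:]:
    cf = cf + CF.of_point(p)
print(cf, cf.centroid(), cf.radius())
```

The centroid and radius are computed from the summary alone, without revisiting the original points.
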