# cluster_analysis - DataMining ClusterAnalysis Lecture Notes...

Data Mining Cluster Analysis

Lecture Notes for Chapter 17

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

## DBSCAN

DBSCAN is a density-based algorithm.

- Density = number of points within a specified radius (Eps).
- A point is a core point if it has at least a specified number of points (MinPts) within Eps. These are points in the interior of a cluster.
- A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
- A noise point is any point that is neither a core point nor a border point.
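
The three point types can be made concrete with a small sketch. The function name `classify_points`, the use of Euclidean distance, and the choice to count a point as its own neighbor are assumptions made for illustration; the slides do not fix any of them.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise' (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, fine for a small example.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumption: a point counts itself among the points within Eps.
    within = dists <= eps
    is_core = within.sum(axis=1) >= min_pts
    # Border: not core, but at least one core point lies within Eps.
    near_core = (within & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))
```

For example, with four points on a unit square, a fifth point just outside it, and one far-away point, `classify_points(X, eps=1.5, min_pts=4)` marks the square as core, the nearby point as border, and the distant point as noise.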

## DBSCAN: Core, Border, and Noise Points

(Figure: core, border, and noise points.)

## DBSCAN Algorithm

- Eliminate noise points.
- Perform clustering on the remaining points.
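
The two steps above leave the clustering step open. One common way to fill it in, sketched here under assumptions, is to grow each cluster outward from an unvisited core point, attaching every point within Eps of a core point and continuing only through other core points. The name `dbscan_sketch`, the Euclidean metric, the brute-force distance matrix, and the label `-1` for noise are illustrative choices, not taken from the slides.

```python
import numpy as np

def dbscan_sketch(X, eps, min_pts):
    """Return one cluster label per row of X; -1 marks noise (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = dists <= eps                    # each point counts itself
    is_core = within.sum(axis=1) >= min_pts
    labels = np.full(n, -1)                  # everything starts as noise
    cluster = 0
    for start in range(n):
        # Only an unlabeled core point can seed a new cluster.
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]                      # holds core points only
        while stack:
            p = stack.pop()
            for q in np.flatnonzero(within[p]):
                if labels[q] == -1:
                    labels[q] = cluster      # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)      # expand further only through core points
        cluster += 1
    return labels
```

Points that are never reached from a core point keep the label `-1`, which corresponds to the noise points being eliminated. A border point within Eps of two clusters is assigned to whichever cluster reaches it first, so its label can depend on the order of the input.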

## DBSCAN: Core, Border and Noise Points

(Figure panels: Original Points; Point types: core, border and noise. Eps = 10, MinPts = 4.)

## When DBSCAN Works Well

(Figure panels: Original Points; Clusters.)

- Resistant to noise
- Can handle clusters of different shapes and sizes

## When DBSCAN Does NOT Work Well

(Figure panels: Original Points; (MinPts=4, Eps=9.75); (MinPts=4, Eps=9.92).)

- Varying densities
- High-dimensional data

## DBSCAN: Determining Eps and MinPts

- The idea is that for points in a cluster, their k-th nearest neighbors are at roughly the same distance.
- Noise points have their k-th nearest neighbor at a farther distance.
- So, plot the sorted distance of every point to its k-th nearest neighbor.
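
The sorted k-distance values described above can be computed with a few lines of NumPy. This is a minimal sketch: the function name `sorted_k_distances`, the Euclidean metric, and the brute-force distance matrix are assumptions for illustration, and `k` must be smaller than the number of points.

```python
import numpy as np

def sorted_k_distances(X, k):
    """Distance from every point to its k-th nearest neighbor, sorted ascending."""
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)
```

Plotting the returned array against its index (for instance with matplotlib's `plt.plot`) gives the curve the slide describes. Points inside clusters form a long flat stretch, and noise points appear as a steep rise at the end; the distance at which the curve bends sharply is commonly read as a candidate for Eps, with k as the matching MinPts.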

## Cluster Validity

- For supervised classification we have a variety of measures to evaluate how good our model is: accuracy, precision, recall.
- For cluster analysis, the analogous question is how to evaluate the "goodness" of the resulting clusters.
- But "clusters are in the eye of the beholder"!


## This note was uploaded on 04/08/2010 for the course CS 420 taught by Professor Dawsonengler during the Spring '02 term at San Jose State.
