cluster_analysis

cluster_analysis - DataMining ClusterAnalysis Lecture Notes...

Data Mining
Cluster Analysis
Lecture Notes for Chapter 17
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

DBSCAN
• DBSCAN is a density-based algorithm.
• Density = number of points within a specified radius (Eps).
• A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are points that are at the interior of a cluster.
• A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
• A noise point is any point that is not a core point or a border point.
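The three point types can be illustrated with a short Python sketch. It is not taken from the slides: the function name `classify_points` is hypothetical, distances are Euclidean, and the core test assumes that each point counts itself and that a neighborhood of at least MinPts points is enough. Texts and libraries differ on whether the comparison is strict, so treat the threshold as an assumption.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    n = len(points)
    # Eps-neighborhood of every point, as lists of indices.
    # Assumption: a point is a member of its own neighborhood.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Assumption: "dense enough" means at least min_pts points within eps.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense itself, but within eps of some core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```

For example, `classify_points([(0, 0), (1, 0), (0, 1), (1, 1), (2.2, 1), (10, 10)], eps=1.5, min_pts=4)` labels the four corners of the unit square as core, `(2.2, 1)` as border, and `(10, 10)` as noise.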
DBSCAN: Core, Border, and Noise Points (figure)

DBSCAN Algorithm
• Eliminate noise points.
• Perform clustering on the remaining points.
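The slide gives only the two high-level steps, so the following Python sketch fills in one common way to carry them out; the details are assumptions, not the slide's own pseudocode. Core points within Eps of each other are grouped into the same cluster, each border point joins the first cluster that reaches it, and noise points keep the label -1. The function name `dbscan` and the rule that a point counts itself toward MinPts are choices made for this illustration.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    n = len(points)
    # Eps-neighborhoods; each point counts itself (assumption of this sketch).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n  # every point starts out as noise
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id  # core or border point joins
                    if is_core[q]:
                        stack.append(q)  # only core points keep expanding
        cluster_id += 1
    return labels
```

This version compares every pair of points, so it takes quadratic time; it is meant to show the logic, not to scale. A border point within Eps of core points from two clusters is assigned to whichever cluster is grown first.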
DBSCAN: Core, Border and Noise Points
[Figure: two panels, "Original Points" and "Point types: core, border and noise"; Eps = 10, MinPts = 4]

When DBSCAN Works Well
[Figure: two panels, "Original Points" and "Clusters"]
• Resistant to noise
• Can handle clusters of different shapes and sizes
When DBSCAN Does NOT Work Well
[Figure: three panels, "Original Points", "(MinPts=4, Eps=9.75)", and "(MinPts=4, Eps=9.92)"]
• Varying densities
• High-dimensional data

DBSCAN: Determining Eps and MinPts
• The idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance.
• Noise points have their kth nearest neighbor at a farther distance.
• So, plot the sorted distance of every point to its kth nearest neighbor.
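The values behind such a plot can be computed with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `sorted_k_distances` is hypothetical, distances are Euclidean, a point is not counted as its own neighbor, and `k` must be smaller than the number of points.

```python
from math import dist


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances to all other points (the point itself is excluded).
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(d[k - 1])  # k-th nearest neighbor
    return sorted(k_dists)
```

Plotting the returned list against its index gives the curve described above. A commonly used heuristic, assumed here and not stated in the preview, is to pick Eps near the point where the curve rises sharply and to use the same k as MinPts.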
Cluster Validity
• For supervised classification we have a variety of measures to evaluate how good our model is: accuracy, precision, recall.
• For cluster analysis, the analogous question is how to evaluate the “goodness” of the resulting clusters.
• But “clusters are in the eye of the beholder”!