# dm5part4 - University of Florida CISE department Clustering...


University of Florida CISE department Gator Engineering Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville

University of Florida CISE department Gator Engineering Data Mining Sanjay Ranka Spring 2011

DBSCAN
• DBSCAN is a density-based clustering algorithm
• Density = number of points within a specified radius (Eps)
• A point is a core point if it has more than a specified number of points (MinPts) within Eps
  – A core point is in the interior of a cluster
• A border point has fewer than MinPts points within Eps but is in the neighborhood of a core point
• A noise point is any point that is neither a core point nor a border point
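
The core, border, and noise definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's implementation: the function name `classify_points` is hypothetical, the neighbor count is assumed to include the point itself, and the strict "more than MinPts" test follows the slide's wording (other formulations use "at least MinPts").

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'noise'."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small examples).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = dist <= eps                 # Eps-neighborhood, the point itself included
    counts = within.sum(axis=1)
    # Assumption: strict '>' as worded on the slide, with the point itself counted.
    is_core = counts > min_pts
    # Border: not core, but at least one core point lies within Eps.
    near_core = (within & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "noise"))

# Made-up data: a tight group, one point on its fringe, one far-away point.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [2.2, 0.5], [10, 10]]
labels = classify_points(points, eps=1.5, min_pts=3)
```

With this data the five grouped points come out as core, the fringe point as border, and the distant point as noise.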
DBSCAN: Core, Border and Noise points

When DBSCAN works well
[Figure: two panels, "Original Dataset" and "Clusters found by DBSCAN"]
DBSCAN: Core, Border and Noise points
[Figure: panels "Original Points" and "Point types: Core, Border, Noise" (Eps = 10, MinPts = 4)]

DBSCAN: Determining Eps and MinPts
• The idea is that, for points in a cluster, their k-th nearest neighbors are at roughly the same distance
• Noise points have their k-th nearest neighbor at a farther distance
• So, plot the sorted distance of every point to its k-th nearest neighbor (k = 4 used for 2D points)
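
The sorted k-th nearest neighbor distances described above can be computed as in the sketch below. The function name `sorted_k_distances` is hypothetical and the brute-force distance matrix is an assumption made for brevity; it needs more than k points. A common way to read the resulting curve, which the preview does not spell out, is to look for the point where the distances start rising sharply and take that distance as a candidate Eps.

```python
import numpy as np

def sorted_k_distances(X, k=4):
    """Distance from every point to its k-th nearest neighbor, sorted ascending."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point's distance to itself (0),
    # so column k holds the distance to its k-th nearest neighbor.
    kth = np.sort(dist, axis=1)[:, k]
    return np.sort(kth)

# Made-up data: a 3x3 unit grid plus one distant outlier.
grid = [[i, j] for i in range(3) for j in range(3)]
k_dist = sorted_k_distances(grid + [[100, 100]], k=4)
```

Here the grid points all have small 4th-neighbor distances, while the outlier contributes a single very large value at the end of the sorted array.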
Where DBSCAN doesn't work well
[Figure: panels "Original Points", "MinPts = 4, Eps = 9.92", and "MinPts = 4, Eps = 9.75"]

DENCLUE
• DENCLUE (DENsity CLUstEring) is a density clustering approach that models the overall density of a set of points as the sum of influence functions associated with each point
• DENCLUE is based on kernel density estimation. The goal of kernel density estimation is to describe the distribution of data by a function
• For kernel density estimation, the contribution of each point to the overall density function is expressed by an influence (kernel) function. The overall density is then merely the sum of the influence functions associated with each point
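
The "sum of influence functions" idea can be illustrated with a short sketch. The Gaussian form of the influence function and the bandwidth parameter `sigma` are assumptions chosen for illustration; the slide does not fix a particular kernel. The function names are hypothetical.

```python
import numpy as np

def gaussian_influence(x, point, sigma):
    """Influence of one data point on location x (Gaussian kernel, an assumption)."""
    sq_dist = np.sum((np.asarray(x, dtype=float) - np.asarray(point, dtype=float)) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def overall_density(x, data, sigma=1.0):
    """Overall density at x: the sum of the influence functions of all data points."""
    return sum(gaussian_influence(x, p, sigma) for p in data)

# Made-up data: three nearby points and one distant point.
data = [[0, 0], [1, 0], [0, 1], [10, 10]]
dense = overall_density([0.3, 0.3], data)    # location inside the group
sparse = overall_density([5, 5], data)       # location between the group and the far point
```

The density is high near the group of three points and close to zero in the empty region, which is the behavior the density-based cluster definition relies on.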
DENCLUE
• The resulting overall density function will have local peaks, i.e. local density maxima, and these local peaks can be used to define clusters
  – For each point, a hill-climbing algorithm finds the nearest peak associated with that point, and the set of all data points associated with a peak forms a cluster
  – However, if the density at a local peak is too low, then the points in the associated cluster are labeled as noise and discarded
  – Similarly, if two peaks are connected by a path of data
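
The hill-climbing and noise steps can be sketched as below. Several choices here are assumptions, since the preview does not specify them: a Gaussian kernel with bandwidth `sigma`, a mean-shift-style update used as the hill-climbing step, and the hypothetical parameters `min_density` (the "too low" threshold) and `merge_tol` (how close two climbs must end to count as the same peak). The peak-merging rule is left out because its description is cut off in the preview.

```python
import numpy as np

def density(x, data, sigma):
    """Sum of Gaussian influence functions at location x."""
    sq = ((data - x) ** 2).sum(axis=1)
    return np.exp(-sq / (2.0 * sigma ** 2)).sum()

def climb(x, data, sigma, n_iter=200, tol=1e-6):
    """Move x uphill on the density until the step size falls below tol."""
    for _ in range(n_iter):
        w = np.exp(-((data - x) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        new_x = (w[:, None] * data).sum(axis=0) / w.sum()  # weighted mean of the data
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x

def denclue_sketch(data, sigma=1.0, min_density=1.5, merge_tol=1e-2):
    """Return (labels, peaks); label -1 marks points whose peak is too low (noise)."""
    data = np.asarray(data, dtype=float)
    peaks, labels = [], []
    for x in data:
        peak = climb(x.copy(), data, sigma)
        if density(peak, data, sigma) < min_density:
            labels.append(-1)                    # low peak: treat the point as noise
            continue
        for j, q in enumerate(peaks):
            if np.linalg.norm(peak - q) < merge_tol:
                labels.append(j)                 # same peak as an earlier point
                break
        else:
            peaks.append(peak)                   # first point to reach this peak
            labels.append(len(peaks) - 1)
    return np.array(labels), np.array(peaks)

# Made-up data: two tight groups and one isolated point.
sample = [[0, 0], [0.2, 0], [0, 0.2], [0.2, 0.2],
          [5, 5], [5.2, 5], [5, 5.2],
          [20, 20]]
labels, peaks = denclue_sketch(sample)
```

Points that climb to the same peak share a label, and the isolated point ends at a peak whose density falls below `min_density`, so it is marked as noise.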


## This note was uploaded on 11/13/2011 for the course CIS 4930 taught by Professor Staff during the Spring '08 term at University of Florida.

