ester96kdd-dbscan

ester96kdd-dbscan - Published in Proceedings of 2nd...

Abstract

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.

Keywords: Clustering Algorithms, Arbitrary Shape of Clusters, Efficiency on Large Spatial Databases, Handling Noise.

1. Introduction

Numerous applications require the management of spatial data, i.e. data related to space. Spatial Database Systems (SDBS) (Gueting 1994) are database systems for the management of spatial data. Increasingly large amounts of data are obtained from satellite images, X-ray crystallography or other automatic equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases.

Several tasks of knowledge discovery in databases (KDD) have been defined in the literature (Matheus, Chan & Piatetsky-Shapiro 1993). The task considered in this paper is class identification, i.e. the grouping of the objects of a database into meaningful subclasses. In an earth observation database, for example, we might want to discover classes of houses along some river.

Clustering algorithms are attractive for the task of class identification. However, the application to large spatial databases raises the following requirements for clustering algorithms:

(1) Minimal requirements of domain knowledge to determine the input parameters, because appropriate values are often not known in advance when dealing with large databases.
(2) Discovery of clusters with arbitrary shape, because the shape of clusters in spatial databases may be spherical, drawn-out, linear, elongated etc.
(3) Good efficiency on large databases, i.e. on databases of significantly more than just a few thousand objects.

The well-known clustering algorithms offer no solution to

This note was uploaded on 02/10/2012 for the course CSE 5800 taught by Professor Staff during the Fall '09 term at FIT.
