CSE-5120-Fall-2009

Subspace Clustering
Agrawal, Gehrke, et al., SIGMOD 98

Data: points in a multidimensional space. Each dimension is partitioned into ω intervals.

Unit: the intersection of one interval from each attribute (dimension).
[Figure: a unit in the Age-Income plane]

Dense unit: a unit is dense if the fraction of total data points contained in it is greater than a threshold τ.
[Figure: a dense unit in the Age-Income plane, threshold = 20%]

Cluster: a maximal set of connected dense units in k dimensions.
[Figure: units u1 to u5 in the Age-Income plane; u1 and u3 are connected, u4 and u5 are not connected]

Region: an axis-parallel rectangular k-dimensional set that can be expressed as a union of units.
[Figure: a region in the Age-Income plane]

Maximal region R in a cluster: no proper superset of R is a region in the cluster.

The problem: given a set of data points and the input parameters ω and τ, find the clusters in all subspaces of the original data space and present a minimal description of each cluster in the form of a DNF expression.
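The definitions of unit, dense unit, and cluster can be made concrete with a small sketch. It is illustrative only and rests on assumptions not stated in the slides: the domain bounds `lows` and `highs` are supplied by the caller, intervals have equal length, and two units count as connected when they share a face (their interval indices differ by 1 in exactly one dimension) or are linked through a chain of such units. The function names and the sample data are hypothetical.

```python
from collections import Counter, deque

def unit_of(point, lows, highs, omega):
    """Return the unit containing `point` as a tuple of interval indices."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        i = int((x - lo) / (hi - lo) * omega)
        idx.append(min(i, omega - 1))  # a value equal to `hi` goes in the last interval
    return tuple(idx)

def dense_units(points, lows, highs, omega, tau):
    """Units whose fraction of all points is strictly greater than tau."""
    counts = Counter(unit_of(p, lows, highs, omega) for p in points)
    n = len(points)
    return {u for u, c in counts.items() if c / n > tau}

def clusters(dense):
    """Group dense units into maximal connected sets (assumed face connectivity)."""
    remaining = set(dense)
    result = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for d in range(len(u)):
                for step in (-1, 1):
                    v = u[:d] + (u[d] + step,) + u[d + 1:]
                    if v in remaining:
                        remaining.remove(v)
                        component.add(v)
                        queue.append(v)
        result.append(component)
    return result

# Hypothetical (age, income) data: age in [20, 60], income in [0, 4].
points = [(32, 1.5), (35, 1.2), (38, 1.8), (42, 1.1), (45, 1.5), (48, 1.9),
          (25, 3.5), (55, 0.5), (22, 0.2), (58, 3.9)]
dense = dense_units(points, lows=(20, 0), highs=(60, 4), omega=4, tau=0.2)
found = clusters(dense)
```

With ω = 4 and τ = 20%, a unit needs at least 3 of the 10 points to be dense. Here two adjacent units qualify, and they merge into a single cluster.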
ω: the number of equal-length intervals into which every dimension is partitioned.

τ: a unit is dense if the fraction of data points in it is greater than τ.

Subspace: given a set of dimensions D = {D_1, ..., D_k}, a subset of D forms a subspace of D, e.g. {D_1, D_3} forms a subspace of D.
[Figure: projection from 2-D (Age, Income) to 1-D (Age)]

Minimal description of a cluster: a non-redundant covering of the cluster with maximal regions.

A region can be expressed as a conjunction of intervals of the domains A_i, e.g.
(30 ≤ age < 40) ∧ (1 ≤ salary < 3)

A cluster is a union of regions. The minimal description of a cluster can be expressed in DNF (disjunctive normal form), e.g.
((30 ≤ age < 50) ∧ (1 ≤ salary < 3)) ∨ ((40 ≤ age < 60) ∧ (2 ≤ salary < 4))
[Figure: the two regions drawn on age (30 to 60) and salary (1 to 4) axes]

Proposed algorithm: CLIQUE
1. Identify subspaces that contain clusters (find dense units in different subspaces).
2. Identify clusters.
3. Generate a minimal description for the clusters.

Useful property: if a collection of points S is a cluster in a k-dimensional space, then S is also part of a cluster in any (k-1)-dimensional projection of the space.
[Figure: a cluster in the Age-Income plane and its projection onto Age]
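A DNF description of a cluster can be evaluated directly: a point belongs to the cluster if it satisfies at least one region, and it satisfies a region if it falls inside every interval of that region. The sketch below encodes the age/salary example, assuming each interval is half-open (lower bound included, upper bound excluded). The dictionary layout and function names are hypothetical.

```python
def in_region(point, region):
    """Conjunction: the point must lie in every interval of the region."""
    return all(lo <= point[attr] < hi for attr, (lo, hi) in region.items())

def in_cluster(point, description):
    """Disjunction: the point must lie in at least one region."""
    return any(in_region(point, region) for region in description)

# The two-region example: each dict is one conjunct of the DNF expression.
description = [
    {"age": (30, 50), "salary": (1, 3)},
    {"age": (40, 60), "salary": (2, 4)},
]

inside = in_cluster({"age": 35, "salary": 2}, description)     # covered by the first region
outside = in_cluster({"age": 35, "salary": 3.5}, description)  # covered by neither region
```

The two regions overlap for ages 40 to 50 and salaries 2 to 3, which is allowed: the description is a covering of the cluster, not a partition.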
Algorithm: proceeds level by level.

First determine the 1-dimensional dense units by making a pass over the data: D(1).

In the k-th pass (k > 1):
1. Generate candidates:
   (k-1)-dimensional dense units -> Candidate Generation Procedure -> candidate k-dimensional units
2. Make a pass over the data to find those candidate units that are dense:
   candidate k-dimensional units -> Counting Procedure -> k-dimensional dense units, D(k)

The algorithm terminates when no more candidates are generated.
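The level-by-level search can be sketched as follows. The slides do not spell out the candidate generation procedure, so this sketch assumes an Apriori-style join: two (k-1)-dimensional dense units that agree on their first k-2 (dimension, interval) pairs are merged, and a candidate is kept only if all of its (k-1)-dimensional projections are dense. A unit in a subspace is represented as a tuple of (dimension, interval index) pairs sorted by dimension. The function names, domain bounds, and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def interval(x, lo, hi, omega):
    """Index of the equal-length interval of [lo, hi] that contains x."""
    return min(int((x - lo) / (hi - lo) * omega), omega - 1)

def clique_dense_units(points, lows, highs, omega, tau):
    """Return [D(1), D(2), ...], the dense units found at each level."""
    n, ndim = len(points), len(lows)
    # Discretise once: cells[i][d] is the interval of point i in dimension d.
    cells = [tuple(interval(x, lo, hi, omega)
                   for x, lo, hi in zip(p, lows, highs)) for p in points]

    # First pass: 1-dimensional dense units, D(1).
    counts = Counter(((d, c[d]),) for c in cells for d in range(ndim))
    dense = {u for u, cnt in counts.items() if cnt / n > tau}

    levels = []
    while dense:
        levels.append(dense)
        # Step 1: candidate generation (assumed Apriori-style join and prune).
        candidates = set()
        for a in dense:
            for b in dense:
                if a[:-1] == b[:-1] and a[-1][0] < b[-1][0]:
                    cand = a + (b[-1],)
                    subsets = combinations(cand, len(cand) - 1)
                    if all(s in dense for s in subsets):
                        candidates.add(cand)
        if not candidates:
            break  # terminate when no more candidates are generated
        # Step 2: one pass over the data to count points in each candidate.
        counts = Counter()
        for c in cells:
            for cand in candidates:
                if all(c[d] == i for d, i in cand):
                    counts[cand] += 1
        dense = {u for u, cnt in counts.items() if cnt / n > tau}
    return levels

# Hypothetical (age, income) data: age in [20, 60], income in [0, 4].
points = [(32, 1.5), (35, 1.2), (38, 1.8), (42, 1.1), (45, 1.5), (48, 1.9),
          (25, 3.5), (55, 0.5), (22, 0.2), (58, 3.9)]
levels = clique_dense_units(points, (20, 0), (60, 4), omega=4, tau=0.2)
```

The pruning step relies on the fact that every projection of a dense unit is itself dense, so a candidate with a non-dense projection can be discarded without counting. On this data `levels[0]` holds three 1-dimensional dense units and `levels[1]` holds two 2-dimensional ones.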