5cluster - CSE-5120-Fall-2009 Subspace Clustering Agrawal,...


CSE-5120-Fall-2009

Subspace Clustering (Agrawal, Gehrke, et al., SIGMOD 98)

Data: points in a multidimensional space. Each dimension is partitioned into ω intervals of equal length.

Unit: the intersection of one interval from each attribute (dimension).
[Figure: Age × Income grid with one unit marked]

Dense unit: a unit is dense if the fraction of the total data points contained in it is greater than a threshold τ.
[Figure: Age × Income grid, threshold = 20%, dense unit marked]

Cluster: a maximal set of connected dense units in k dimensions.
[Figure: units u1 to u5 on an Age × Income grid; u1 and u3 are connected, u4 and u5 are not connected]

Region: an axis-parallel rectangular k-dimensional set; a region can be expressed as a union of units.
[Figure: a region on an Age × Income grid]

Maximal region R in a cluster: no proper superset of R is a region in the cluster.

The problem: given a set of data points and the input parameters ω and τ, find the clusters in all subspaces of the original data space and present a minimal description of each cluster in the form of a DNF expression.

• ω: the number of equal-length intervals into which every dimension is partitioned.
• τ: a unit is dense if the fraction of data points in it is greater than τ.
• Subspace: given a set of dimensions D = {D1, ..., Dk}, any subset of D forms a subspace of D; e.g., {D1, D3} is a subspace of D.
[Figure: projection of the Age × Income space from 2-D onto the 1-D Age subspace]

Minimal description of a cluster: a non-redundant covering of the cluster with maximal regions.

• A region can be expressed as a conjunction of intervals over the domains A_i, e.g. (30 ≤ age < 40) ∧ (1 ≤ salary < 3).
• A cluster is a union of regions, so its minimal description can be expressed in DNF (disjunctive normal form), e.g.
  ((30 ≤ age < 50) ∧ (1 ≤ salary < 3)) ∨ ((40 ≤ age < 60) ∧ (2 ≤ salary < 4))
[Figure: the two regions above plotted on age (30 to 60) and salary (1 to 4) axes]

Proposed algorithm: CLIQUE
1. Identify subspaces that contain clusters (find dense units in different subspaces).
2. Identify clusters.
3. Generate minimal descriptions for the clusters.

Useful property: if a collection of points S is a cluster in a k-dimensional space, then S is also part of a cluster in every (k-1)-dimensional projection of that space.
[Figure: a cluster in Age × Income and its projection onto Age]

Algorithm: proceeds level by level.
• First, determine the 1-dimensional dense units, D(1), by making a pass over the data.
• In the k-th pass (k > 1):
  1. Generate candidates: (k-1)-dimensional dense units → Candidate Generation Procedure → candidate k-dimensional units.
  2. Make a pass over the data to find the candidate units that are dense, D(k): candidate k-dimensional units → Counting Procedure → k-dimensional dense units.
• The algorithm terminates when no more candidates are generated....
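The level-wise search above, together with step 2 (identify clusters), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: interval bounds are taken from the observed minimum and maximum of each dimension; candidate generation is assumed to be an Apriori-style join of (k-1)-dimensional dense units that share their first k-2 (dimension, interval) pairs, followed by pruning any candidate with a non-dense (k-1)-dimensional projection, which is what the useful property justifies; and two units are treated as connected when they share a common face, directly or through a chain of such units. All function names (discretize, count_dense, generate_candidates, clique_dense_units, clusters_in_subspace) and the sample points are hypothetical. Step 3, the minimal DNF description, is not covered.

```python
from collections import defaultdict
from itertools import combinations

# A unit is a tuple of (dimension, interval_index) pairs sorted by dimension,
# e.g. ((0, 3), (2, 1)) = interval 3 of dimension 0 x interval 1 of dimension 2.


def discretize(points, omega):
    """Map each point to its grid cell (omega equal-length intervals per dimension).

    Assumption: interval bounds are the observed min/max of each dimension.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    cells = []
    for p in points:
        cell = []
        for d in range(dims):
            span = highs[d] - lows[d]
            i = 0 if span == 0 else int((p[d] - lows[d]) / span * omega)
            cell.append(min(i, omega - 1))  # the maximum value falls in the last interval
        cells.append(tuple(cell))
    return cells


def count_dense(cells, candidates, tau):
    """One pass over the data: keep candidates holding more than a tau fraction of points."""
    counts = defaultdict(int)
    for cell in cells:
        for unit in candidates:
            if all(cell[d] == i for d, i in unit):
                counts[unit] += 1
    return {u for u in candidates if counts[u] / len(cells) > tau}


def generate_candidates(dense_prev):
    """Apriori-style join and prune over the (k-1)-dimensional dense units."""
    candidates = set()
    for u1 in dense_prev:
        for u2 in dense_prev:
            # Join: identical first k-2 pairs, and u1's last dimension precedes u2's.
            if u1[:-1] == u2[:-1] and u1[-1][0] < u2[-1][0]:
                cand = u1 + (u2[-1],)
                # Prune: every (k-1)-dimensional projection must itself be dense.
                if all(s in dense_prev for s in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates


def clique_dense_units(points, omega, tau):
    """Level-wise search; returns [D(1), D(2), ...] as sets of dense units."""
    cells = discretize(points, omega)
    dims = len(points[0])
    candidates = {((d, i),) for d in range(dims) for i in range(omega)}
    dense = count_dense(cells, candidates, tau)  # D(1)
    levels = []
    while dense:
        levels.append(dense)
        candidates = generate_candidates(dense)
        if not candidates:  # stop when no more candidates are generated
            break
        dense = count_dense(cells, candidates, tau)  # D(k)
    return levels


def clusters_in_subspace(dense_units):
    """Clusters = connected components of dense units, computed per subspace.

    Assumption: two units of one subspace are adjacent (share a common face) when
    their intervals differ by exactly 1 in one dimension and are equal elsewhere.
    """
    by_subspace = defaultdict(list)
    for u in dense_units:
        by_subspace[tuple(d for d, _ in u)].append(u)
    clusters = []
    for subspace, units in by_subspace.items():
        unvisited = set(units)
        while unvisited:
            stack, component = [unvisited.pop()], []
            while stack:
                u = stack.pop()
                component.append(u)
                nbrs = [v for v in unvisited
                        if sum(abs(a[1] - b[1]) for a, b in zip(u, v)) == 1]
                for v in nbrs:
                    unvisited.remove(v)
                    stack.append(v)
            clusters.append((subspace, sorted(component)))
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (1, 1), (3, 1), (3, 0), (7, 7), (8, 8)]
    levels = clique_dense_units(pts, omega=4, tau=0.2)
    print(sorted(levels[-1]))  # [((0, 0), (1, 0)), ((0, 1), (1, 0)), ((0, 3), (1, 3))]
    print(sorted(c for _, c in clusters_in_subspace(levels[-1])))
    # [[((0, 0), (1, 0)), ((0, 1), (1, 0))], [((0, 3), (1, 3))]]
```

For the six sample points with ω = 4 and τ = 0.2, D(1) holds five 1-dimensional units and D(2) holds three 2-dimensional units; the search stops at k = 3 because no candidates are generated. The two face-sharing 2-dimensional units form one cluster and the remaining unit forms another. Counting is a naive loop over every point and candidate, chosen for clarity rather than speed.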