Stream Clustering Extension of DGIM to More Complex Problems


2 Clustering a Stream
- Assume points enter in a stream.
- Maintain a sliding window of points.
- Queries ask for clusters of points within some suffix of the window.
- Important issue: where are the cluster centroids?

3 BDMO Approach
- BDMO = Babcock, Datar, Motwani, O’Callaghan.
- k-means based.
- Can use less than O(N) space for windows of size N.
- Generalizes the trick of DGIM: buckets of increasing “weight.”

4 Recall DGIM
- Maintains a sequence of buckets B_1, B_2, ...
- Buckets have timestamps (most recent stream element in bucket).
- Sizes of buckets are nondecreasing.
  - In DGIM, size = power of 2.
- Either 1 or 2 of each size.

5 Alternative Combining Rule
- Instead of “combine the 2nd and 3rd of any one size” we could say:
- “Combine B_{i+1} and B_i if size(B_{i+1} ∪ B_i) < size(B_{i-1} ∪ B_{i-2} ∪ ... ∪ B_1).”
  - If B_{i+1}, B_i, and B_{i-1} are the same size, the inequality must hold (almost).
  - If B_{i-1} is smaller, it cannot hold.
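
The alternative combining rule can be tried out with a small sketch. This is an illustration only, under assumptions not stated on the slides: every arriving element is a 1, buckets[0] plays the role of B_1 (the most recent bucket), the rule is re-applied from the newest end until no pair qualifies, and expiry of buckets that fall out of the window is left out. The names Bucket and add_one are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    timestamp: int  # time of the most recent stream element in the bucket
    size: int       # number of 1's summarized by the bucket


def add_one(buckets, timestamp):
    """Record a 1 arriving at `timestamp`; buckets[0] is B_1 (newest)."""
    buckets.insert(0, Bucket(timestamp, 1))
    merged = True
    while merged:  # repeat until the rule applies to no pair
        merged = False
        newer_total = 0  # size(B_{i-1} ∪ ... ∪ B_1) for the current i
        for j in range(len(buckets) - 1):
            # buckets[j] is B_i and buckets[j + 1] is B_{i+1}, with i = j + 1
            pair = buckets[j].size + buckets[j + 1].size
            if pair < newer_total:
                # The merged bucket keeps the newer of the two timestamps.
                buckets[j] = Bucket(buckets[j].timestamp, pair)
                del buckets[j + 1]
                merged = True
                break
            newer_total += buckets[j].size
    return buckets


if __name__ == "__main__":
    window = []
    for t in range(1, 101):
        add_one(window, t)
    print([b.size for b in window])
```

Once no pair qualifies, each adjacent pair is at least as large as everything newer than it, so the running total at least doubles every two buckets. Under these assumptions, a stream of N ones is summarized by at most about 2 log2(N) + 1 buckets.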

6 Buckets for Clustering
- In place of “size” (number of 1’s) we use (an approximation to) the sum of the distances from all points to the centroid of their cluster.
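
As a rough illustration of this bucket "size", the sketch below computes the sum of Euclidean distances from each point to the centroid of its cluster. It assumes a bucket stores its points explicitly as coordinate tuples grouped by cluster and computes the quantity exactly, whereas the slide calls for an approximation. The names centroid and bucket_cost are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def bucket_cost(clusters):
    """Sum over all clusters of the distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(math.dist(p, c) for p in points)
    return total


if __name__ == "__main__":
    # Two clusters: the first has centroid (1, 0), the second is a single point.
    clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
    print(bucket_cost(clusters))  # 1 + 1 + 0 = 2.0
```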
