Stream Clustering
Extension of DGIM to More Complex Problems

Clustering a Stream
◆ Assume points enter in a stream.
◆ Maintain a sliding window of points.
◆ Queries ask for clusters of points within some suffix of the window.
◆ Important issue: where are the cluster centroids?

BDMO Approach
◆ BDMO = Babcock, Datar, Motwani, O'Callaghan.
◆ k-means based.
◆ Can use less than O(N) space for windows of size N.
◆ Generalizes trick of DGIM: buckets of increasing "weight."

Recall DGIM
◆ Maintains a sequence of buckets $B_1, B_2, \ldots$
◆ Buckets have timestamps (most recent stream element in bucket).
◆ Sizes of buckets nondecreasing.
  ◗ In DGIM size = power of 2.
◆ Either 1 or 2 of each size.

Alternative Combining Rule
◆ Instead of "combine the 2nd and 3rd of any one size" we could say:
◆ "Combine $B_{i+1}$ and $B_i$ if $\mathrm{size}(B_{i+1} \cup B_i) < \mathrm{size}(B_{i-1} \cup B_{i-2} \cup \cdots \cup B_1)$."
  ◗ If $B_{i+1}$, $B_i$, and $B_{i-1}$ are the same size, inequality must hold (almost).
  ◗ If $B_{i-1}$ is smaller, it cannot hold.

Buckets for Clustering
◆ In place of "size" (number of 1's) we use (an approximation to) the sum of the distances from all points to the centroid of their cluster.
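The DGIM bookkeeping recalled above (timestamped buckets, power-of-2 sizes, at most two of each size, combining the 2nd and 3rd bucket of any one size) can be sketched as follows. This is a minimal illustration, not code from the slides: the class name `DGIM`, the methods `add` and `count`, and the `(timestamp, size)` tuple layout are all hypothetical. The estimate that counts only half of the oldest bucket is the usual DGIM query rule, which this preview does not spell out, so treat it as an assumption here.

```python
class DGIM:
    """Sketch: approximate count of 1s in the last `window` bits of a stream."""

    def __init__(self, window):
        self.window = window
        self.time = 0
        # Buckets as (timestamp, size) tuples, newest first.
        # Sizes are powers of 2 and nondecreasing toward the old end.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its timestamp has left the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # If three buckets share a size, combine the 2nd and 3rd of them.
        # The merged bucket keeps the more recent timestamp; the merge may
        # cascade to the next size up.
        i = 0
        while (i + 2 < len(self.buckets)
               and self.buckets[i][1] == self.buckets[i + 2][1]):
            newer_ts, size = self.buckets[i + 1]
            del self.buckets[i + 2]
            self.buckets[i + 1] = (newer_ts, 2 * size)
            i += 1

    def count(self, k=None):
        """Estimate the number of 1s among the last k bits (default: window)."""
        k = self.window if k is None else k
        total, oldest = 0, 0
        for ts, size in self.buckets:
            if ts <= self.time - k:
                break
            total += size
            oldest = size
        # Assumed query rule: count only half of the oldest bucket in range.
        return total - oldest // 2
```

Feeding ten consecutive 1s into `DGIM(window=100)` leaves buckets of sizes 1, 1, 2, 2, 4 (newest first), and `count()` returns 8 against a true count of 10. For clustering, the slides replace the integer `size` with an approximate sum of point-to-centroid distances; the sketch above covers only the bit-counting case.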
This document was uploaded on 03/04/2012.
 Fall '09
