1 Stream Clustering
Extension of DGIM to More Complex Problems

2 Clustering a Stream
Assume points enter in a stream.
Maintain a sliding window of points.
Queries ask for clusters of points within some suffix of the window.
Important issue: where are the cluster centroids?

3 BDMO Approach
BDMO = Babcock, Datar, Motwani, O'Callaghan.
k-means based.
Can use less than $O(N)$ space for windows of size $N$.
Generalizes trick of DGIM: buckets of increasing weight.

4 Recall DGIM
Maintains a sequence of buckets $B_1, B_2, \dots$
Buckets have timestamps (most recent stream element in bucket).
Sizes of buckets nondecreasing.
In DGIM, size = power of 2.
Either 1 or 2 of each size.

5 Alternative Combining Rule
Instead of "combine the 2nd and 3rd of any one size" we could say:
"Combine $B_{i+1}$ and $B_i$ if $\mathrm{size}(B_{i+1} \cup B_i) < \mathrm{size}(B_{i-1} \cup B_{i-2} \cup \dots \cup B_1)$."
If $B_{i+1}$, $B_i$, and $B_{i-1}$ are the same size, inequality must hold (almost).
If $B_{i-1}$ is smaller, it cannot hold.

6 Buckets for Clustering
In place of size (number of 1s) we use (an approximation to) the sum of the distances from all points to the centroid of their cluster.
...
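
As an illustration of the alternative combining rule and of the quantity that replaces bucket size for clustering, here is a minimal Python sketch. It is not taken from the slides. The function names `should_combine` and `cluster_cost` are hypothetical, the sketch assumes that B_1 is the most recent (smallest) bucket, and it assumes that the size of a union of buckets is the sum of their sizes, which holds for DGIM counts of 1s but is only a stand-in for whatever approximation is used for clusters.

```python
import math


def cluster_cost(points):
    """Exact sum of Euclidean distances from each point to the cluster centroid.

    The slides use an approximation to this quantity; this helper computes
    the exact value for a small, fully stored cluster.
    """
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(math.dist(p, centroid) for p in points)


def should_combine(sizes, i):
    """Test the alternative combining rule for buckets B_{i+1} and B_i.

    sizes[j - 1] holds size(B_j). Assumption: B_1 is the most recent bucket
    and the size of a union is the sum of the individual sizes.
    Returns True when size(B_{i+1} U B_i) < size(B_{i-1} U ... U B_1).
    """
    if i < 2 or i + 1 > len(sizes):
        raise ValueError("need buckets B_1 .. B_{i+1} with i >= 2")
    merged = sizes[i] + sizes[i - 1]   # size(B_{i+1}) + size(B_i)
    newer = sum(sizes[: i - 1])        # size(B_1) + ... + size(B_{i-1})
    return merged < newer


if __name__ == "__main__":
    # Three buckets of size 4: the two oldest are candidates for merging.
    print(should_combine([1, 1, 2, 2, 4, 4, 4], 6))
    # Only two buckets of size 4: B_{i-1} is smaller, so no merge.
    print(should_combine([1, 1, 2, 2, 4, 4], 5))
    # Cost of a two-point cluster.
    print(cluster_cost([(0.0, 0.0), (2.0, 0.0)]))
```

With sizes 1, 1, 2, 2, 4, 4, 4 the test succeeds for the two oldest buckets (8 < 10), and with sizes 1, 1, 2, 2, 4, 4 it fails (8 is not less than 6). With only one bucket of each smaller size, for example 1, 2, 4, 4, 4, the comparison is 8 against 7 and the strict inequality just misses, which may be what the "(almost)" in the slide refers to.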
 Fall '09
