6.896 Sublinear Time Algorithms
April 26, 2007
Lecture 22
Lecturer: Ronitt Rubinfeld
Scribe: Brendan Juba

1 Overview

We will continue examining sublinear time algorithms for clustering. Last time, we considered a set of $n$ points and gave an algorithm that decides, in a constant number of queries, whether they are $(k, b)$-radius clusterable. Today we'll give an algorithm for a completely different notion of clustering, which examines the average distance of the points from their centers rather than the maximum distance. Our algorithm will make $O(\log n)$ queries, which is worse than a constant, but it outputs an approximation of how well the data can be clustered rather than simply testing whether or not a clustering exists. In fact, we'll even see how to find a concise representation of an approximate clustering in sublinear time. It's worth noting that the exact version of this problem is also NP-complete.

2 Notation and preliminaries

Let $X$ be a set of $n$ points such that for any two points $x, y \in X$, the distance between $x$ and $y$, $\mathrm{dist}(x, y)$, is at most $M$. Given $k$ centers $c_1, \ldots, c_k \in X$, we define

\[ f_{c_1,\ldots,c_k}(x) = \min_i \mathrm{dist}(x, c_i). \]

We remark that, in clustering, one should always check whether the cluster centers are allowed to be arbitrary points or whether they are restricted to come from the input data. In this case, notice that they must lie in $X$, the set we wish to cluster.

We define the cost of a clustering to be the average, over all points, of the distance to the closest center, i.e., the cost of the clustering $f_{c_1,\ldots,c_k}$ is

\[ E_X[f_{c_1,\ldots,c_k}(x)] = \frac{1}{|X|} \sum_{x \in X} f_{c_1,\ldots,c_k}(x). \]

Our goal is to choose centers $c_1, \ldots$ ...
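To make the definitions of $f_{c_1,\ldots,c_k}$ and the clustering cost concrete, here is a minimal Python sketch that evaluates both exactly. It is only an illustration of the definitions, not the sublinear algorithm of this lecture: it reads every point and takes time proportional to $nk$. The function names (`closest_center_distance`, `clustering_cost`, `abs_dist`) and the one-dimensional example data are assumptions made for this example and do not come from the notes.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # type of a point; any type works if dist accepts it


def closest_center_distance(
    x: P, centers: Sequence[P], dist: Callable[[P, P], float]
) -> float:
    """f_{c_1,...,c_k}(x): distance from x to its nearest center."""
    return min(dist(x, c) for c in centers)


def clustering_cost(
    points: Sequence[P], centers: Sequence[P], dist: Callable[[P, P], float]
) -> float:
    """E_X[f(x)]: average over all points of the distance to the nearest center."""
    if not points or not centers:
        raise ValueError("points and centers must both be non-empty")
    total = sum(closest_center_distance(x, centers, dist) for x in points)
    return total / len(points)


def abs_dist(a: float, b: float) -> float:
    """Hypothetical metric for the example: absolute difference on the line."""
    return abs(a - b)


# Example data (assumed): four points on the line, centers chosen from X.
X = [0.0, 1.0, 2.0, 10.0]
centers = [1.0, 10.0]
cost = clustering_cost(X, centers, abs_dist)  # distances 1, 0, 1, 0 average to 0.5
```

In this example the centers are drawn from `X` itself, matching the restriction stated in the notes that the centers must lie in $X$.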
This note was uploaded on 04/02/2010 for the course CS 6.896 taught by Professor Ronitt Rubinfeld during the Fall '04 term at MIT.