! Exercise 7.3.3: Give an example of a dataset and a selection of k initial centroids such that when the
points are reassigned to their nearest centroid at the end, at least one of the initial k point
strange bends, S-shapes, or even rings. Instead of representing clusters by their centroid, it uses a
collection of representative points, as the name implies.
7.4. THE CURE ALGORITHM 263
Figure 7.12:
combined another pair of points. In general, it is possible that this rule will result in an entirely dierent
clustering from that obtained using the distance-of-centroids rule.
2. Take the distance b
An optional step at the end is to x the centroids of the clusters and to reassign each point, including the
k initial points, to the k clusters. Usually, a point p will be assigned to the same cluster
7.3. K-MEANS ALGORITHMS 257
clusters than there really are, the measure will rise precipitously. The idea is expressed by the diagram of
Fig. 7.9.
Average Diameter
Number of Clusters
Correct value of
2. The Compressed Set: These are summaries, similar to the cluster summaries, but for sets of points
that have been found close to one another, but not close to any cluster. The points represented by
clustroid. We can select the clustroid in various ways, each designed to, in some sense, minimize the
distances between the clustroid and the other points in the cluster. Common choices include select
(4,9)
(10,5)
Figure 7.4: Clustering after two additional steps
However, there is a somewhat more ecient implementation of which we should be aware.
1. We start, as we must, by computing the distances
d2(p,q) = d2(p,c) + d2(c,q)
If we sum over all q other than c, and then add d2(p,c) to ROWSUM(p) to account for the fact that the
clustroid is one of the points in the cluster, we derive ROWSUM(p) = R
Exercise 7.2.1: Perform a hierarchical clustering of the one-dimensional set of points 1, 4, 9, 16, 25, 36,
49, 64, 81, assuming clusters are represented by
254 CHAPTER 7. CLUSTERING
their centroid (a
circle will be assigned to the ring if it is outside the ring. If the outlier is between the ring and the circle,
it will be assigned to one or the other, somewhat favoring the ring because its repres
the greatest authorities, since they are linked to by the two biggest hubs, A and D. For Web-sized graphs,
the only way of computing the solution to the hubsand-authorities equations is iteratively. H
(a) The square of the radius.
(b) The diameter (not squared).
What are the densities, according to (a) and (b), of the clusters that result from the merger of any two of
these three clusters. Does the
Since the last two steps are executed at most n times, and the rst two steps are executed only once, the
overall running time of this algorithm is O(n2 logn). That is better than O(n3), but it still p
1, we get b = 0.3583 and d = 0.7165. Along with c = e = 0, these values give us the limiting value of h. The
value of a can be computed from h by multiplying by LT and scaling. 2
5.5.3 Exercises for S
cfw_ou have 60 days from your date of termination or the date of this 1
s greater, to elect coverage.
Ioverage will not be reinstated until we receive the election form
1nd premium payment.
lease allo