
# lect8 - Clustering V


Clustering V

## Outline

- Validating clustering results
- Randomization tests

## Cluster Validity

- All clustering algorithms, when given a set of points, output a clustering.
- How do we evaluate the "goodness" of the resulting clusters?
- Tricky, because "clusters are in the eye of the beholder"!
- Then why do we want to evaluate them?
  - To compare clustering algorithms
  - To compare two sets of clusters
  - To compare two clusters
  - To decide whether there is noise in the data

## Clusters found in Random Data

[Figure: four scatter plots on the unit square, axes x and y. Panels: Random Points, K-means, DBSCAN, Complete Link.]

## Use the objective function F

- Dataset $X$, objective function $F$
- Algorithms: $A_1, A_2, \ldots, A_k$
- Question: Which algorithm is the best for this objective function?
- $R_1 = A_1(X),\ R_2 = A_2(X),\ \ldots,\ R_k = A_k(X)$
- Compare $F(R_1), F(R_2), \ldots, F(R_k)$
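
The comparison above can be sketched in a few lines. This is only an illustration under assumptions not in the slides: $F$ is taken to be the sum of squared distances to cluster centroids (lower is better), and `split_by_x` and `split_by_y` are hypothetical stand-ins for the algorithms $A_1, A_2$ on a made-up dataset.

```python
from math import dist

def sse(points, labels):
    """Assumed objective F: sum of squared distances to each cluster's centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = tuple(sum(coord) / len(members) for coord in zip(*members))
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total

# Toy dataset X (made up): two groups far apart along the x axis.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Hypothetical "algorithms": each maps X to a list of cluster labels.
def split_by_x(points):
    return [0 if p[0] < 5 else 1 for p in points]

def split_by_y(points):
    return [0 if p[1] < 0.5 else 1 for p in points]

algorithms = {"split_by_x": split_by_x, "split_by_y": split_by_y}

# R_j = A_j(X), then compare F(R_1), ..., F(R_k).
scores = {name: sse(X, algo(X)) for name, algo in algorithms.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Because every result is scored with the same $F$, the ranking only says which algorithm is best for that objective, not which clustering is "right".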

## Evaluating clusters

- Function $H$ computes the cohesiveness of a cluster (e.g., smaller values mean larger cohesiveness)
- Examples of cohesiveness?
- Goodness of a cluster $c$ is $H(c)$
- $c$ is better than $c'$ if $H(c) < H(c')$
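
Two possible answers to "examples of cohesiveness?" are sketched below. The slides do not name a specific $H$, so both functions (average pairwise distance and diameter) are assumptions chosen for illustration; in both, smaller values mean a more cohesive cluster.

```python
from itertools import combinations
from math import dist

def avg_pairwise_distance(cluster):
    """Candidate H: mean distance over all pairs of points in the cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # a single point is treated as perfectly cohesive
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def diameter(cluster):
    """Candidate H: largest distance between any two points in the cluster."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

# Made-up clusters: the first is visibly tighter than the second.
tight = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
loose = [(0.0, 0.0), (0.0, 4.0), (3.0, 0.0)]

for H in (avg_pairwise_distance, diameter):
    # c is better than c' if H(c) < H(c')
    print(H.__name__, H(tight), H(loose), H(tight) < H(loose))
```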
## Evaluating clusterings using cluster cohesiveness?

- For a clustering $C$ consisting of $k$ clusters $c_1, \ldots, c_k$
- $H(C) = \Phi_i\, H(c_i)$
- What is $\Phi$?
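
One way to read $\Phi$ is as an aggregation operator over the per-cluster scores. The sketch below tries two candidates, sum and max. These choices, the helper name `clustering_score`, and the use of squared distance to the centroid as $H$ are all assumptions for illustration, not something the slides specify.

```python
from math import dist

def sse(cluster):
    """Assumed H: sum of squared distances from the points to the centroid."""
    centroid = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return sum(dist(p, centroid) ** 2 for p in cluster)

def clustering_score(clustering, H, phi):
    """H(C) = phi over i of H(c_i); phi is any aggregator such as sum or max."""
    return phi(H(c) for c in clustering)

# Made-up clustering C with two clusters.
C = [
    [(0.0, 0.0), (0.0, 2.0)],  # H = 1 + 1 = 2
    [(5.0, 0.0), (5.0, 4.0)],  # H = 4 + 4 = 8
]

total = clustering_score(C, sse, sum)  # phi = sum: overall spread
worst = clustering_score(C, sse, max)  # phi = max: least cohesive cluster
print(total, worst)
```

With sum, one loose cluster can be offset by several tight ones; with max, the clustering is judged by its worst cluster.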

## Cluster separation?

- Function $S$ that measures the separation between two clusters $c_i$, $c_j$
- Ideas for $S(c_i, c_j)$?
- How can we measure the goodness of a clustering $C = \{c_1, \ldots, c_k\}$ using the separation function $S$?
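
As a possible answer to both questions, the sketch below defines two candidate separation functions (distance between centroids, and minimum distance between any cross-cluster pair) and scores a clustering by its least-separated pair of clusters. None of these choices come from the slides; they are assumptions for illustration, and larger values mean better separation.

```python
from itertools import combinations
from math import dist

def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def centroid_separation(ci, cj):
    """Candidate S: distance between the two cluster centroids."""
    return dist(centroid(ci), centroid(cj))

def min_link_separation(ci, cj):
    """Candidate S: smallest distance between a point of ci and a point of cj."""
    return min(dist(p, q) for p in ci for q in cj)

def clustering_separation(clustering, S):
    """Assumed goodness of C: separation of its closest pair of clusters."""
    return min(S(ci, cj) for ci, cj in combinations(clustering, 2))

# Made-up clustering with three clusters.
C = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(4.0, 0.0), (4.0, 2.0)],
    [(0.0, 10.0), (4.0, 10.0)],
]

print(clustering_separation(C, centroid_separation))
print(clustering_separation(C, min_link_separation))
```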
## Silhouette Coefficient

- Combines ideas of both cohesion and separation, but for individual points, as well as clusters and clusterings
- For an individual point $i$:
  - $a$ = average distance of $i$ to the points in the same cluster
  - $b$ = min (average distance of $i$ …
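
The text is cut off mid-definition here, so the sketch below does not reproduce the lecture's formula. It assumes the commonly used definition of the silhouette: $b$ is the smallest average distance from $i$ to the points of any other cluster, and $s(i) = (b - a) / \max(a, b)$. It also assumes at least two clusters, and the zero score for singleton clusters is a convention rather than something stated in the source.

```python
from math import dist

def silhouette(points, labels, i):
    """Silhouette of point i under the commonly used definition (assumed here)."""
    own = labels[i]
    dists = {}  # cluster label -> distances from point i to that cluster's points
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            dists.setdefault(lab, []).append(dist(points[i], p))
    if own not in dists:
        return 0.0  # point i is alone in its cluster
    a = sum(dists[own]) / len(dists[own])
    b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
    denom = max(a, b)
    return (b - a) / denom if denom > 0 else 0.0

def mean_silhouette(points, labels):
    """Score for a whole clustering: average silhouette over all points."""
    return sum(silhouette(points, labels, i) for i in range(len(points))) / len(points)

# Made-up data: two vertical pairs far apart along the x axis.
pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # groups distant points together

print(mean_silhouette(pts, good), mean_silhouette(pts, bad))
```

Values near 1 indicate a point much closer to its own cluster than to any other, and negative values suggest the point may sit in the wrong cluster.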
