ClusterValidate-L18

# ClusterValidate-L18 - CSE 572/CBS 572: Data Mining Lecture 18:...


CSE 572/CBS 572: Data Mining, Lecture 18: Cluster Validation

## Cluster Validity

- For supervised classification, we have a variety of measures to evaluate how good our model is:
  - Accuracy, precision, recall
- For cluster analysis, the analogous question is how to evaluate the "goodness" of the resulting clusters.
- But "clusters are in the eye of the beholder"! Then why do we want to evaluate them?
  - To avoid finding patterns in noise
  - To compare clustering algorithms
  - To compare two sets of clusters
  - To compare two clusters
## Clusters found in Random Data

[Figure: four scatter plots of the same points in the unit square (x and y axes from 0 to 1). Panels: Random Points, K-means, DBSCAN, Complete Link.]
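The point of this slide, that a clustering algorithm will report clusters even when the data has no structure, can be sketched with a few lines of standard-library Python. This is only an illustration under stated assumptions: the `kmeans` and `sse` helpers, the seeds, the 100 points, and k = 3 are hypothetical choices made here, not taken from the lecture, and the plain Lloyd's iteration stands in for whatever K-means implementation produced the original figure.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((x - centroids[l][0]) ** 2 + (y - centroids[l][1]) ** 2
               for (x, y), l in zip(points, labels))

# Uniformly random points in the unit square: no real cluster structure.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(100)]

labels1, cents1 = kmeans(points, 1)
labels3, cents3 = kmeans(points, 3)
sse1 = sse(points, labels1, cents1)
sse3 = sse(points, labels3, cents3)

print("SSE with k=1:", round(sse1, 3))
print("SSE with k=3:", round(sse3, 3))
print("cluster sizes for k=3:", [labels3.count(c) for c in range(3)])
```

K-means still partitions the noise into three groups and reports a lower SSE than the single-cluster baseline, which is why a lower SSE alone does not show that the clusters are meaningful.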

## Measures of Cluster Validity

- Internal Index (Unsupervised): Used to measure the goodness of a clustering structure without respect to external information.
  - Sum of Squared Error (SSE)
- External Index (Supervised): Used to measure the extent to which cluster labels match externally supplied class labels.
  - Entropy
- Relative Index: Used to compare two different clusterings or clusters.
  - Often an external or internal index is used for this function, e.g., SSE or entropy
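The two example indices named above can be sketched as follows. The slide gives only their names, so the definitions are assumptions: SSE is taken as the sum of squared Euclidean distances from each point to its cluster mean, and entropy as the size-weighted average, over clusters, of the Shannon entropy (base 2) of the class labels inside each cluster. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def sse(points, cluster_labels):
    """Internal index: squared distance of every point to its cluster mean."""
    groups = defaultdict(list)
    for p, c in zip(points, cluster_labels):
        groups[c].append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

def clustering_entropy(cluster_labels, class_labels):
    """External index: size-weighted class entropy per cluster (0 = pure clusters)."""
    n = len(cluster_labels)
    by_cluster = defaultdict(list)
    for c, y in zip(cluster_labels, class_labels):
        by_cluster[c].append(y)
    total = 0.0
    for classes in by_cluster.values():
        size = len(classes)
        e = -sum((m / size) * math.log2(m / size) for m in Counter(classes).values())
        total += (size / n) * e
    return total

# Toy data (hypothetical): two tight pairs of points.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
clusters = [0, 0, 1, 1]

sse_value = sse(points, clusters)
pure = clustering_entropy(clusters, ["a", "a", "b", "b"])   # clusters match classes
mixed = clustering_entropy(clusters, ["a", "b", "a", "b"])  # each cluster is half a, half b

print(sse_value, pure, mixed)
```

Lower is better for both measures under these definitions. SSE needs only the data and the cluster assignment, while entropy needs externally supplied class labels.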
## Unsupervised Cluster Validation

- Cluster Evaluation based on Proximity Matrix
  - Correlation between proximity and incidence matrices
  - Visualize proximity matrix
- Cluster Evaluation based on Cohesion and Separation

## Measuring Cluster Validity Via Correlation

- Two matrices
  - Proximity Matrix
  - "Incidence" Matrix
    - One row and one column for each data point
    - An entry is 1 if the associated pair of points belongs to the same cluster
    - An entry is 0 if the associated pair of points belongs to different clusters
- Compute the correlation between proximity and incidence matrices
  - Since the matrices are symmetric, only the correlation between n(n-1)/2 entries needs to be calculated.
- High correlation indicates that points that belong to the same cluster are close to each other.
- Not a good measure for some density or contiguity based clusters
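The procedure above can be sketched in standard-library Python. Two assumptions are made here that the slide does not state: proximity is measured as Euclidean distance, and correlation means the Pearson coefficient. With a distance as the proximity, a good clustering shows up as a strongly negative correlation (same-cluster pairs have incidence 1 and small distance); with a similarity measure the sign would be positive. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def incidence_proximity_correlation(points, labels):
    """Correlate the upper triangles of the proximity and incidence matrices."""
    # Both matrices are symmetric, so only the n(n-1)/2 pairs with i < j are used.
    pairs = list(combinations(range(len(points)), 2))
    proximity = [math.dist(points[i], points[j]) for i, j in pairs]
    incidence = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]
    return pearson(proximity, incidence)

# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # ignores the geometry

n_pairs = len(list(combinations(range(len(points)), 2)))
good = incidence_proximity_correlation(points, good_labels)
bad = incidence_proximity_correlation(points, bad_labels)

print("pairs used:", n_pairs)
print("good clustering:", round(good, 3))
print("bad clustering:", round(bad, 3))
```

The labeling that follows the two groups gives a correlation close to -1, while the labeling that ignores them gives a value near 0.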

## This note was uploaded on 04/08/2010 for the course CS 420 taught by Professor Dawsonengler during the Spring '02 term at San Jose State.
