cs345-cl2new

cs345-cl2new - 1 More Clustering CURE Algorithm Clustering...

Info iconThis preview shows pages 1–10. Sign up to view the full content.

View Full Document Right Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: 1 More Clustering CURE Algorithm Clustering Streams 2 The CURE Algorithm Problem with BFR/ k -means: Assumes clusters are normally distributed in each dimension. And axes are fixed --- ellipses at an angle are not OK. CURE: Assumes a Euclidean distance. Allows clusters to assume any shape. 3 Example: Stanford Faculty Salaries e e e e e e e e e e e h h h h h h h h h h h h h salary age 4 Starting CURE 1. Pick a random sample of points that fit in main memory. 2. Cluster these points hierarchically --- group nearest points/clusters. 3. For each cluster, pick a sample of points, as dispersed as possible. 4. From the sample, pick representatives by moving them (say) 20% toward the centroid of the cluster. 5 Example : Initial Clusters e e e e e e e e e e e h h h h h h h h h h h h h salary age 6 Example : Pick Dispersed Points e e e e e e e e e e e h h h h h h h h h h h h h salary age Pick (say) 4 remote points for each cluster. 7 Example : Pick Dispersed Points e e e e e e e e e e e h h h h h h h h h h h h h salary age Move points (say) 20% toward the centroid. 8 Finishing CURE Now, visit each point p in the data set. Place it in the closest cluster. Normal definition of closest: that cluster with the closest (to p ) among all the sample points of all the clusters. 9 Clustering a Stream ( New Topic ) Assume points enter in a stream. Maintain a sliding window of points....
View Full Document

This document was uploaded on 01/25/2012.

Page1 / 28

cs345-cl2new - 1 More Clustering CURE Algorithm Clustering...

This preview shows document pages 1 - 10. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online