# K-Center and Dendrogram Clustering

Jia Li, Department of Statistics, The Pennsylvania State University. Email: [email protected]. Homepage: http://www.stat.psu.edu/~jiali

## K-center Clustering

Let $A$ be a set of $n$ objects. Partition $A$ into $K$ sets $C_1, C_2, \ldots, C_K$.

The cluster size of $C_k$ is the least value $D$ for which all points in $C_k$ are:

1. within distance $D$ of each other, or
2. within distance $D/2$ of some point called the cluster center.

Let the cluster size of $C_k$ be $D_k$. The cluster size of a partition $S$ is $D(S) = \max_{k=1,\ldots,K} D_k$.

Goal: given $K$, find the partition $S$ that minimizes $D(S)$.

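
A minimal Python sketch of the two cluster-size definitions, using Euclidean distance as an illustrative choice. The helper names are hypothetical. One assumption to flag: for definition 2 the candidate centers are restricted to the cluster's own points (the discrete version). Under that restriction the definition-2 size can exceed the diameter; a two-point cluster of diameter $d$ gets size $2d$, whereas a center at the midpoint would give $d$.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.dist(p, q)

def diameter_size(cluster, dist=euclidean):
    """Definition 1: least D such that all points are within D of each other."""
    if len(cluster) < 2:
        return 0.0
    return max(dist(p, q) for p, q in combinations(cluster, 2))

def center_size(cluster, dist=euclidean):
    """Definition 2: least D such that all points lie within D/2 of a center.
    Assumption: candidate centers are restricted to the cluster's own points."""
    radius = min(max(dist(c, p) for p in cluster) for c in cluster)
    return 2 * radius

def partition_size(partition, size_fn):
    """Cluster size D(S) of a partition: the largest cluster size D_k."""
    return max(size_fn(cluster) for cluster in partition)

S = [[(0, 0), (1, 0), (2, 0)], [(10, 0), (10, 4)]]
print(partition_size(S, diameter_size))  # 4.0
print(partition_size(S, center_size))    # 8.0
```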
## Comparison with k-means

Assume the distance between vectors is the squared Euclidean distance.

K-means:
$$\min_S \sum_{k=1}^{K} \sum_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k),$$
where $\mu_k$ is the centroid of cluster $C_k$. In particular, $\mu_k = \frac{1}{N_k} \sum_{i:\, x_i \in C_k} x_i$, where $N_k$ is the number of points in $C_k$.

K-center:
$$\min_S \max_{k=1,\ldots,K} \max_{i:\, x_i \in C_k} (x_i - \mu_k)^T (x_i - \mu_k),$$
where $\mu_k$ is still called the “centroid” but need not be the mean vector.
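
The two objectives can be evaluated side by side for a fixed partition. In this sketch (hypothetical helpers, squared Euclidean distance as above) k-means always uses the mean vector, while the k-center objective takes the centers as an argument. The example data are made up; they show that the mean is not necessarily the best center for the k-center objective.

```python
def sq_dist(p, q):
    """Squared Euclidean distance (x - mu)^T (x - mu)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Mean vector mu_k = (1 / N_k) * sum of the points in the cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def kmeans_objective(partition):
    """Sum over clusters of squared distances to the cluster mean."""
    return sum(sq_dist(x, mean(c)) for c in partition for x in c)

def kcenter_objective(partition, centers):
    """Worst squared distance from any point to its own cluster's center.
    Centers are passed in because mu_k need not be the mean."""
    return max(sq_dist(x, mu) for c, mu in zip(partition, centers) for x in c)

S = [[(0, 0), (2, 0)], [(5, 0), (6, 0), (10, 0)]]
means = [mean(c) for c in S]
print(kmeans_objective(S))                       # 16.0
print(kcenter_objective(S, means))               # 9.0
print(kcenter_objective(S, [(1, 0), (7.5, 0)]))  # 6.25
```

For the second cluster the mean $(7, 0)$ leaves the point $(10, 0)$ at squared distance 9, while the midpoint of the extreme points, $(7.5, 0)$, lowers the worst case to 6.25.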

Another formulation of k-center:
$$\min_S \max_{k=1,\ldots,K} \max_{i,j:\, x_i, x_j \in C_k} L(x_i, x_j),$$
where $L(x_i, x_j)$ denotes any distance between a pair of objects.

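
Because this formulation only needs $L$, it applies to objects that are not vectors. The sketch below evaluates the objective for a labeling and finds the exact minimizer by exhaustive search over all $K^n$ labelings, which is feasible only for tiny inputs. The Hamming distance on equal-length strings, the helper names, and the example words are illustrative assumptions, not from the source.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def partition_cost(objects, labels, K, L):
    """Max over clusters k of max over pairs x_i, x_j in C_k of L(x_i, x_j)."""
    cost = 0
    for k in range(K):
        members = [x for x, lab in zip(objects, labels) if lab == k]
        for x, y in combinations(members, 2):
            cost = max(cost, L(x, y))
    return cost

def exact_kcenter(objects, K, L):
    """Hypothetical brute force: try all K**n labelings (tiny n only)."""
    best = None
    for labels in product(range(K), repeat=len(objects)):
        cost = partition_cost(objects, labels, K, L)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best

words = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC"]
cost, labels = exact_kcenter(words, 2, hamming)
print(cost, labels)  # 2 (0, 0, 0, 1, 1)
```

Swapping `hamming` for any other pairwise distance leaves the search unchanged.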
Figure: Original unclustered data.

Figure: Clustering by k-means. K-means focuses on the average distance.

Figure: Clustering by k-center. K-center focuses on the worst case.

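
A small constructed example (hypothetical data, not the data in the figures) makes the contrast concrete: on a line, 20 points at 0, 20 points at 10, and one outlier at 30, with $K = 2$. Measuring k-means cost by the within-cluster sum of squared distances to the mean and k-center cost by the largest within-cluster distance, the two criteria prefer different partitions.

```python
def kmeans_cost(partition):
    """Sum of squared distances to each cluster's mean (1-D data)."""
    total = 0.0
    for c in partition:
        mu = sum(c) / len(c)
        total += sum((x - mu) ** 2 for x in c)
    return total

def kcenter_cost(partition):
    """Largest within-cluster distance; for 1-D data that is max - min."""
    return max(max(c) - min(c) for c in partition)

zeros, tens, outlier = [0.0] * 20, [10.0] * 20, [30.0]
P1 = [zeros + tens, outlier]  # the outlier gets its own cluster
P2 = [zeros, tens + outlier]  # the outlier joins the nearby group

print(kmeans_cost(P1) > kmeans_cost(P2))    # True: k-means prefers P2
print(kcenter_cost(P1) < kcenter_cost(P2))  # True: k-center prefers P1
```

Under k-means the single outlier adds one large squared term, which is outweighed by the savings from separating the two dense groups. Under k-center only the worst pair matters, so the outlier claims a cluster of its own.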
## Greedy Algorithm

Choose a subset $H \subset A$ consisting of $K$ points that are farthest apart from each other. Each point $h_k \in H$ represents one cluster $C_k$. Point $x_i$ is assigned to cluster $C_k$ if
$$L(x_i, h_k) = \min_{k'=1,\ldots,K} L(x_i, h_{k'}).$$
Only the pairwise distances $L(x_i, x_j)$ for $x_i, x_j \in A$ are needed. Hence, $x_i$ can be a non-vector representation of the objects.
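
A minimal Python sketch of this greedy procedure. It assumes that "farthest apart" is realized by farthest-first traversal: start from an arbitrary object (here `objects[first]`) and repeatedly add the object whose nearest representative is farthest away. The function name `greedy_kcenter` and the sample points are hypothetical. The code touches the objects only through the distance callable `L`, in line with the remark that $x_i$ need not be a vector.

```python
import math

def greedy_kcenter(objects, K, L, first=0):
    """Hypothetical helper: farthest-first sketch of the greedy algorithm.

    objects: any sequence of objects; L: any pairwise distance function.
    Returns (H, labels): indices of the K representatives and, for each
    object, the position in H of its nearest representative.
    """
    H = [first]  # arbitrary first representative (an assumption)
    while len(H) < K:
        # Next representative: the object whose nearest representative is farthest.
        far = max(range(len(objects)),
                  key=lambda i: min(L(objects[i], objects[h]) for h in H))
        H.append(far)
    # Assign x_i to C_k when L(x_i, h_k) is the minimum over all representatives.
    labels = [min(range(K), key=lambda k: L(x, objects[H[k]])) for x in objects]
    return H, labels

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 0)]
H, labels = greedy_kcenter(points, 3, math.dist)
print(H)       # [0, 5, 3]
print(labels)  # [0, 0, 0, 2, 2, 1]
```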

The greedy algorithm achieves an approximation factor of 2 as long as the distance measure $L$ satisfies the triangle inequality. That is, if
$$D^* = \min_S \max_{k=1,\ldots,K} \max_{i,j:\, x_i, x_j \in C_k} L(x_i, x_j)$$
is the optimal cluster size, then the cluster size $D$ of the partition produced by the greedy algorithm satisfies $D \le 2D^*$. The relation holds if the cluster size is defined in the sense of centralized clustering (every point within $D/2$ of its cluster center).

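
One way to see where the factor 2 comes from is the following derivation sketch. It assumes that $H$ is built by farthest-first traversal (each new representative is the object farthest from those already chosen), that every object is assigned to its nearest representative, and that $L$ is symmetric and satisfies the triangle inequality. Write $D$ for the greedy cluster size and $D^*$ for the optimal one, both in the pairwise sense, and let $r = \max_{x \in A} \min_{h \in H} L(x, h)$, attained at some object $x^\dagger$.

$$
\begin{aligned}
&\text{Upper bound: for } x, y \in C_k,\quad L(x, y) \le L(x, h_k) + L(h_k, y) \le 2r \;\Rightarrow\; D \le 2r. \\
&\text{Lower bound: } L(h_i, h_j) \ge r \ (i \ne j) \text{ and } L(x^\dagger, h_i) \ge r \ \forall i \;\Rightarrow\; D^* \ge r. \\
&\text{Together: } D \le 2r \le 2D^*.
\end{aligned}
$$

The lower bound uses the fact that the distance at which each $h_j$ was picked never increases as $H$ grows, and $r$ is the next such distance; so the $K + 1$ objects $h_1, \ldots, h_K, x^\dagger$ are pairwise at least $r$ apart, and any partition into $K$ clusters must put two of them in the same cluster.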
## Pseudo Code

$H$ denotes the set of cluster representative objects $\{h_1, \ldots, h_K\} \subset A$. Let $\text{cluster}(x_i)$ be the identity of the cluster to which $x_i \in A$ belongs. Let $\text{dist}(x_i)$ be the distance between $x_i$ and its closest cluster representative object:
$$\text{dist}(x_i) = \min_{h_j \in H} L(x_i, h_j).$$
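
The visible pseudo code stops after these definitions. One way the two arrays could be maintained, assuming farthest-first selection, is to compare each object only against the newly added representative, which costs $n$ distance evaluations per step instead of recomputing every minimum. The function `greedy_with_arrays` and the sample points are hypothetical; `cluster[i]` stores the position in `H` of the representative for object $i$.

```python
import math

def greedy_with_arrays(objects, K, L, first=0):
    """Hypothetical helper: greedy k-center with incrementally maintained arrays.

    cluster[i] plays the role of cluster(x_i) (a position in H) and
    dist[i] plays the role of dist(x_i) = min over h in H of L(x_i, h).
    Farthest-first selection starting at objects[first] is an assumption.
    """
    n = len(objects)
    H = [first]
    cluster = [0] * n
    dist = [L(x, objects[first]) for x in objects]
    while len(H) < K:
        new = max(range(n), key=lambda i: dist[i])  # object farthest from H
        H.append(new)
        for i, x in enumerate(objects):
            d = L(x, objects[new])  # only the new representative needs checking
            if d < dist[i]:
                dist[i] = d
                cluster[i] = len(H) - 1
    return H, cluster, max(dist)

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 0)]
H, cluster, r = greedy_with_arrays(points, 3, math.dist)
print(H)        # [0, 5, 3]
print(cluster)  # [0, 0, 0, 2, 2, 1]
print(r)        # 1.0
```

The returned `max(dist)` is the largest distance from any object to its representative, so every resulting cluster has diameter at most twice this value under the triangle inequality.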