K-Means Clustering Analysis
Peng Liu
4/15/2008
Clustering Algorithms
For a given dissimilarity measure, clustering algorithms fall into two categories:
• Partitioning methods that attempt to optimally separate n objects into K clusters.
• Hierarchical methods that produce a nested sequence of clusters.
Some Partitioning Methods
1. K-Means
2. K-Medoids
3. Self-Organizing Maps (SOM) (Kohonen, 1990; Tamayo, P. et al., 1998)
K-Means
Let $x_1, x_2, \ldots, x_n$ denote the objects to be clustered (each $x_i$ is an m-dimensional vector).
Let $C(i)$ denote the cluster assignment for the ith object.
For a given K, the K-Means algorithm attempts to find a clustering of objects that minimizes
$$\frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k}\|x_i-x_j\|^2$$
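As an illustration, the objective on this slide (half the sum of squared Euclidean distances over all pairs of objects that share a cluster) could be evaluated directly as in the sketch below. The function names `squared_distance` and `pairwise_objective` and the toy data are assumptions made for this example, and squared Euclidean distance is assumed as the dissimilarity measure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pairwise_objective(points, assignment):
    """Half the sum of squared distances over all ordered pairs (i, j)
    whose objects share a cluster, i.e. C(i) = C(j)."""
    total = 0.0
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            if assignment[i] == assignment[j]:
                total += squared_distance(xi, xj)
    return 0.5 * total


# Toy data (hypothetical): three objects in 2 dimensions, two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
assignment = [0, 0, 1]  # assignment[i] plays the role of C(i)
print(pairwise_objective(points, assignment))  # 4.0
```

The double loop costs O(n^2) distance evaluations, which is one practical reason to prefer an equivalent form based on cluster means.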
K-Means (continued)
It is straightforward to show that
$$\frac{1}{2}\sum_{k=1}^{K}\sum_{C(i)=k}\sum_{C(j)=k}\|x_i-x_j\|^2 = \sum_{k=1}^{K} n_k \sum_{C(i)=k}\|x_i-\bar{x}_k\|^2,$$
where $n_k$ is the number of objects in the kth cluster, and $\bar{x}_k = \sum_{C(i)=k} x_i / n_k$.
Thus the K-Means algorithm attempts to minimize
$$\sum_{k=1}^{K} n_k \sum_{C(i)=k}\|x_i-\bar{x}_k\|^2.$$
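The equivalence between the pairwise form and the form based on cluster means can be checked with a short expansion. The following is a sketch of one standard argument for a single cluster k, writing $\bar{x}_k$ for the cluster mean and $n_k$ for the cluster size; it is not taken from the slides themselves.

```latex
\begin{align*}
\sum_{C(i)=k}\sum_{C(j)=k}\|x_i-x_j\|^2
  &= \sum_{C(i)=k}\sum_{C(j)=k}\|(x_i-\bar{x}_k)-(x_j-\bar{x}_k)\|^2 \\
  &= n_k\sum_{C(i)=k}\|x_i-\bar{x}_k\|^2 + n_k\sum_{C(j)=k}\|x_j-\bar{x}_k\|^2
     - 2\Bigl(\sum_{C(i)=k}(x_i-\bar{x}_k)\Bigr)^{\!\top}\Bigl(\sum_{C(j)=k}(x_j-\bar{x}_k)\Bigr) \\
  &= 2\,n_k\sum_{C(i)=k}\|x_i-\bar{x}_k\|^2 ,
\end{align*}
```

The cross term vanishes because $\sum_{C(i)=k}(x_i-\bar{x}_k)=0$ by the definition of the mean. Multiplying by 1/2 and summing over k = 1, ..., K gives the identity.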
K-Means Clustering Algorithm
0. Choose K points in m-dimensional space as the initial K cluster means.
1. Given the current set of K means, assign each object to the nearest mean to produce an assignment of objects to K clusters.
2. For a given assignment of objects to K clusters, find the new mean of each cluster by averaging the objects in that cluster.
3. Repeat steps 1 and 2 until the cluster assignments do not change.
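The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and it makes several assumptions the slides do not specify: the initial means are K objects sampled at random from the data, "nearest" means smallest squared Euclidean distance, an empty cluster keeps its previous mean, and a hypothetical `max_iter` cap guards against long runs. The names `kmeans`, `vector_mean`, and `squared_distance` are chosen for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def vector_mean(vectors):
    """Coordinate-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignment, means) for a list of points and K clusters."""
    rng = random.Random(seed)
    # Step 0: choose K initial means (assumption: K distinct objects at random).
    means = [tuple(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each object to the nearest current mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Step 3: stop once the cluster assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: recompute each mean by averaging the objects in that cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old mean
                means[c] = vector_mean(members)
    return assignment, means


# Toy data (hypothetical): two well-separated groups in 2 dimensions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, means = kmeans(points, k=2)
print(assignment, means)
```

The result depends on the initial means in general, so in practice the procedure is often run from several starting points and the clustering with the smallest objective value is kept.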