PARTITIONING USING K-MEANS CLUSTERING
This approach is not agglomerative; instead it attempts to construct k nonoverlapping
clusters of observations such that the variability within clusters is small and the variability
between clusters is large. Hence, observations within a cluster should be as similar as possible,
i.e. the X1, …, Xv values of the observations within a cluster should be as close in value as
possible, given the predetermined specification of k, the number of clusters.
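The within-cluster-similarity objective above is what the standard k-means iteration (often called Lloyd's algorithm) minimizes: assign each observation to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat. A minimal sketch for one-dimensional data follows; the data points and the starting centroids are illustrative assumptions, not from the notes.

```python
def kmeans(points, centroids, iters=20):
    """Assign each point to its nearest centroid, then recompute each
    centroid as the mean of its cluster; repeat a fixed number of times."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two visually obvious groups around 1.0 and 5.0 (illustrative data).
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans(points, centroids=[0.0, 6.0])
```

With these starting values the centroids converge to the two group means, 1.0 and 5.0.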
The result is called a local minimum because the solution can depend strongly on the starting
values for the centroids of the k clusters. To mitigate this: use large datasets (the starting
points are then likely to be sufficiently far apart); allow the algorithm to update each cluster's
centroid as observations are attached to it (letting the centroid locations drift as the clusters
form); or run the algorithm many times, each time with a different set of starting points, and
take as the final clustering the set of cluster groups that shows up most often.
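The multiple-restart strategy can be sketched as follows. The data, k = 2, and ten restarts are illustrative assumptions; and rather than taking a vote across runs as the notes describe, this sketch uses a common variant that keeps the run with the smallest within-cluster sum of squares (for well-separated data the two criteria pick the same grouping).

```python
import random

def one_run(points, k, rng, iters=20):
    """One k-means run from k randomly sampled starting centroids;
    returns (within-cluster sum of squares, cluster labels)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: (p - centroids[i]) ** 2)
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # leave a centroid alone if its cluster is empty
                centroids[i] = sum(members) / len(members)
    labels = [min(range(k), key=lambda i: (p - centroids[i]) ** 2)
              for p in points]
    sse = sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return sse, labels

rng = random.Random(0)  # fixed seed so the example is reproducible
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
# Ten restarts; keep the solution with the lowest sum of squares.
best_sse, best_labels = min(one_run(points, 2, rng) for _ in range(10))
```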
The final decision as to how many clusters there are in the data is based on the same criteria we
used for agglomerative clustering.
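Those agglomerative criteria are not reproduced here, but one common version of such a criterion is to track how the total within-cluster sum of squares falls as k increases and stop where additional clusters give little further improvement (an "elbow"). A sketch with illustrative data and hand-chosen partitions:

```python
def within_ss(clusters):
    """Total squared distance from each point to its cluster mean."""
    total = 0.0
    for c in clusters:
        m = sum(c) / len(c)
        total += sum((p - m) ** 2 for p in c)
    return total

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
ss_k1 = within_ss([points])                           # k = 1: one big cluster
ss_k2 = within_ss([points[:3], points[3:]])           # k = 2: the two real groups
ss_k3 = within_ss([points[:3], [5.0, 4.8], [5.2]])    # k = 3: an extra split

# The drop from k = 1 to k = 2 is large; from k = 2 to k = 3 it is
# small, suggesting k = 2 for this data.
```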
From the SAS Help and Documentation for PROC FASTCLUS
The FASTCLUS Procedure
Background
The FASTCLUS procedure combines an effective method for finding initial clusters with a
standard iterative algorithm for minimizing the sum of squared distances from the cluster
means. The result is an efficient procedure for disjoint clustering of large data sets.
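Continuing from that description, here is a minimal sketch of how PROC FASTCLUS might be invoked; the dataset name (work.obs), the variable names (x1–x3), and the choice of at most three clusters are illustrative assumptions, not from the notes.

```sas
/* Request at most 3 clusters; the DRIFT option lets cluster seeds
   update as observations are assigned, matching the centroid-drift
   strategy described earlier. Names here are hypothetical. */
proc fastclus data=work.obs maxclusters=3 maxiter=50 drift out=clusout;
   var x1 x2 x3;
run;
```

The OUT= data set (here clusout) contains each observation together with its assigned cluster number, which can then be examined when deciding how many clusters the data support.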
 Spring '08
 Staff
 Machine Learning, Replacements, Kmeans++, PROC FASTCLUS
