PARTITIONING USING K MEANS CLUSTERING

This approach is not agglomerative. Instead, it attempts to construct k non-overlapping clusters of observations such that the variability within clusters is small and the variability between clusters is large. Observations within a cluster should therefore be as similar as possible: their values of X_1, ..., X_v should be as close as possible, given the pre-specified number of clusters k.

The result is called a local minimum because it can depend strongly on the starting values chosen for the centroids of the k clusters. To reduce this dependence, use a large data set (the starting points are then sufficiently far apart), allow the algorithm to update each cluster's centroid as observations are assigned to it (so that the centroid locations move as the clusters form), or run the algorithm many times, each time with a different set of starting points. In the last case, the final set of clusters is the one that shows up most often across the runs.

The final decision as to how many clusters there are in the data is based on the same criteria we used for agglomerative clustering.
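
The repeated-starts idea can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the procedure used in the course or in SAS: the function names, the toy data, the number of runs, and the seed are all hypothetical, and the starting centroids are simply k observations drawn at random. Each run alternates between assigning observations to the nearest centroid and recomputing centroids, and the partition that appears most often across runs is kept.

```python
import random
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two observations."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of observations."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data, k, rng, max_iter=100):
    """One k-means run; starting centroids are k randomly drawn observations."""
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign every observation to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in data
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of the observations assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels


def canonical(labels):
    """Relabel clusters by order of first appearance so that identical
    partitions compare equal regardless of how a run numbered them."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def most_frequent_partition(data, k, n_runs=20, seed=0):
    """Run k-means n_runs times and return the most common partition."""
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans(data, k, rng)) for _ in range(n_runs))
    partition, _ = counts.most_common(1)[0]
    return partition


# Hypothetical toy data: two well-separated groups of three observations.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(most_frequent_partition(data, k=2))
```

Relabeling by order of first appearance matters here because two runs can find the same clusters but number them differently; without it, identical partitions would be counted as distinct.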

From the SAS Help and Documentation for PROC FASTCLUS

The FASTCLUS Procedure

Background

The FASTCLUS procedure combines an effective method for finding initial clusters with a standard iterative algorithm for minimizing the sum of squared distances from the cluster means. The result is an efficient procedure for disjoint clustering of large data sets. PROC FASTCLUS was directly inspired by Hartigan’s (1975) leader algorithm

This note was uploaded on 07/22/2011 for the course STA 4702 taught by Professor Staff during the Spring '08 term at University of Florida.
