Cluster Analysis 2

Cluster Analyses Part 2 STA 4702/5701 Spring 2009

Step 1. Choose the distance or similarity measure to be used in identifying clusters. We talked about some of the options last class.

Step 2. Calculate the matrix of pairwise distances or similarities between observations.

Example: Suppose we have n = 4 observations, each with three variables measured. We require a matrix of the pairwise Euclidean distances between observations. For example, the distance between observations 1 and 2 in the matrix below is 2.236. After calculating all pairwise distances we can construct a symmetric n × n matrix of the “similarities” (here, distances) between each pair:

Obs. #        1         2         3         4
1             0     2.236     7.874     6.083
2         2.236         0    10.344     7.483
3         7.874    10.344         0     9.950
4         6.083     7.483     9.950         0

This matrix is now used to start our construction of clusters of observations.

Step 3. Choose a clustering method.

Clustering Methods

A) Hierarchical Methods

Two types: Agglomerative Methods and Divisive Methods.

a. Agglomerative: start with each observation in its own cluster; compare similarities; if two observations are similar, combine them into a single cluster; continue to group similar observations and clusters; stop clustering when all observations are combined into a single cluster.

b. Divisive: start with all observations in a single cluster; divide the cluster into two groups such that the variability within each new cluster is lower than the variability of the original cluster; continue until all observations are in their own cluster.
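As a rough illustration of Steps 2 and 3, the Python sketch below computes a Euclidean distance and then runs an agglomerative pass over the 4 × 4 distance matrix from the example. The notes do not say how the distance between two clusters should be measured, so the sketch assumes single linkage (the smallest distance between any pair of members). The function names `euclidean` and `single_linkage` and the two sample observations are hypothetical and are not taken from the notes.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length observation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical pair of observations with three variables each
# (not the data behind the matrix in the example).
print(euclidean((1.0, 2.0, 3.0), (2.0, 4.0, 3.0)))

# Pairwise distance matrix copied from the example (observations 1 to 4).
D = [
    [0.0, 2.236, 7.874, 6.083],
    [2.236, 0.0, 10.344, 7.483],
    [7.874, 10.344, 0.0, 9.950],
    [6.083, 7.483, 9.950, 0.0],
]


def single_linkage(dist):
    """Agglomerative clustering on a distance matrix; returns the merge history.

    Assumption: the distance between two clusters is the smallest distance
    between any pair of their members (single linkage).
    """
    # Start with every observation in its own cluster (1-based labels).
    clusters = [frozenset([i + 1]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of current clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i - 1][j - 1]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


merges = single_linkage(D)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

Under the single-linkage assumption, observations 1 and 2 merge first (distance 2.236), observation 4 joins that cluster next (6.083), and observation 3 joins last (7.874). A different linkage rule could change the later merge distances.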

Note that in hierarchical methods, the set of k
