Cluster Analyses Part 2
STA 4702/5701
Spring 2009
Step 1. Choose the distance or similarity measure to be used in identifying clusters. We talked
about some of the options last class.
Step 2. Calculate the matrix of pairwise distances or similarities between observations.
Example:
Suppose we have n = 4 observations, each with three variables measured.
We require a matrix of the pairwise Euclidean distances between observations.
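For reference, the Euclidean distance between observations i and j is usually written as shown here. The symbols x_{ik} (the value of variable k on observation i) and p (the number of variables, p = 3 in this example) are notation assumed for illustration, not taken from the original notes.

```latex
d_{ij} = \sqrt{\sum_{k=1}^{p} \left( x_{ik} - x_{jk} \right)^{2}},
\qquad d_{ii} = 0, \qquad d_{ij} = d_{ji}
```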
e.g., the distance between observations 1 and 2 works out to 2.236.
After calculating all pairwise distances we can construct a symmetric n × n matrix of the “similarities” between each pair (the entries are distances, so smaller values indicate more similar observations):
Obs. #        1         2         3         4
1             0     2.236     7.874     6.083
2         2.236         0    10.344     7.483
3         7.874    10.344         0     9.950
4         6.083     7.483     9.950         0
This matrix is now used to start our construction of clusters of observations.
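Step 2 can be sketched in a few lines of Python. The data values used here are hypothetical (the raw measurements behind the matrix above are not given in these notes), and the helper names euclidean and distance_matrix are made up for illustration, so the printed matrix will not match the one above.

```python
import math

# Hypothetical data: 4 observations x 3 variables.
# These are NOT the values used in the notes, which are not available.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 4.0],
    [5.0, 1.0, 0.0],
    [0.0, 0.0, 7.0],
]

def euclidean(a, b):
    """Euclidean distance between two observations of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows):
    """Symmetric n x n matrix of pairwise Euclidean distances."""
    n = len(rows)
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]

D = distance_matrix(data)
for row in D:
    print("  ".join(f"{d:7.3f}" for d in row))
```

The diagonal is zero and the matrix is symmetric, so in practice only the n(n-1)/2 entries below the diagonal need to be computed.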
Step 3. Choose a clustering method.
Clustering Methods
A) Hierarchical Methods: two types, Agglomerative Methods and Divisive Methods
   a. Agglomerative: start with each observation in its own cluster; compare similarities; combine the two most similar observations into a single cluster; continue to group the most similar observations and clusters; stop clustering when all observations are combined into a single cluster.
   b. Divisive: start with all observations in a single cluster; divide the cluster into two groups such that the variability within each new cluster is lower than the variability of the original cluster; continue until each observation is in its own cluster.
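The agglomerative procedure can be illustrated with the 4 × 4 distance matrix from Step 2. This is a minimal sketch that assumes single linkage (the distance between two clusters is the smallest distance between any pair of their members); the notes up to this point do not fix a linkage rule, so that choice and the function name single_linkage are assumptions.

```python
# Distance matrix from Step 2 (observations 1-4).
D = [
    [0.0,    2.236,  7.874,  6.083],
    [2.236,  0.0,   10.344,  7.483],
    [7.874, 10.344,  0.0,    9.950],
    [6.083,  7.483,  9.950,  0.0],
]

def single_linkage(D):
    """Agglomerative clustering with single linkage (an assumed choice).

    Returns the list of merges as (cluster_a, cluster_b, distance).
    """
    clusters = [[i + 1] for i in range(len(D))]  # each observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest between-cluster distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i - 1][j - 1]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

merges = single_linkage(D)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d:.3f}")
```

Under this assumption, observations 1 and 2 merge first (distance 2.236), observation 4 joins them next (6.083), and observation 3 joins last (7.874). A different linkage rule could give different merge distances.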
Note that in hierarchical methods, the set of
k