# The purpose of cluster analysis


Cluster Analysis Notes (1), STA 4702/5701, Spring 2009

The purpose of cluster analysis is to place observations into groups, or clusters, suggested by the data, where:

- groups are not defined a priori,
- observations (objects) in a given cluster tend to be similar to each other in some sense, and
- objects in different clusters tend to be dissimilar.

> “Any generalization about cluster analysis must be vague because a vast number of clustering methods have been developed in several different fields, with different definitions of clusters and similarity among objects. The variety of clustering techniques is reflected by the variety of terms used for cluster analysis: botryology, classification, clumping, competitive learning, morphometrics, nosography, nosology, numerical taxonomy, partitioning, Q analysis, systematics, taximetrics, taxonorics, typology, unsupervised pattern recognition, vector quantization, winner take all learning, aciniformics, and agminatics.”

Questions:

1) Do clusters of observations naturally exist in the data?
   a. shape? overlap? number?
2) If so, how do we identify the clusters?
   a. what method for creating clusters should be used?
   b. what measure of similarity (proximity, dissimilarity) should be used to identify cluster membership?
3) If so, how do we decide how many clusters there are?
4) What are the assumptions or requirements of the clustering methods?

Some Notation:

- $n$: the number of observations (objects, events, experimental units) to be clustered or grouped
- $v$: the number of random variables measured on each experimental unit
- $X_i$: the $i$th variable, $i = 1, 2, \ldots, v$
- $\mathbf{x}_{(i)} = \{x_{1i}, x_{2i}, \ldots, x_{vi}\}$: the vector of observed values of the $v$ variables for the $i$th observation
- $\bar{\mathbf{x}} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_v\}$: the vector of sample means of the $v$ variables
- $G$: the number of clusters at any given level of the hierarchy
- $n_k$: the number of experimental units or observations in the $k$th cluster
- $\bar{\mathbf{x}}_k = \{\bar{x}_{1k}, \bar{x}_{2k}, \ldots, \bar{x}_{vk}\}$: the vector of sample means of the $v$ variables for the $k$th cluster
- $C_k$: the $k$th cluster; the set of identifiers of which observations are in the cluster
- $d(\mathbf{x}, \mathbf{y})$: the measure of similarity or dissimilarity (aka distance) between two points
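
To make the notation concrete, here is a minimal Python sketch that computes each quantity for a tiny data set. Everything specific in it is an assumption made for illustration: the data values are invented, the cluster assignment in `clusters` is fixed by hand rather than produced by a clustering method, and Euclidean distance is only one possible choice for the distance `d`. The notes do not prescribe any of these.

```python
from math import sqrt

# Hypothetical data: n = 5 observations, v = 2 variables (values invented).
observations = [
    (1.0, 2.0),
    (2.0, 2.0),
    (3.0, 2.0),
    (8.0, 9.0),
    (10.0, 9.0),
]

# C_k: the set of observation identifiers (0-based row indices here) in cluster k.
# This assignment is assumed by hand, not the result of any clustering method.
clusters = {1: {0, 1, 2}, 2: {3, 4}}


def mean_vector(rows):
    """Return the vector of sample means, one entry per variable."""
    rows = list(rows)
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def euclidean(x, y):
    """One possible choice of d(x, y): Euclidean distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


n = len(observations)               # number of observations
v = len(observations[0])            # number of variables per observation
G = len(clusters)                   # number of clusters
x_bar = mean_vector(observations)   # overall vector of sample means

# n_k: cluster sizes; x_bar_k: vector of sample means within each cluster.
n_k = {k: len(ids) for k, ids in clusters.items()}
x_bar_k = {
    k: mean_vector(observations[i] for i in sorted(ids))
    for k, ids in clusters.items()
}

print("n, v, G:", n, v, G)
print("overall means:", x_bar)
print("cluster sizes:", n_k)
print("cluster means:", x_bar_k)
print("distance between cluster means:", euclidean(x_bar_k[1], x_bar_k[2]))
```

Swapping `euclidean` for another function with the same two-argument shape changes the notion of similarity without touching the rest of the sketch, which is the point of question 2b above.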

## This note was uploaded on 07/22/2011 for the course STA 4702 taught by Professor Staff during the Spring '08 term at University of Florida.
