Cluster_Analysis_Lecture

Cluster Analysis

Cluster analysis addresses a very different problem, from a different point of view, than discriminant analysis. Given a set of observations, we want to form smaller subgroups so that objects within each subgroup are similar to each other while the subgroups themselves are very different from each other. The objective of the clustering is of crucial importance, because it determines the selection of variables to be used.

The goal is to partition observations into meaningful groups, with individuals in a group being more “similar” to each other than to individuals in other groups. In discriminant analysis, group membership is defined before the analysis begins. In cluster analysis, group membership is unknown: we are trying to identify the groups.

Similarity and Dissimilarity Measures

There are many ways to quantify the dissimilarity between two observations $x_r, x_s$. Any reasonable index of dissimilarity between $x_r, x_s$ is an index $I(x_r, x_s)$ such that:

a) $I(x_r, x_s) \ge 0$
b) $I(x_r, x_r) = 0$
c) $I(x_r, x_s) = I(x_s, x_r)$
d) $I(x_r, x_s)$ increases as objects $x_r, x_s$ become more and more dissimilar.
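
Properties a) to c) can be checked mechanically on a sample of points; property d) is a modelling judgment and cannot. The following is a minimal Python sketch, not part of the original lecture. The helper `check_dissimilarity_index`, the two candidate indices, and the sample points are all hypothetical names and values chosen for illustration.

```python
from itertools import product


def check_dissimilarity_index(index, points, tol=1e-12):
    """Return True if `index` satisfies properties a), b), c) on `points`."""
    for r, s in product(points, repeat=2):
        if index(r, s) < 0:                       # a) non-negative
            return False
        if abs(index(r, s) - index(s, r)) > tol:  # c) symmetric
            return False
    # b) an observation has zero dissimilarity with itself
    return all(abs(index(r, r)) <= tol for r in points)


def city_block(r, s):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(r, s))


def signed_gap(r, s):
    """Counterexample: can be negative and is not symmetric."""
    return sum(a - b for a, b in zip(r, s))


# Hypothetical sample points, for illustration only.
points = [(0.0, 1.0), (2.0, 5.0), (-1.0, 3.0)]
print(check_dissimilarity_index(city_block, points))
print(check_dissimilarity_index(signed_gap, points))
```

Passing the check on a finite sample does not prove the properties in general; it only catches indices that clearly fail them.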

How to decide if two points $x_r, x_s$ are similar?

a) Ruler Distance (Euclidean distance)

$d_{rs} = \left[ (x_r - x_s)'(x_r - x_s) \right]^{1/2}$

$d_{rs} = \left[ x_r' x_r - 2 x_r' x_s + x_s' x_s \right]^{1/2}$

b) Standardized Ruler Distance

$d_{rs} = \left[ (z_r - z_s)'(z_r - z_s) \right]^{1/2}$

where $z_r$ is the observation $x_r$ with each variable standardized.

c) Mahalanobis Distance

$d_{rs} = \left[ (x_r - x_s)' \Sigma^{-1} (x_r - x_s) \right]^{1/2}$

d) Minkowski Metric

$d_{rs} = \left[ \sum_{i=1}^{p} |x_{ri} - x_{si}|^{k} \right]^{1/k}$
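
The distance measures above can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's own code: the function names, the sample points, and the per-variable standard deviations `stds` are assumptions. The Mahalanobis distance is left out because it needs a matrix inverse; when $\Sigma$ is diagonal with the variable variances on the diagonal, it reduces to the standardized distance shown here.

```python
import math


def euclidean(xr, xs):
    """Ruler distance: sqrt((xr - xs)'(xr - xs))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xr, xs)))


def euclidean_expanded(xr, xs):
    """Same distance via xr'xr - 2 xr'xs + xs'xs."""
    rr = sum(a * a for a in xr)
    ss = sum(b * b for b in xs)
    rs = sum(a * b for a, b in zip(xr, xs))
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, rr - 2.0 * rs + ss))


def standardized(xr, xs, stds):
    """Ruler distance after dividing each variable by its standard deviation."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(xr, xs, stds)))


def minkowski(xr, xs, k):
    """Minkowski metric of order k; k = 2 is Euclidean, k = 1 is city-block."""
    return sum(abs(a - b) ** k for a, b in zip(xr, xs)) ** (1.0 / k)


# Hypothetical points and standard deviations, for illustration only.
xr, xs = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
stds = [3.0, 4.0, 1.0]
print(euclidean(xr, xs), euclidean_expanded(xr, xs))
print(standardized(xr, xs, stds))
print(minkowski(xr, xs, 1), minkowski(xr, xs, 2))
```

The two Euclidean forms agree up to rounding. The expanded form is the one that generalizes to a whole distance matrix, since it only needs inner products between observations.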

Features    cost   time   weight   incentive
Object A       0      3        4           5
Object B       7      6        3          -1
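
The lecture does not show the computation for this table, so the following worked example is an assumption about how it is meant to be used: it applies the Euclidean distance and the Minkowski metric with $k = 1$ to Objects A and B, taking the four features as given, without standardizing.

```latex
\begin{align*}
x_A - x_B &= (0 - 7,\; 3 - 6,\; 4 - 3,\; 5 - (-1)) = (-7,\; -3,\; 1,\; 6) \\
d_{AB}^{\text{Euclidean}} &= \sqrt{(-7)^2 + (-3)^2 + 1^2 + 6^2}
   = \sqrt{49 + 9 + 1 + 36} = \sqrt{95} \approx 9.75 \\
d_{AB}^{\text{Minkowski},\,k=1} &= |-7| + |-3| + |1| + |6| = 17
\end{align*}
```

The cost difference alone contributes 49 of the 95 squared units, which shows how a single variable on a larger scale can dominate the unstandardized distance.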

    options ls = 64 ps = 45 nodate nonumber;
    title1 'Distance Matrix';
    title2 'Distance Matrix: Acidosis Data';

    /* Computation of Distance Matrix */
    proc iml;
    reset print;

    /* Data on Acidosis patients. Source: Everitt (1989), */
    x = { 39.8 38.0 22.2 23.2,
          53.7 37.2 18.7 18.5,
          47.3 39.8 23.3 22.1,
          41.7 37.6 22.8 22.3,
          44.7 38.5 24.8 24.4,
          47.9 39.8 22.0 23.3 };

    nrow = nrow(x);                    /* number of observations */
    xpx = x*t(x);                      /* inner products x_r' x_s */
    vdiag = vecdiag(xpx);              /* squared lengths x_r' x_r */
    xi = j(1,nrow,1)@vdiag;            /* row r holds x_r' x_r in every column */
    dist = sqrt(t(xi) - 2*xpx + xi);   /* d_rs = sqrt(x_s'x_s - 2 x_r'x_s + x_r'x_r) */

Output:

    Distance Matrix
    Distance Matrix: Acidosis Data

    X         6 rows      4 cols    (numeric)

         39.8      38    22.2    23.2
         53.7    37.2    18.7    18.5
         47.3    39.8    23.3    22.1
         41.7    37.6    22.8    22.3
         44.7    38.5    24.8    24.4
         47.9    39.8      22    23.3

    NROW      1 row       1 col     (numeric)

            6

    XPX       6 rows      6 cols    (numeric)

      4059.12    4395.2   4424.92   4111.98    4358.7   4447.78
       4395.2   4959.47   4865.13   4476.92   4747.75   4895.24
      4424.92   4865.13   4852.63   4492.96   4763.69   4877.24
      4111.98   4476.92   4492.96   4169.78   4421.15    4515.1
       4358.7   4747.75   4763.69   4421.15   4690.74   4787.55
      4447.78   4895.24   4877.24    4515.1   4787.55   4905.34

    VDIAG     6 rows      1 col     (numeric)

      4059.12
      4959.47
      4852.63
      4169.78
      4690.74
      4905.34

    XI        6 rows      6 cols    (numeric)

      4059.12   4059.12   4059.12   4059.12   4059.12   4059.12
      4959.47   4959.47   4959.47   4959.47   4959.47   4959.47
      4852.63   4852.63   4852.63   4852.63   4852.63   4852.63
      4169.78   4169.78   4169.78   4169.78   4169.78   4169.78
      4690.74   4690.74   4690.74   4690.74   4690.74   4690.74
      4905.34   4905.34   4905.34   4905.34   4905.34   4905.34

    DIST      6 rows      6 cols    (numeric)

              0   15.105959   7.8682908   2.2226111   5.6973678   8.3006024
      15.105959           0   9.0465463   13.244244   12.438247   8.6214848
      7.8682908   9.0465463           0   6.0406953   3.9987498   1.8681542
      2.2226111   13.244244   6.0406953           0   4.2684892   6.7022384
      5.6973678   12.438247   3.9987498   4.2684892           0    4.580393
      8.3006024   8.6214848   1.8681542   6.7022384    4.580393           0
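
For readers without SAS, the same distance matrix can be reproduced with a short Python sketch that mirrors the IML steps (inner products, their diagonal, then $x_r'x_r - 2x_r'x_s + x_s'x_s$). This is a plain-Python restatement under that assumption, not the lecture's code; the function name `gram_distance_matrix` is hypothetical.

```python
import math

# Acidosis data as listed in the IML program above.
x = [
    [39.8, 38.0, 22.2, 23.2],
    [53.7, 37.2, 18.7, 18.5],
    [47.3, 39.8, 23.3, 22.1],
    [41.7, 37.6, 22.8, 22.3],
    [44.7, 38.5, 24.8, 24.4],
    [47.9, 39.8, 22.0, 23.3],
]


def gram_distance_matrix(rows):
    """Euclidean distance matrix computed from the inner-product matrix X X'."""
    n = len(rows)
    # xpx[i][j] = x_i' x_j  (the XPX matrix in the IML output)
    xpx = [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]
    # vdiag[i] = x_i' x_i  (the VDIAG vector)
    vdiag = [xpx[i][i] for i in range(n)]
    # max() guards against a tiny negative value caused by rounding.
    return [
        [math.sqrt(max(0.0, vdiag[i] - 2.0 * xpx[i][j] + vdiag[j]))
         for j in range(n)]
        for i in range(n)
    ]


dist = gram_distance_matrix(x)
for row in dist:
    print(" ".join(f"{d:10.6f}" for d in row))
```

In the DIST output above, observations 3 and 6 are the closest pair (1.868) and observations 1 and 2 the farthest apart (15.106).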

Graphical methods for Clustering

a) Scatter Plots.
b) Principal Components.
c) Profile Plots.
d) Andrews’ Plots.

In an Andrews’ plot, the p-variate observation for the rth experimental unit is represented by a function, so that p-dimensional data are represented by a function of a single variable. The basic idea is to assign different variables to different aspects of a curve, in this case the amplitudes of different sine and cosine curves. Plot the curve for each observation and group the observations that have similar curve forms.
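
The function itself is missing from this part of the notes. The sketch below assumes the standard form of the Andrews curve, $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots$ for $-\pi \le t \le \pi$, which may differ in detail from the version used in the lecture. The function name `andrews_curve`, the grid size, and the two sample observations are illustrative choices.

```python
import math


def andrews_curve(x, t):
    """Evaluate the (assumed standard) Andrews function of observation x at t."""
    value = x[0] / math.sqrt(2.0)
    for j, xj in enumerate(x[1:], start=1):
        harmonic = (j + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:
            value += xj * math.sin(harmonic * t)
        else:
            value += xj * math.cos(harmonic * t)
    return value


# Evaluate the curves of two observations on a grid over [-pi, pi].
grid = [-math.pi + 2.0 * math.pi * i / 8 for i in range(9)]
obs_1 = [39.8, 38.0, 22.2, 23.2]
obs_4 = [41.7, 37.6, 22.8, 22.3]
for t in grid:
    print(f"{t:7.3f} {andrews_curve(obs_1, t):9.3f} {andrews_curve(obs_4, t):9.3f}")
```

Plotting these values against `t` for every observation gives one curve per experimental unit; observations whose curves stay close together over the whole interval are candidates for the same cluster.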