Cluster_Analysis_Lecture - Cluster Analysis Cluster...

Cluster Analysis

Cluster analysis addresses a very different problem, from a different point of view, than discriminant analysis. Given a set of observations, we want to form smaller subgroups of similar observations, so that objects within a subgroup are similar to each other while the subgroups themselves are very different from one another. In other words, we partition the observations into meaningful groups, with individuals in a group being more “similar” to each other than to individuals in other groups. The objective of any clustering is of crucial importance, because it determines the selection of variables to be used.

In discriminant analysis, group membership is defined before the analysis begins. In cluster analysis, group membership is unknown: we are trying to identify the groups.

Similarity and Dissimilarity Measures

There are many ways to quantify the dissimilarity between two observations $x_r$ and $x_s$. Any reasonable index of dissimilarity $I(x_r, x_s)$ between $x_r$ and $x_s$ is an index such that:

a) $I(x_r, x_s) \ge 0$
b) $I(x_r, x_r) = 0$
c) $I(x_r, x_s) = I(x_s, x_r)$
d) $I(x_r, x_s)$ increases as the objects $x_r$ and $x_s$ become more and more dissimilar.
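Properties (a) to (c) can be checked numerically for a candidate index on a finite set of points. The following is a minimal Python sketch, not part of the original notes; the helper name `satisfies_basic_properties` and the sample points are hypothetical, and property (d) is qualitative, so it is not tested here.

```python
import math

def euclidean(a, b):
    """Ordinary ruler distance between two equal-length points."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def satisfies_basic_properties(index, points, tol=1e-12):
    """Check properties (a)-(c) of a dissimilarity index on the given points."""
    for p in points:
        if abs(index(p, p)) > tol:                    # (b) I(x_r, x_r) = 0
            return False
        for q in points:
            if index(p, q) < -tol:                    # (a) I(x_r, x_s) >= 0
                return False
            if abs(index(p, q) - index(q, p)) > tol:  # (c) symmetry
                return False
    return True

# Hypothetical sample points, chosen only for illustration.
sample = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0)]

ok_euclid = satisfies_basic_properties(euclidean, sample)
# A signed difference of first coordinates can be negative and is not
# symmetric, so it fails the check.
ok_signed = satisfies_basic_properties(lambda a, b: a[0] - b[0], sample)
print(ok_euclid, ok_signed)
```

Passing the check on a sample does not prove the properties in general; it only helps catch an index that clearly violates them.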
How do we decide whether two points $x_r$ and $x_s$ are similar?

a) Ruler Distance (Euclidean distance)

$$d_{rs} = \left[ (x_r - x_s)'(x_r - x_s) \right]^{1/2}$$

$$d_{rs} = \left[ x_r'x_r - 2\,x_r'x_s + x_s'x_s \right]^{1/2}$$

b) Standardized Ruler Distance

$$d_{rs} = \left[ (z_r - z_s)'(z_r - z_s) \right]^{1/2}$$

c) Mahalanobis Distance

$$d_{rs} = \left[ (x_r - x_s)'\,\Sigma^{-1}(x_r - x_s) \right]^{1/2}$$

d) Minkowski Metric

$$d_{rs} = \left[ \sum_{i=1}^{p} \left| x_{ri} - x_{si} \right|^{k} \right]^{1/k}$$
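The four measures above are closely related, and a short derivation may make that explicit. This sketch assumes that $z$ denotes the variables standardized by their sample standard deviations $s_i$; the symbols $s_i$ and $D$ are notation introduced here, not taken from the notes.

```latex
\begin{aligned}
z_{ri} - z_{si} &= \frac{x_{ri} - x_{si}}{s_i}
  \quad\Longrightarrow\quad
  d_{rs}^2 = \sum_{i=1}^{p} \frac{(x_{ri} - x_{si})^2}{s_i^2}
           = (x_r - x_s)'\,D^{-1}(x_r - x_s),
  \qquad D = \operatorname{diag}(s_1^2, \dots, s_p^2), \\
k = 2:\quad d_{rs} &= \Bigl[\sum_{i=1}^{p} |x_{ri} - x_{si}|^{2}\Bigr]^{1/2}
           = \bigl[(x_r - x_s)'(x_r - x_s)\bigr]^{1/2}, \\
k = 1:\quad d_{rs} &= \sum_{i=1}^{p} |x_{ri} - x_{si}|.
\end{aligned}
```

Under that assumption, the standardized ruler distance is the Mahalanobis form with $\Sigma$ replaced by the diagonal matrix of variances, so it ignores correlations between variables. The Minkowski metric reduces to the Euclidean distance for $k = 2$ and to the sum of absolute differences (city-block distance) for $k = 1$.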
Features    cost   time   weight   incentive
Object A       0      3        4           5
Object B       7      6        3          -1
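As an illustration, the distance between Object A and Object B can be computed directly from the table. The Python sketch below is not part of the original notes; the function name `minkowski` is chosen here for illustration. The coordinate differences are (-7, -3, 1, 6), so the sum of absolute differences is 17 and the sum of squared differences is 95.

```python
def minkowski(a, b, k):
    """Minkowski distance of order k between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    return sum(abs(u - v) ** k for u, v in zip(a, b)) ** (1.0 / k)

# Feature order: cost, time, weight, incentive (values from the table).
object_a = [0, 3, 4, 5]
object_b = [7, 6, 3, -1]

city_block = minkowski(object_a, object_b, 1)  # 7 + 3 + 1 + 6 = 17
euclidean = minkowski(object_a, object_b, 2)   # sqrt(95), about 9.747

print(f"city-block (k=1): {city_block:.3f}")
print(f"Euclidean  (k=2): {euclidean:.3f}")
```

Because the features are on different scales, the raw ruler distance is dominated by the variables with the largest differences, which is the motivation for the standardized and Mahalanobis versions.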
    options ls = 64 ps = 45 nodate nonumber;
    title1 'Distance Matrix';
    title2 'Distance Matrix: Acidosis Data';

    /* Computation of Distance Matrix */
    proc iml;
    reset print;
    x = { 39.8 38.0 22.2 23.2,
          53.7 37.2 18.7 18.5,
          47.3 39.8 23.3 22.1,
          41.7 37.6 22.8 22.3,
          44.7 38.5 24.8 24.4,
          47.9 39.8 22.0 23.3 };
    /* Data on Acidosis patients. Source: Everitt (1989), */
    nrow = nrow(x);                  /* number of observations */
    xpx = x*t(x);                    /* inner products x_r'x_s */
    vdiag = vecdiag(xpx);            /* squared lengths x_r'x_r */
    xi = j(1,nrow,1)@vdiag;          /* row r is constant at x_r'x_r */
    dist =sqrt(t(xi) - 2*xpx + xi);  /* x_s'x_s - 2 x_r'x_s + x_r'x_r */

Distance Matrix
Distance Matrix: Acidosis Data

X          6 rows      4 cols    (numeric)

     39.8      38.0      22.2      23.2
     53.7      37.2      18.7      18.5
     47.3      39.8      23.3      22.1
     41.7      37.6      22.8      22.3
     44.7      38.5      24.8      24.4
     47.9      39.8      22.0      23.3

NROW       1 row       1 col     (numeric)

     6
XPX        6 rows      6 cols    (numeric)

  4059.12    4395.2   4424.92   4111.98    4358.7   4447.78
   4395.2   4959.47   4865.13   4476.92   4747.75   4895.24
  4424.92   4865.13   4852.63   4492.96   4763.69   4877.24
  4111.98   4476.92   4492.96   4169.78   4421.15    4515.1
   4358.7   4747.75   4763.69   4421.15   4690.74   4787.55
  4447.78   4895.24   4877.24    4515.1   4787.55   4905.34

VDIAG      6 rows      1 col     (numeric)

  4059.12
  4959.47
  4852.63
  4169.78
  4690.74
  4905.34

XI         6 rows      6 cols    (numeric)

  4059.12   4059.12   4059.12   4059.12   4059.12   4059.12
  4959.47   4959.47   4959.47   4959.47   4959.47   4959.47
  4852.63   4852.63   4852.63   4852.63   4852.63   4852.63
  4169.78   4169.78   4169.78   4169.78   4169.78   4169.78
  4690.74   4690.74   4690.74   4690.74   4690.74   4690.74
  4905.34   4905.34   4905.34   4905.34   4905.34   4905.34

DIST       6 rows      6 cols    (numeric)

          0  15.105959  7.8682908  2.2226111  5.6973678  8.3006024
  15.105959          0  9.0465463  13.244244  12.438247  8.6214848
  7.8682908  9.0465463          0  6.0406953  3.9987498  1.8681542
  2.2226111  13.244244  6.0406953          0  4.2684892  6.7022384
  5.6973678  12.438247  3.9987498  4.2684892          0   4.580393
  8.3006024  8.6214848  1.8681542  6.7022384   4.580393          0
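For readers without SAS, the same computation can be sketched in plain Python. This is an illustrative translation, not the author's code. It uses the identity d_rs^2 = x_r'x_r - 2 x_r'x_s + x_s'x_s that the IML statements rely on, and the function name `distance_matrix` is chosen here for illustration.

```python
import math

# Acidosis data from the SAS/IML listing: 6 patients, 4 variables.
x = [
    [39.8, 38.0, 22.2, 23.2],
    [53.7, 37.2, 18.7, 18.5],
    [47.3, 39.8, 23.3, 22.1],
    [41.7, 37.6, 22.8, 22.3],
    [44.7, 38.5, 24.8, 24.4],
    [47.9, 39.8, 22.0, 23.3],
]

def distance_matrix(rows):
    """Euclidean distance matrix via d_rs^2 = x_r'x_r - 2 x_r'x_s + x_s'x_s."""
    n = len(rows)
    # Inner products x_r'x_s (the XPX matrix in the IML output).
    xpx = [[sum(a * b for a, b in zip(rows[r], rows[s])) for s in range(n)]
           for r in range(n)]
    # Squared lengths x_r'x_r (the VDIAG vector in the IML output).
    diag = [xpx[r][r] for r in range(n)]
    # Clamp at zero so rounding error can never give a negative argument.
    return [[math.sqrt(max(0.0, diag[r] - 2.0 * xpx[r][s] + diag[s]))
             for s in range(n)]
            for r in range(n)]

dist = distance_matrix(x)
for row in dist:
    print(" ".join(f"{d:10.6f}" for d in row))
```

The smallest off-diagonal entry in the DIST output is between patients 3 and 6, and patient 2 is far from all the others, which already suggests how a clustering of these six observations might begin.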
Graphical methods for Clustering

a) Scatter Plots.
b) Principal Components.
c) Profile Plots.
d) Andrews’ Plots.

In an Andrews’ plot, the p-variate observation for the rth experimental unit is represented by a function; that is, p-dimensional data are represented by a function of a single variable. The basic idea is to assign different variables to different aspects of a curve, in this case the amplitudes of different sine and cosine curves. Plot the curve for each observation, and group observations that have similar curve forms.
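The function itself did not survive in this copy of the notes. The sketch below assumes the form commonly given for Andrews' curves, f_x(t) = x_1/sqrt(2) + x_2 sin t + x_3 cos t + x_4 sin 2t + x_5 cos 2t + ..., for t between -pi and pi. The function names and the choice of nine grid points are illustrative only, and the data rows are taken from the acidosis listing.

```python
import math

def andrews_curve(x, t):
    """Evaluate f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..."""
    value = x[0] / math.sqrt(2.0)
    for i in range(1, len(x)):
        harmonic = (i + 1) // 2              # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x[i] * math.sin(harmonic * t)
        else:
            value += x[i] * math.cos(harmonic * t)
    return value

def curve_points(x, n_points=9):
    """Sample the curve on an evenly spaced grid over [-pi, pi]."""
    step = 2.0 * math.pi / (n_points - 1)
    return [andrews_curve(x, -math.pi + j * step) for j in range(n_points)]

# Three observations from the acidosis data.
patients = {
    "patient 1": [39.8, 38.0, 22.2, 23.2],
    "patient 2": [53.7, 37.2, 18.7, 18.5],
    "patient 4": [41.7, 37.6, 22.8, 22.3],
}

for name, obs in patients.items():
    print(name, " ".join(f"{v:8.2f}" for v in curve_points(obs)))
```

Plotting the sampled values against t with any plotting tool gives one curve per observation. Observations that are close in the original space, such as patients 1 and 4 in the distance matrix, should produce curves of similar shape.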