DSC 441 Assignment 5

Problem 1

i.
a. Cluster centers are found by minimizing the sum of squared Euclidean distances between each data point and the center of its assigned cluster; each center is the mean of the points assigned to it.
b. Squared Euclidean distance is used to measure similarity (a smaller distance means greater similarity).
c.
i. Final Cluster Centers

K=3
Final Cluster Centers
Cluster         1         2         3
V1          18.72     11.96     14.65
V2          16.30     13.27     14.46
V3          .8851     .8522     .8792
V4         6.2089    5.2293    5.5601
V5          3.723     2.873     3.261
V6         3.6036    4.7597    2.6967
V7          6.066     5.089     5.165

K=4
Final Cluster Centers
Cluster         1         2         3         4
V1          11.92     14.10     19.15     16.41
V2          13.26     14.20     16.47     15.32
V3          .8512     .8782     .8871     .8783
V4         5.2256    5.4756    6.2689    5.8554
V5          2.865     3.213     3.773     3.424
V6         4.8855    2.3701    3.4604    3.9610
V7          5.087     5.066     6.127     5.627

K=5
Final Cluster Centers
Cluster         1         2         3         4         5
V1          11.98     14.93     18.46     12.09     19.78
V2          13.29     14.59     16.19     13.31     16.74
V3          .8508     .8805     .8847     .8571     .8865
V4         5.2414    5.6076    6.1725    5.2174    6.3576
V5          2.880     3.293     3.692     2.901     3.847
V6         5.6733    2.7819    3.1941    3.3438    5.2758
V7          5.122     5.212     6.035     5.005     6.194

K=6
Final Cluster Centers
Cluster         1         2         3         4         5         6
V1          12.07     14.56     11.98     18.95     16.46     19.58
V2          13.30     14.41     13.29     16.39     15.33     16.65
V3          .8567     .8800     .8508     .8868     .8798     .8877
V4         5.2164    5.5545    5.2414    6.2475    5.8540    6.3159
V5          2.898     3.267     2.880     3.745     3.436     3.835
V6         3.3942    2.3485    5.6733    2.7235    4.0492    5.0815
V7          5.011     5.130     5.122     6.119     5.614     6.144

ii. Number of cases in each cluster

K=3
Number of Cases in each Cluster
Cluster  1     61.000
         2     77.000
         3     72.000
Valid         210.000
Missing          .000

K=4
Number of Cases in each Cluster
Cluster  1     72.000
         2     59.000
         3     48.000
         4     31.000
Valid         210.000
Missing          .000

K=5
Number of Cases in each Cluster
Cluster  1     42.000
         2     63.000
         3     49.000
         4     44.000
         5     12.000
Valid         210.000
Missing          .000

K=6
Number of Cases in each Cluster
Cluster  1     43.000
         2     48.000
         3     42.000
         4     33.000
         5     29.000
         6     15.000
Valid         210.000
Missing          .000

iii. The class distribution within each cluster

K=3
V8 * Cluster Number of Case Crosstabulation
Count