DSC 441 - HW5

Problem 1 – Part 1

a) Cluster centers were created by choosing k points as the initial centroids. Each object was then assigned to the group whose centroid was closest to it. Once every object had been assigned, the position of each of the k centroids was recalculated as the mean of the objects in its group. The assignment and recalculation steps were repeated until the cluster centers no longer changed.

b) Euclidean distance was used as the similarity measure.

c) For each k, report the following:

Final Cluster Centers K = 3
Final Cluster Centers K = 4
Final Cluster Centers K = 5
Final Cluster Centers K = 6

Number of Elements in Each Cluster K = 3
Number of Elements in Each Cluster K = 4
Number of Elements in Each Cluster K = 5
Number of Elements in Each Cluster K = 6

Class Distribution Within Each Cluster K = 3
Cluster 1: 60/61 = 98.36% purity
Cluster 2: 68/77 = 88.31% purity
Cluster 3: 60/72 = 83.33% purity

Class Distribution Within Each Cluster K = 4
Cluster 1: 67/75 = 89.33% purity
Cluster 2: 58/67 = 86.57% purity
Cluster 3: 36/40 = 90.00% purity
Cluster 4: 28/28 = 100.00% purity

Class Distribution Within Each Cluster K = 5
Cluster 1: 19/25 = 76.00% purity
Cluster 2: 48/53 = 90.57% purity
Cluster 3: 48/48 = 100.00% purity
Cluster 4: 30/44 = 68.18% purity
Cluster 5: 40/42 = 95.24% purity

Class Distribution Within Each Cluster K = 6
Cluster 1: 49/56 = 87.50% purity
Cluster 2: 52/54 = 96.30% purity
Cluster 3: 22/31 = 70.97% purity
Cluster 4: 33/33 = 100.00% purity
Cluster 5: 19/21 = 90.48% purity
Cluster 6: 15/15 = 100.00% purity

iv. Which k should be selected? Explain your selection.
v. For the selected k in iv, analyze and report if the normalization of the attributes will influence the clustering results.
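The procedure described in part a) reads like the standard k-means loop, so a minimal sketch may help make the two steps concrete. This is an illustration only, not the tool or settings used for the homework. The function name `kmeans`, the toy data, and the choice of the first k points as initial centroids are all assumptions made for the example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop once no centroid moves between iterations.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(toy_points, k=2)
print(centers)
print(labels)
```

Because the distance is Euclidean, an attribute with a much larger numeric range will dominate the assignment step, which is why the normalization question in v is worth checking by rerunning the clustering on rescaled attributes.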
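As one possible aid for question iv, the per-cluster purities reported above can be recomputed and combined into a single figure per k. The sketch below assumes that each reported fraction is (majority-class count) / (cluster size), and it uses a size-weighted overall purity, which is a common convention but is not stated in the assignment. The names `reported`, `cluster_purities`, and `overall_purity` are made up for this example.

```python
# (majority-class count, cluster size) pairs copied from the reported
# class distributions. Assumption: the numerator is the majority class.
reported = {
    3: [(60, 61), (68, 77), (60, 72)],
    4: [(67, 75), (58, 67), (36, 40), (28, 28)],
    5: [(19, 25), (48, 53), (48, 48), (30, 44), (40, 42)],
    6: [(49, 56), (52, 54), (22, 31), (33, 33), (19, 21), (15, 15)],
}


def cluster_purities(clusters):
    """Purity of each cluster: majority-class count over cluster size."""
    return [major / size for major, size in clusters]


def overall_purity(clusters):
    """Size-weighted purity: total majority count over total objects."""
    total_major = sum(major for major, _ in clusters)
    total_size = sum(size for _, size in clusters)
    return total_major / total_size


for k, clusters in reported.items():
    per_cluster = ", ".join(f"{p:.2%}" for p in cluster_purities(clusters))
    print(f"K = {k}: overall {overall_purity(clusters):.2%} ({per_cluster})")
```

Two cautions when reading the output: purity tends to increase as k grows, so it should not be the only criterion for choosing k, and the reported cluster sizes for K = 5 add up to 212 while the other three add up to 210, so that row is used exactly as reported.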
