# solution3


Q1. (4 points)

(1)(1 point) First cluster is 6, 12, 18, 24, 30. Error = 360. Second cluster is 42, 48. Error = 18. Total Error = 378.

(2)(1 point) First cluster is 6, 12, 18, 24. Error = 180. Second cluster is 30, 42, 48. Error = 168. Total Error = 348.

(3)(1 point) The two clusters are {6, 12, 18, 24, 30} and {42, 48}.

(4)(1 point) MIN produces the most natural clustering.

Q2. (4 points)

The purity for clusters 1, 2, 3 respectively will be 676/693 = 0.9755, 827/1562 = 0.5294, 465/949 = 0.4900.

The total purity will be 0.9755 × (693/3204) + 0.5294 × (1562/3204) + 0.4900 × (949/3204) = 0.6142.

The entropy for clusters 1, 2, 3 respectively will be 0.2000, 1.8408, 1.6964.


The total entropy will be 0.2000 × (693/3204) + 1.8408 × (1562/3204) + 1.6964 × (949/3204) = 1.4431.

Q3. (2 points)

When the squared error criterion is used, outliers can unduly influence the clusters that are found. In particular, when outliers are present, the resulting cluster centroids may not be as representative as they otherwise would be, and thus the SSE will be higher as well. With a density-based clustering method such as DBSCAN, however, the outliers will be labeled as noise points and then eliminated, so they will not influence the clustering results.
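The arithmetic in Q1 and Q2 can be checked with a short script. This is only a sketch: the helper names `sse` and `weighted_average` are made up for illustration, "Error" is assumed to mean the sum of squared distances to the cluster mean, and the purity numerators (676, 827, 465) are taken to be the majority-class counts.

```python
def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    centroid = sum(points) / len(points)
    return sum((p - centroid) ** 2 for p in points)


def weighted_average(values, sizes):
    """Average per-cluster values, weighting each by its cluster size."""
    total = sum(sizes)
    return sum(v * n for v, n in zip(values, sizes)) / total


# Q1: total error for the two candidate splits of the seven points.
split_1 = sse([6, 12, 18, 24, 30]) + sse([42, 48])
split_2 = sse([6, 12, 18, 24]) + sse([30, 42, 48])

# Q2: cluster sizes, purity numerators, and per-cluster entropies
# as stated in the solution text.
sizes = [693, 1562, 949]
majority_counts = [676, 827, 465]
entropies = [0.2000, 1.8408, 1.6964]

purities = [m / n for m, n in zip(majority_counts, sizes)]
total_purity = weighted_average(purities, sizes)
total_entropy = weighted_average(entropies, sizes)

print(split_1, split_2)
print(round(total_purity, 4), round(total_entropy, 4))
```

Under those assumptions the script should reproduce the totals 378 and 348 for Q1 and, after rounding to four decimals, 0.6142 and 1.4431 for Q2.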