Data Mining Assignment #7
CSC592 – Fall ’05

Problem Statement

You are given a data set on CPU performance (cpu.arff) with the following variables:

MYCT: machine cycle time in nanoseconds (integer)
MMIN: minimum main memory in kilobytes (integer)
MMAX: maximum main memory in kilobytes (integer)
CACH: cache memory in kilobytes (integer)
CHMIN: minimum channels in units (integer)
CHMAX: maximum channels in units (integer)
PERF: published performance index (integer)

The goal is to construct a k-means clustering model that best fits the data. In clustering, best fit is usually determined by checking that:

(a) the clustering separates the data in a reasonable fashion (strong predictors for cluster assignment)
(b) the clustering minimizes the standard deviation and/or within-cluster error

(c) the resulting clusters have a reasonable number of instances in them

Once you have constructed your cluster model, determine the following:

(a) What is the optimal number of clusters for this dataset?
(b) What are the mean attribute values of the k centroids?
(c) Which attributes are the strongest predictors for the cluster assignment?
(d) Are there strongly correlated attributes? If so, which ones? (Hint: use the visualization tab to answer this.)

Handing in your assignment

Write a description of your experiments and your findings and submit it together with your model. This assignment is due on December 5th, 2005 in class. ...
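The .arff file and the mention of a visualization tab point to a GUI tool such as Weka, so what follows is only a minimal Python sketch, under our own assumptions, of what a k-means run computes: random instances as initial centroids, alternating assignment and mean-update steps, and the within-cluster sum of squared errors (SSE) behind criterion (b). Rescaling every attribute to [0, 1] and keeping the best of several random restarts are assumptions of this sketch, not requirements of the assignment. The function names (min_max_scale, kmeans, best_kmeans) are hypothetical, and the toy data is invented, not taken from cpu.arff.

```python
import random
from collections import Counter


def min_max_scale(rows):
    """Rescale every attribute to [0, 1] so large-valued columns (e.g. MMAX) do not dominate distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=100, seed=0):
    """Lloyd-style k-means. Returns (centroids, assignments, within-cluster SSE)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random instances as initial centroids
    assign = None
    for _ in range(iters):
        # Assignment step: each instance joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_assign == assign:
            break  # converged: no instance changed cluster
        assign = new_assign
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, a in zip(rows, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(r, centroids[a]) for r, a in zip(rows, assign))
    return centroids, assign, sse


def best_kmeans(rows, k, restarts=10):
    """Run k-means from several seeds and keep the result with the lowest SSE."""
    return min((kmeans(rows, k, seed=s) for s in range(restarts)), key=lambda res: res[2])


if __name__ == "__main__":
    # Toy stand-in for numeric attributes (NOT the real cpu.arff data).
    toy = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]]
    data = min_max_scale(toy)
    for k in range(1, 4):
        centroids, assign, sse = best_kmeans(data, k)
        # k, within-cluster SSE, and instances per cluster
        print(k, round(sse, 4), dict(Counter(assign)))
```

One way to approach question (a) with this sketch is to compare the SSE across k = 1, 2, 3, ..., look for the point where it stops dropping sharply (the "elbow"), and check the cluster sizes against criterion (c). SSE generally shrinks as k grows (it reaches zero when every instance is its own cluster), so the lowest SSE alone does not pick k. The returned centroids are the per-cluster attribute means asked for in question (b), expressed in the scaled units.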
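For question (c), one possible way to put a number on how strongly an attribute drives cluster assignment is the share of that attribute's variance explained by cluster membership: the between-cluster sum of squares divided by the total sum of squares (an eta-squared style ratio). This is our own heuristic, not something the assignment prescribes. The function name attribute_separation is hypothetical; it expects the instances plus one cluster label per instance, such as the assignments from a k-means run.

```python
def attribute_separation(rows, assign):
    """Per attribute: between-cluster sum of squares / total sum of squares.

    Near 1: cluster means differ strongly on this attribute.
    Near 0: the attribute barely changes from cluster to cluster.
    """
    n = len(rows)
    scores = []
    for col in zip(*rows):  # iterate over attributes (columns)
        grand = sum(col) / n
        total_ss = sum((v - grand) ** 2 for v in col)
        groups = {}
        for v, label in zip(col, assign):
            groups.setdefault(label, []).append(v)
        # Size-weighted squared distance of each cluster mean from the overall mean.
        between_ss = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
        scores.append(between_ss / total_ss if total_ss > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # Hypothetical attribute names and toy values, NOT taken from cpu.arff.
    names = ["X1", "X2"]
    rows = [[1, 3], [2, 7], [9, 4], [10, 6]]
    labels = [0, 0, 1, 1]
    ranking = sorted(zip(names, attribute_separation(rows, labels)), key=lambda p: -p[1])
    print(ranking)
```

Pairing the scores with the real attribute names (MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX, PERF) and sorting them gives a ranking; attributes near the top are candidate strong predictors, which can then be checked visually against the cluster assignments.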
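Question (d) suggests inspecting scatter plots in the visualization tab. As a numeric complement, here is a sketch that computes the Pearson correlation coefficient for every attribute pair and reports pairs whose absolute value meets a threshold. The 0.8 cutoff is an arbitrary assumption, the function names are hypothetical, and Pearson's r only measures linear association, so a curved or outlier-driven pattern in a scatter plot can still matter even when r is modest.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0  # constant column: report 0


def strongly_correlated(rows, names, threshold=0.8):
    """List attribute pairs whose |r| meets the (arbitrary) threshold, strongest first."""
    cols = list(zip(*rows))
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        r = pearson(cols[i], cols[j])
        if abs(r) >= threshold:
            pairs.append((names[i], names[j], r))
    return sorted(pairs, key=lambda p: -abs(p[2]))


if __name__ == "__main__":
    # Toy columns: B is exactly 2 * A, C is unrelated (NOT the real cpu.arff data).
    rows = [[1, 2, 3], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
    print(strongly_correlated(rows, ["A", "B", "C"]))
```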