Problem 1
Answer the following questions briefly:
(i) For K-means clustering, let W_K be the within-cluster variation when K clusters are used. Give a formula for W_K. Explain how to use a plot of W_K against K to determine the optimal number of clusters.
(ii) Why is predictive accuracy on the training data not a good estimator of performance on future data?

Problem 2
Classification trees in SAS EM. This problem uses the data sets TargetKnown.xls and Unclassified.xls from HW3. Recall that TargetKnown has ten variables: a binary target and nine predictor variables. You should use a stratified random sample containing all responders and an equal number of non-responders. Unclassified has the nine predictor variables, but the target is missing in Unclassified.
(i) Using TargetKnown, find a good decision tree model for predicting the target. Write a short report describing your model....
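For Problem 1(i), a sketch of one common definition may help: W_K = Σ_k Σ_{i∈C_k} ||x_i − x̄_k||², the total squared Euclidean distance from each point to its cluster centroid. The course's own notation may differ; for example, the pairwise form (1/|C_k|) Σ_{i,i'∈C_k} ||x_i − x_{i'}||² equals exactly twice this per cluster. The Python below is a minimal illustration under stated assumptions, not the course's reference solution. It runs plain Lloyd's K-means with a deterministic farthest-first initialization (an assumption chosen so the demo is reproducible) on hypothetical synthetic data, then prints W_K for K = 1..6 so the "elbow" can be read off.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points, labels):
    """W_K = sum over clusters k of sum_{i in C_k} ||x_i - centroid_k||^2."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        total += sum(sq_dist(m, centroid) for m in members)
    return total


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; farthest-first initialization is an assumption for reproducibility."""
    dim = len(points[0])
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest already-chosen center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(m[j] for m in members) / len(members) for j in range(dim)]
    return labels


# Hypothetical data: three well-separated 2-D blobs of 20 points each.
rng = random.Random(1)
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]

W = {k: within_cluster_ss(points, kmeans(points, k)) for k in range(1, 7)}
for k, w in W.items():
    print(f"K={k}: W_K={w:.1f}")
```

Because W_K keeps shrinking as K grows (splitting a cluster never increases the within-cluster sum of squares), the minimum of W_K is not the answer. Instead, look for the K after which the curve flattens. For these synthetic blobs the elbow should appear at K = 3 by construction.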
This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell University (Engineering School).