Data Mining
Assignment #3
CSC592 – Fall ’05

Problem Statement

Consider the following sets of data available from the course website:

1. The mushroom data set with the "edible/poisonous" attribute as the dependent variable.
2. The usnews data set. This data set contains college data taken from the U.S. News & World Report's Guide to America's Best Colleges. Here the "private/public" attribute is the dependent variable. Note that even though the values of this attribute are 0s and 1s, this is a categorical (not a numeric!) attribute.

You are to construct J48 (C4.5) decision tree models that (a) explain the data as well as possible, and (b) generalize as much as possible. Use both the hold-out method (70-30 or 80-20 split) and 10-fold cross-validation to demonstrate that your model parameters do indeed construct models that generalize well and to illustrate that your model explains your data well.

Both data sets represent raw problem domain data which you will need to first translate into the
