
Data Mining Assignment #3
CSC592 – Fall '05

Problem Statement

Consider the following sets of data available from the course website:

1. The mushroom data set with the "edible/poisonous" attribute as the dependent variable.
2. The usnews data set. This dataset contains college data taken from the U.S. News & World Report's Guide to America's Best Colleges. Here the "private/public" attribute is the dependent variable. Note that even though the values of this attribute are 0s and 1s, this is a categorical (not a numeric!) attribute.

You are to construct J48 (C4.5) decision tree models that (a) explain the data as best as possible, and (b) generalize as much as possible. Use both the hold-out method (70-30 or 80-20 split) and 10-fold cross-validation to demonstrate that your model parameters do indeed construct models that generalize well and to illustrate that your model explains your data well.

Both data sets represent raw problem domain data which you will need to first translate into the
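
The two evaluation schemes named in the problem statement, a 70-30 hold-out split and 10-fold cross-validation, can be sketched as index partitions. This is a minimal illustration in plain Python and not the procedure of any particular tool: the function names `holdout_split` and `kfold_splits`, the fixed seed, and the 100-row example size are all assumptions made for the sketch, and no stratification by class is attempted.

```python
import random


def holdout_split(n_rows, train_fraction=0.7, seed=0):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_rows * train_fraction))
    return indices[:cut], indices[cut:]


def kfold_splits(n_rows, k=10, seed=0):
    """Yield (train, test) index lists for each of k folds.

    Every row lands in exactly one test fold, so each row is used
    for testing once and for training k - 1 times.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical example with 100 rows (not the size of either data set).
train, test = holdout_split(100, train_fraction=0.7)
print(len(train), len(test))  # 70 30

for fold_train, fold_test in kfold_splits(100, k=10):
    # A model would be fit on fold_train and scored on fold_test here.
    pass
```

Averaging the per-fold test accuracy gives the cross-validation estimate, while the hold-out split gives a single estimate from one partition. Comparing training accuracy against these figures is one way to judge whether a tree explains the data without overfitting it.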