hw1_Solution_Part I

CS 6375 Machine Learning, 2009 Spring
Homework 1
Due: 01/28/2009 (tentative), 2:30pm

Part I: Written questions. 30 points.

1. [15 points] (based on an exercise from Terran Lane) The following is the training data for a binary classification task.

   Attr 1   Attr 2   Attr 3   Attr 4   class
   a        1        c        -1       1
   b        0        c        -1       1
   a        0        c         1       1
   b        1        c         1       1
   b        0        c         1       2
   a        0        d        -1       2
   a        1        d        -1       2
   b        1        c        -1       2

   Construct a complete (unpruned) decision tree for this data using information gain as your splitting criterion. Show your work for the entropy calculations.

2. [5 points] Problem 3.1 (b)(c) in the T. Mitchell book. Give decision trees to represent the following Boolean functions:
   (b) A ∨ [B ∧ C]
   (c) A XOR B

3. [10 points] Paper critique. Read the paper: "A comparative analysis of methods for pruning decision trees," F. Esposito, D. Malerba, and G. Semeraro, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997. A critique is not just a summary of the paper (in fact, I expect your summary to be short). Instead, think about issues such as: What are the strengths and weaknesses of the approach in the paper? Have the authors conducted appropriate experiments to evaluate it? Do you have other thoughts about the proposed methods (e.g., is the underlying assumption proper? Does the method scale? Can it be used in real applications or other research problems?)? Do you have suggestions for further study? What about the writing of the paper? Are the methods and experiments clearly described? You don't need to answer all of these questions; they are only meant to help you think while reading the paper.
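As a sanity check for Question 1 (not required for the written answer), the entropy and information-gain calculations can be verified programmatically. This is a minimal sketch: the function names `entropy` and `info_gain` are ours, not part of the assignment.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, label_idx=-1):
    """Information gain of splitting `rows` on column index `attr`."""
    labels = [r[label_idx] for r in rows]
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[label_idx] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Training data from Question 1: (Attr 1, Attr 2, Attr 3, Attr 4, class)
data = [
    ('a', 1, 'c', -1, 1), ('b', 0, 'c', -1, 1),
    ('a', 0, 'c',  1, 1), ('b', 1, 'c',  1, 1),
    ('b', 0, 'c',  1, 2), ('a', 0, 'd', -1, 2),
    ('a', 1, 'd', -1, 2), ('b', 1, 'c', -1, 2),
]

for i in range(4):
    print(f"Gain(Attr {i + 1}) = {info_gain(data, i):.3f}")
```

Running this shows Attr 3 has the largest gain (about 0.311, versus 0.049 for Attr 4 and 0 for Attrs 1 and 2), so Attr 3 is the root split.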
Part II: Programming assignment. 70 points. Implement the ID3 decision tree learning algorithm that we discussed in class. You may use C, C++, Java, C#, or other languages to implement the algorithm. To simplify the implementation, your system only needs to handle binary classification tasks (i.e., each instance will have a class value of 0 or 1). In addition, you may assume that all attributes are binary-valued (i.e., the only possible attribute values are 0 and 1) and that there are no missing values in the training or test data. A sample training file ( train.dat ) and test file ( test.dat ) can be found on the course homework page. In these files, only lines containing non-space characters are relevant. The first relevant line holds the attribute names. Each following relevant line defines a single example.
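To illustrate the core recursion, here is a minimal sketch of ID3 under the assignment's simplifications (binary attributes, binary classes, no missing values). It is written in Python for brevity, not in one of the suggested compiled languages, and it omits the train.dat/test.dat parsing; the data representation (a list of attribute-name-to-value dicts) is our own choice, not something the assignment prescribes.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of 0/1 class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attrs):
    """Build an ID3 tree. `examples` is a list of dicts mapping
    attribute name -> 0/1; `labels` holds the matching 0/1 classes.
    Returns a leaf label (int) or a tuple (attr, zero_branch, one_branch)."""
    if len(set(labels)) == 1:           # pure node: stop
        return labels[0]
    if not attrs:                       # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                        # information gain of splitting on a
        remainder = 0.0
        for v in (0, 1):
            sub = [l for e, l in zip(examples, labels) if e[a] == v]
            if sub:
                remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)         # highest-gain attribute
    branches = []
    for v in (0, 1):
        sub_ex = [e for e in examples if e[best] == v]
        sub_lb = [l for e, l in zip(examples, labels) if e[best] == v]
        if not sub_ex:                  # empty branch: majority of parent
            branches.append(Counter(labels).most_common(1)[0][0])
        else:
            branches.append(id3(sub_ex, sub_lb,
                                [a for a in attrs if a != best]))
    return (best, branches[0], branches[1])

def predict(tree, example):
    """Walk the tree down to a leaf for one example."""
    while isinstance(tree, tuple):
        attr, zero, one = tree
        tree = one if example[attr] == 1 else zero
    return tree

# tiny usage demo on the Boolean AND function
train = [({'A': a, 'B': b}, int(a and b)) for a in (0, 1) for b in (0, 1)]
tree = id3([e for e, _ in train], [l for _, l in train], ['A', 'B'])
print(tree)   # prints ('A', 0, ('B', 0, 1))
```

A real submission would additionally read the attribute names from the first relevant line of train.dat, build one example per following line, and report accuracy on test.dat.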