hmwk5 - Determine a decision tree based upon choosing the...

ACS I Data Mining
Homework #5
Due: Friday, 2/18/11

1. Consider the set of training data shown in the table below for a binary classification problem (the classes are "+" and "-"). Each record has three attributes: two are binary (A and B) and one is continuous (C).

Record  A  B  C    Class
1       T  T  1.0  +
2       T  T  6.0  +
3       T  F  5.0  -
4       F  F  4.0  +
5       F  T  7.0  -
6       F  T  3.0  -
7       F  F  8.0  -
8       T  F  7.0  +
9       F  T  5.0  -

a. Determine the entropy of this collection of training examples.

b. What are the information gains (based on the entropy measure) for attributes A and B? Which one provides the larger information gain? To make sure you are counting correctly, make a table for each attribute giving the split for the child nodes, like

A  +  -
T  3  1
F  1  4

c. If we split attribute C as ≤ 3, between 3 and 6, and > 6, determine the information gain.

d. Determine a decision tree based upon choosing the largest information gain using the entropy measure and splitting attribute C as in (c). Justify your choices.

2. Suppose we have the training set of data given in the notes to classify vertebrates. Use a general-to-specific approach to obtain a rule to classify fish, i.e., start with the rule ( ) ⇒ fish and determine a rule by considering the coverage of an attribute and its accuracy. Justify why you made your choice. You do not have to compute coverage or accuracy if no outcomes are "fish". At each step, make a table of the test, how many records it covers, how many are classified as fish, the accuracy, and the coverage.
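For Problem 1, the counting in parts (a) through (c) can be checked with a short script. The following is a minimal sketch, not a prescribed solution method: the helper names (`entropy`, `split_counts`, `information_gain`, `c_bin`) are illustrative, and the bin boundaries for C assume that the value 3 falls in the first range and the value 6 in the middle range.

```python
from collections import Counter
from math import log2

# Training records transcribed from the table in Problem 1: (A, B, C, class).
RECORDS = [
    ("T", "T", 1.0, "+"),
    ("T", "T", 6.0, "+"),
    ("T", "F", 5.0, "-"),
    ("F", "F", 4.0, "+"),
    ("F", "T", 7.0, "-"),
    ("F", "T", 3.0, "-"),
    ("F", "F", 8.0, "-"),
    ("T", "F", 7.0, "+"),
    ("F", "T", 5.0, "-"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_counts(records, key):
    """Class counts per child node, where key(record) names the child."""
    table = {}
    for rec in records:
        table.setdefault(key(rec), Counter())[rec[-1]] += 1
    return table


def information_gain(records, key):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    parent = entropy([rec[-1] for rec in records])
    children = {}
    for rec in records:
        children.setdefault(key(rec), []).append(rec[-1])
    weighted = sum(len(c) / len(records) * entropy(c) for c in children.values())
    return parent - weighted


def c_bin(rec):
    """Assumed three-way split of C: C <= 3, 3 < C <= 6, C > 6."""
    c = rec[2]
    return "<=3" if c <= 3 else ("3-6" if c <= 6 else ">6")


if __name__ == "__main__":
    print("Split table for A:", split_counts(RECORDS, lambda r: r[0]))
    for name, key in [("A", lambda r: r[0]), ("B", lambda r: r[1]), ("C", c_bin)]:
        print(name, round(information_gain(RECORDS, key), 4))
```

The split table printed for A should agree with the example table in part (b). The same `information_gain` call can be applied to a subset of the records when growing the tree in part (d).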
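For Problem 2, the per-step table can be produced mechanically once the vertebrate records from the notes are entered. The sketch below assumes the usual definitions: coverage is the fraction of all records that satisfy the rule's conditions, and accuracy is the fraction of covered records whose class matches the rule's consequent. The function names and the `TOY` records are hypothetical placeholders, not the data set from the notes.

```python
def evaluate_rule(records, conditions, target, label_key="class"):
    """Return (covered, correct, coverage, accuracy) for conditions => target."""
    covered = [r for r in records
               if all(r.get(attr) == val for attr, val in conditions.items())]
    correct = sum(1 for r in covered if r[label_key] == target)
    coverage = len(covered) / len(records) if records else 0.0
    accuracy = correct / len(covered) if covered else 0.0
    return len(covered), correct, coverage, accuracy


def candidate_table(records, conditions, target, label_key="class"):
    """Score every one-conjunct extension of the current rule."""
    rows, seen = [], set()
    for rec in records:
        for attr, value in rec.items():
            if attr == label_key or attr in conditions or (attr, value) in seen:
                continue
            seen.add((attr, value))
            trial = {**conditions, attr: value}
            rows.append(((attr, value),
                         *evaluate_rule(records, trial, target, label_key)))
    return rows


# Hypothetical toy records, NOT the vertebrate table from the notes.
TOY = [
    {"attr1": "a", "attr2": "p", "class": "fish"},
    {"attr1": "a", "attr2": "q", "class": "other"},
    {"attr1": "b", "attr2": "q", "class": "other"},
    {"attr1": "a", "attr2": "p", "class": "fish"},
]

if __name__ == "__main__":
    # Start from the empty rule ( ) => fish and list each candidate test.
    for test, covered, correct, coverage, accuracy in candidate_table(TOY, {}, "fish"):
        print(test, covered, correct, round(coverage, 3), round(accuracy, 3))
```

Each printed row corresponds to one line of the table the problem asks for. After choosing a test, add it to `conditions` and call `candidate_table` again to grow the rule by one more conjunct.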
