CSCE 5380 Data Mining
Assignment 2: Classification
Wasana Santiteerakul
1. Consider the training examples shown in Table 4.8 for a binary classification problem.

a. What is the entropy of this collection of training examples with respect to the positive class?
ANS
From Table 4.8, there are 4 positive training examples and 5 negative training examples. Therefore, the entropy of this collection with respect to the positive class is

Entropy = -P(+) log2 P(+) - P(-) log2 P(-)
        = -(4/9) log2(4/9) - (5/9) log2(5/9)
        = 0.9911
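As a quick sanity check, the entropy above can be reproduced with a short Python sketch (the helper name `entropy` is my own, not from the assignment):

```python
import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 4 positive and 5 negative training examples (Table 4.8)
print(round(entropy([4, 5]), 4))  # 0.9911
```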
b. What are the information gains of a1 and a2 relative to these training examples?

     a1 = T   a1 = F
+       3        1
-       1        4
ANS
Entropy of a1 = (4/9)[-(3/4) log2(3/4) - (1/4) log2(1/4)] + (5/9)[-(1/5) log2(1/5) - (4/5) log2(4/5)]
             = 0.7616

Information gain of a1 = Entropy(parent) - Entropy(a1)
                       = 0.9911 - 0.7616
                       = 0.2295
     a2 = T   a2 = F
+       2        2
-       3        2
Entropy of a2 = (5/9)[-(2/5) log2(2/5) - (3/5) log2(3/5)] + (4/9)[-(2/4) log2(2/4) - (2/4) log2(2/4)]
             = 0.9839

Information gain of a2 = Entropy(parent) - Entropy(a2)
                       = 0.9911 - 0.9839
                       = 0.0072
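Both information gains can be checked in the same way. A minimal sketch (the helper names `entropy` and `info_gain` are my own; note the exact gain for a1 is 0.2294, while the 0.2295 above comes from rounding the intermediate entropies):

```python
import math

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """Information gain of a split: parent entropy minus the
    weighted average entropy of the child partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted

# a1: T -> (3+, 1-), F -> (1+, 4-)
print(round(info_gain([4, 5], [[3, 1], [1, 4]]), 4))  # 0.2294
# a2: T -> (2+, 3-), F -> (2+, 2-)
print(round(info_gain([4, 5], [[2, 3], [2, 2]]), 4))  # 0.0072
```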
d. What is the best split (among a1 and a2) according to the information gain?

ANS
According to the information gain, the best split is a1, since its gain (0.2295) is larger than the gain of a2 (0.0072).