CSCE 5380 Data Mining
Assignment 2: Classification
Wasana Santiteerakul
1.
Consider the training examples shown in Table 4.8 for a binary classification problem.
a.
What is the entropy of this collection of training examples with respect to the positive class?
ANS
From Table 4.8, there are 4 positive training examples and 5 negative training examples.
Therefore, the entropy of this collection with respect to the positive class is
\[
\text{Entropy} = -P_{+}\log_2 P_{+} - P_{-}\log_2 P_{-}
= -\frac{4}{9}\log_2\frac{4}{9} - \frac{5}{9}\log_2\frac{5}{9}
= 0.9911
\]
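As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `entropy` is an assumption made for illustration; the class counts (4 positive, 5 negative) are the ones stated in the answer.

```python
import math

def entropy(counts):
    """Base-2 entropy of a list of class counts (hypothetical helper)."""
    total = sum(counts)
    # Zero counts are skipped, following the convention 0 * log2(0) = 0.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 4 positive and 5 negative training examples, as read from Table 4.8.
parent_entropy = entropy([4, 5])
print(round(parent_entropy, 4))  # about 0.9911
```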
b.
What are the information gains of a1 and a2 relative to these training examples?
a1    T    F
+     3    1
-     1    4
ANS
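\[
\text{Entropy of } a_1
= \frac{4}{9}\left[-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}\right]
+ \frac{5}{9}\left[-\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5}\right]
= 0.7616
\]
\[
\text{Information gain of } a_1
= \text{Entropy(Parent)} - \text{Entropy}(a_1)
= 0.9911 - 0.7616
= 0.2295
\]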
a2    T    F
+     2    2
-     3    2
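\[
\text{Entropy of } a_2
= \frac{5}{9}\left[-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right]
+ \frac{4}{9}\left[-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right]
= 0.9839
\]
\[
\text{Information gain of } a_2
= \text{Entropy(Parent)} - \text{Entropy}(a_2)
= 0.9911 - 0.9839
= 0.0072
\]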
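The two gains can be reproduced with a short Python sketch. The helper names `entropy` and `information_gain` are assumptions made for illustration; the counts come from the a1 and a2 tables above, written as [positive, negative] for the T branch and then the F branch. At full precision the gain of a1 comes out near 0.2294, while subtracting the rounded entropies (0.9911 - 0.7616) gives the 0.2295 shown above.

```python
import math

def entropy(counts):
    """Base-2 entropy of a list of class counts (hypothetical helper)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted

parent = [4, 5]  # [positive, negative] over all 9 examples
splits = {
    "a1": [[3, 1], [1, 4]],  # T branch, F branch
    "a2": [[2, 3], [2, 2]],  # T branch, F branch
}

gains = {name: information_gain(parent, children) for name, children in splits.items()}
for name, gain in gains.items():
    print(name, round(gain, 4))

# The attribute with the largest gain is the preferred split.
best = max(gains, key=gains.get)
print("best split:", best)
```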
d.
What is the best split (among a1 and a2) according to the information gain?
ANS
According to the information gain, the best split is a1, because its gain (0.2295) is larger than the gain of a2 (0.0072).
