
# Data Mining Assignment2 - CSCE 5380 Data Mining Assignment...


CSCE 5380 Data Mining, Assignment 2: Classification

Wasana Santiteerakul

1. Consider the training examples shown in Table 4.8 for a binary classification problem.

a. What is the entropy of this collection of training examples with respect to the positive class?

ANS: From Table 4.8, there are 4 positive and 5 negative training examples. Therefore, the entropy of this collection with respect to the positive class is

$$\text{Entropy} = -P_{+}\log_2 P_{+} - P_{-}\log_2 P_{-} = -\frac{4}{9}\log_2\frac{4}{9} - \frac{5}{9}\log_2\frac{5}{9} = 0.9911$$

b. What are the information gains of a1 and a2 relative to these training examples?

| Class | a1 = T | a1 = F |
|-------|--------|--------|
| +     | 3      | 1      |
| −     | 1      | 4      |

ANS: Splitting on a1 sends 4 examples to T and 5 to F, so

$$\text{Entropy}(a_1) = \frac{4}{9}\left(-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}\right) + \frac{5}{9}\left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5}\right) = 0.7616$$

$$\text{Information gain of } a_1 = \text{Entropy(parent)} - \text{Entropy}(a_1) = 0.9911 - 0.7616 = 0.2295$$

| Class | a2 = T | a2 = F |
|-------|--------|--------|
| +     | 2      | 2      |
| −     | 3      | 2      |

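ANS: Splitting on a2 sends 5 examples to T and 4 to F, so

$$\text{Entropy}(a_2) = \frac{5}{9}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{9}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) = 0.9839$$

$$\text{Information gain of } a_2 = \text{Entropy(parent)} - \text{Entropy}(a_2) = 0.9911 - 0.9839 = 0.0072$$

d. What is the best split (among a1 and a2) according to the information gain?

ANS: According to the information gain, the best split is a1, since its gain (0.2295) is far larger than that of a2 (0.0072).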
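As a numerical check, the computations in parts (a), (b) and (d) can be reproduced with a short Python sketch. It assumes only the class counts for a1 and a2 given above (Table 4.8 itself is not shown in this preview); the helper names `entropy` and `information_gain` are illustrative, not part of the assignment.

```python
from math import log2


def entropy(counts):
    """Base-2 entropy of a class distribution given as raw counts."""
    total = sum(counts)
    # Skip empty classes, since 0 * log2(0) is taken to be 0.
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Class counts written as (positive, negative), transcribed from the tables above.
parent = (4, 5)
splits = {
    "a1": [(3, 1), (1, 4)],  # children for a1 = T and a1 = F
    "a2": [(2, 3), (2, 2)],  # children for a2 = T and a2 = F
}

gains = {name: information_gain(parent, children) for name, children in splits.items()}
best = max(gains, key=gains.get)  # attribute with the largest gain

print(f"Entropy(parent) = {entropy(parent):.4f}")
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
print(f"Best split: {best}")
```

Working from unrounded entropies, this reports a gain of about 0.2294 for a1 rather than 0.2295; the last digit in the hand calculation comes from subtracting the already-rounded values 0.9911 and 0.7616. Either way, a1 is selected as the best split.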