Assignment #4 Solutions (Chapter 5)

4. Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules,

$R_1$: A → + (covers 4 positive and 1 negative examples),
$R_2$: B → + (covers 30 positive and 10 negative examples),
$R_3$: C → + (covers 100 positive and 90 negative examples),

determine which is the best and worst candidate rule according to:

a) Rule accuracy.

Answer: The accuracy of a rule is the fraction of the examples it covers that are positive. The accuracies are 4/5 = 80% (for $R_1$), 30/40 = 75% (for $R_2$), and 100/190 = 52.6% (for $R_3$). Therefore $R_1$ is the best candidate and $R_3$ is the worst candidate according to rule accuracy.

b) FOIL's information gain.

Answer: Assume the initial rule is ∅ → +. This rule covers $p_0 = 100$ positive examples and $n_0 = 400$ negative examples. If the extended rule covers $p_1$ positive and $n_1$ negative examples, FOIL's information gain is

$$\text{Gain} = p_1 \left[ \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right].$$

The rule $R_1$ covers $p_1 = 4$ positive examples and $n_1 = 1$ negative example. Therefore, the information gain for this rule is

$$4 \left[ \log_2 \frac{4}{5} - \log_2 \frac{100}{500} \right] = 8.$$

The rule $R_2$ covers $p_1 = 30$ positive examples and $n_1 = 10$ negative examples. Therefore, the information gain for this rule is

$$30 \left[ \log_2 \frac{30}{40} - \log_2 \frac{100}{500} \right] = 57.2.$$

The rule $R_3$ covers $p_1 = 100$ positive examples and $n_1 = 90$ negative examples. Therefore, the information gain for this rule is

$$100 \left[ \log_2 \frac{100}{190} - \log_2 \frac{100}{500} \right] = 139.6.$$

Therefore, $R_3$ is the best candidate and $R_1$ is the worst candidate according to FOIL's information gain.
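The arithmetic in parts (a) and (b) can be checked with a short script. This is only a minimal sketch, assuming base-2 logarithms for FOIL's information gain (which matches the values 8, 57.2, and 139.6 in the answer). The function names `rule_accuracy` and `foil_gain` are illustrative choices and not part of the assignment.

```python
from math import log2


def rule_accuracy(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of a rule covering (p1, n1) examples,
    relative to an initial rule covering (p0, n0). Assumes log base 2."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


# Initial rule covers the whole training set: 100 positive, 400 negative.
P0, N0 = 100, 400

# Coverage (positives, negatives) of each candidate rule from the exercise.
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

accuracy = {name: rule_accuracy(p, n) for name, (p, n) in rules.items()}
gain = {name: foil_gain(P0, N0, p, n) for name, (p, n) in rules.items()}

for name in rules:
    print(f"{name}: accuracy = {accuracy[name]:.1%}, FOIL gain = {gain[name]:.1f}")

# Best and worst candidates under each measure.
best_by_accuracy = max(accuracy, key=accuracy.get)
worst_by_accuracy = min(accuracy, key=accuracy.get)
best_by_gain = max(gain, key=gain.get)
worst_by_gain = min(gain, key=gain.get)
```

The two measures rank the rules in opposite order: accuracy favors $R_1$ because it ignores coverage, while FOIL's gain multiplies by $p_1$ and so favors $R_3$, which covers every positive example.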