# Assignment #4 Solutions (Chapter 5)

4. Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules, $R_1: A \rightarrow +$ (covers 4 positive and 1 negative examples), $R_2: B \rightarrow +$ (covers 30 positive and 10 negative examples), $R_3: C \rightarrow +$ (covers 100 positive and 90 negative examples), determine which is the best and worst candidate rule according to:

a) Rule accuracy.

Answer: The accuracy of a rule is the fraction of the examples it covers that are positive. The accuracies are 4/5 = 80% (for $R_1$), 30/40 = 75% (for $R_2$), and 100/190 = 52.6% (for $R_3$). Therefore $R_1$ is the best candidate and $R_3$ is the worst candidate according to rule accuracy.

b) FOIL's information gain.

Answer: Assume the initial rule is $\emptyset \rightarrow +$. This rule covers $p_0 = 100$ positive examples and $n_0 = 400$ negative examples. For a candidate rule that covers $p_1$ positive and $n_1$ negative examples, FOIL's information gain (with base-2 logarithms) is

$$p_1 \left[ \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right].$$

The rule $R_1$ covers $p_1 = 4$ positive examples and $n_1 = 1$ negative example. Therefore, the information gain for this rule is

$$4 \left[ \log_2 \frac{4}{5} - \log_2 \frac{100}{500} \right] = 8.$$

The rule $R_2$ covers $p_1 = 30$ positive examples and $n_1 = 10$ negative examples. Therefore, the information gain for this rule is

$$30 \left[ \log_2 \frac{30}{40} - \log_2 \frac{100}{500} \right] = 57.2.$$

The rule $R_3$ covers $p_1 = 100$ positive examples and $n_1 = 90$ negative examples. Therefore, the information gain for this rule is

$$100 \left[ \log_2 \frac{100}{190} - \log_2 \frac{100}{500} \right] = 139.6.$$

Therefore, $R_3$ is the best candidate and $R_1$ is the worst candidate according to FOIL's information gain.
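
The arithmetic in parts (a) and (b) can be checked with a short script. This is only a minimal sketch: the function names `rule_accuracy` and `foil_gain` and the `rules` dictionary are illustrative choices, not part of the assignment, and the logarithms are assumed to be base 2, which is what the stated answers (8, 57.2, 139.6) imply.

```python
import math

# Training-set totals covered by the initial (empty) rule.
P0, N0 = 100, 400

# (positives covered, negatives covered) for each candidate rule.
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}


def rule_accuracy(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)


def foil_gain(p1, n1, p0=P0, n0=N0):
    """FOIL's information gain, assuming base-2 logarithms."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


accuracy = {name: rule_accuracy(p, n) for name, (p, n) in rules.items()}
gain = {name: foil_gain(p, n) for name, (p, n) in rules.items()}

for name in rules:
    print(f"{name}: accuracy={accuracy[name]:.1%}, FOIL gain={gain[name]:.1f}")

# Best and worst candidates under each measure.
print("accuracy:", max(accuracy, key=accuracy.get), min(accuracy, key=accuracy.get))
print("FOIL gain:", max(gain, key=gain.get), min(gain, key=gain.get))
```

The two measures rank the rules in opposite order. Accuracy ignores how many examples a rule covers, whereas FOIL's gain multiplies the improvement in log-precision by the number of positives covered, so the high-coverage rule R3 comes out ahead.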
