hw4_soln

# hw4_soln - CS 478 Machine Learning: Homework 4 Suggested...


CS 478 Machine Learning: Homework 4 Suggested Solutions

## 1 The Naïve Bayes Independence Assumption

(a) For case (i):

$$
\begin{aligned}
P(X = (0,1,1,1,1) \mid Y = +1) &= P(X_1 = 0, X_2 = 1, X_3 = 1, X_4 = 1, X_5 = 1 \mid Y = +1) \\
&= P(X_1 = 0, X_2 = 1 \mid Y = +1) \\
&= P(X_1 = 0 \mid Y = +1)\, P(X_2 = 1 \mid Y = +1) \\
&= 0.2 \times 0.6 = 0.12
\end{aligned}
\tag{1}
$$

The second equality holds because $X_2, X_3, X_4, X_5$ always take the same value in case (i), so the events $X_3 = 1$, $X_4 = 1$, $X_5 = 1$ add nothing once $X_2 = 1$ is given.

For case (ii):

$$
\begin{aligned}
P(X = (0,1,1,1,1) \mid Y = +1) &= P(X_1 = 0, X_2 = 1, X_3 = 1, X_4 = 1, X_5 = 1 \mid Y = +1) \\
&= P(X_1 = 0 \mid Y = +1)\, P(X_2 = 1 \mid Y = +1)\, P(X_3 = 1 \mid Y = +1)\, P(X_4 = 1 \mid Y = +1)\, P(X_5 = 1 \mid Y = +1) \\
&= 0.2 \times 0.6^4 = 0.02592
\end{aligned}
\tag{2}
$$

(b) Because of the Naive Bayes assumption, the multivariate Naive Bayes estimate of the probability is the same computation as case (ii) in part (a) above, and equals 0.02592 in both cases. So Naive Bayes cannot distinguish the two cases.

(c) For case (i), using the true probability, we have

$$
\begin{aligned}
P(X = (0,1,1,1,1), Y = +1) &= P(X = (0,1,1,1,1) \mid Y = +1)\, P(Y = +1) \\
&= 0.12 \times 0.5 = 0.06
\end{aligned}
\tag{3}
$$

and

$$
\begin{aligned}
P(X = (0,1,1,1,1), Y = -1) &= P(X = (0,1,1,1,1) \mid Y = -1)\, P(Y = -1) \\
&= 0.8 \times 0.4 \times 0.5 = 0.16
\end{aligned}
\tag{4}
$$

Since $0.16 > 0.06$, by Bayes' rule the example $X = (0,1,1,1,1)$ will be classified as $-1$.
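The arithmetic so far can be checked with a short Python sketch. The parameter values below are read off the computations above (the problem statement itself is not reproduced here), and the names `true_likelihood`, `naive_bayes_likelihood`, and `classify` are hypothetical helpers introduced only for illustration.

```python
import math

# Assumed parameters, inferred from the worked arithmetic above.
P_X1_0 = {+1: 0.2, -1: 0.8}   # P(X1 = 0 | Y = y)
P_XI_1 = {+1: 0.6, -1: 0.4}   # P(Xi = 1 | Y = y) for i = 2..5
PRIOR = {+1: 0.5, -1: 0.5}    # P(Y = y)


def true_likelihood(y, case):
    """P(X = (0,1,1,1,1) | Y = y) under the true distribution."""
    if case == "i":
        # X2..X5 are identical copies, so only one factor for X2 appears.
        return P_X1_0[y] * P_XI_1[y]
    # Case (ii): X2..X5 are conditionally independent given Y.
    return P_X1_0[y] * P_XI_1[y] ** 4


def naive_bayes_likelihood(y):
    """Naive Bayes multiplies all five per-attribute factors in either case."""
    return P_X1_0[y] * P_XI_1[y] ** 4


def classify(likelihood):
    """Return the label with the larger joint probability P(X, Y = y)."""
    return max((+1, -1), key=lambda y: likelihood(y) * PRIOR[y])


# Part (b): the Naive Bayes estimate coincides with the case (ii) likelihood.
same_as_case_ii = math.isclose(naive_bayes_likelihood(+1),
                               true_likelihood(+1, "ii"))

# Part (c), case (i): classification under the true probabilities.
label_case_i = classify(lambda y: true_likelihood(y, "i"))
```

Here `same_as_case_ii` records the observation of part (b), and `label_case_i` holds the label chosen from equations (3) and (4).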

For case (ii), using the true probability, we have

$$
\begin{aligned}
P(X = (0,1,1,1,1), Y = +1) &= P(X = (0,1,1,1,1) \mid Y = +1)\, P(Y = +1) \\
&= 0.02592 \times 0.5 = 0.01296
\end{aligned}
\tag{5}
$$

and

$$
\begin{aligned}
P(X = (0,1,1,1,1), Y = -1) &= P(X = (0,1,1,1,1) \mid Y = -1)\, P(Y = -1) \\
&= 0.8 \times 0.4^4 \times 0.5 = 0.01024
\end{aligned}
\tag{6}
$$

Since $0.01296 > 0.01024$, by Bayes' rule the example $X = (0,1,1,1,1)$ will be classified as $+1$.

Since the Naive Bayes assumption is true in case (ii), the multivariate Naive Bayes algorithm will classify $X$ as $+1$ in case (ii), and also in case (i), where it makes a wrong independence assumption. So in case (i) the Naive Bayes classifier gives a classification ($+1$) that differs from the one that arises from the true probabilities ($-1$).

(d) For case (i), where $X_2, X_3, X_4, X_5$ are completely dependent, after splitting on one of these attributes it becomes pointless to split on any of the remaining three: they give zero information gain, since the attributes always have the same value. So the decision tree would split using $X_1$ and only one of $X_2, X_3, X_4, X_5$, and then stop.

For case (ii), where $X_2, X_3, X_4, X_5$ are independent, we obtain some positive information gain by splitting on each of these attributes due to independence. Since we assume that the decision tree algorithm does not do early stopping, a full tree will be grown using all 5 attributes.

Thus the decision tree algorithm can distinguish between case (i) and case (ii) by considering the information gain of different attributes. Indeed, it is not hard to see that the decision trees produced in each of the two cases give the best decision rule possible, i.e., they achieve the Bayes error.
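The information-gain argument in part (d) can be illustrated with a small sketch. It computes gains on the exact joint distribution rather than on a finite training sample, which is a simplifying assumption; the parameter values are inferred from the arithmetic in parts (a) and (c), and `joint`, `cond_entropy`, and `gain` are hypothetical helper names.

```python
import itertools
import math

# Assumed parameters, inferred from the worked arithmetic in parts (a) and (c).
P_X1_0 = {+1: 0.2, -1: 0.8}   # P(X1 = 0 | Y = y)
P_XI_1 = {+1: 0.6, -1: 0.4}   # P(Xi = 1 | Y = y) for i = 2..5
PRIOR = {+1: 0.5, -1: 0.5}    # P(Y = y)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(case):
    """Return {(x, y): P(X = x, Y = y)} for case 'i' or 'ii'."""
    table = {}
    for y in (+1, -1):
        for x in itertools.product((0, 1), repeat=5):
            p = PRIOR[y] * bern(1.0 - P_X1_0[y], x[0])
            if case == "i":
                # X3, X4, X5 are exact copies of X2; other patterns never occur.
                if len(set(x[1:])) != 1:
                    continue
                p *= bern(P_XI_1[y], x[1])
            else:
                # X2..X5 are conditionally independent given Y.
                for xi in x[1:]:
                    p *= bern(P_XI_1[y], xi)
            table[(x, y)] = p
    return table


def cond_entropy(table, attrs):
    """H(Y | X_attrs) in bits; attrs are 0-based attribute indices."""
    groups = {}
    for (x, y), p in table.items():
        key = tuple(x[a] for a in attrs)
        by_label = groups.setdefault(key, {})
        by_label[y] = by_label.get(y, 0.0) + p
    h = 0.0
    for by_label in groups.values():
        total = sum(by_label.values())
        for p in by_label.values():
            if p > 0:
                h -= p * math.log2(p / total)
    return h


def gain(table, already_split, new_attr):
    """Information gain of new_attr after splitting on already_split."""
    return (cond_entropy(table, already_split)
            - cond_entropy(table, already_split + [new_attr]))


# Gain of X3 (index 2) once X1 and X2 (indices 0 and 1) have been used.
gain_case_i = gain(joint("i"), [0, 1], 2)
gain_case_ii = gain(joint("ii"), [0, 1], 2)
```

Under these assumptions `gain_case_i` is zero, since $X_3$ duplicates $X_2$, while `gain_case_ii` is strictly positive, matching the claim that the tree stops after two splits in case (i) and keeps splitting in case (ii).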

## This note was uploaded on 10/02/2008 for the course CS 478 taught by Professor Joachims during the Spring '08 term at Cornell.
