# lecture10 - Data Mining CS57300 Purdue University Naive...

Data Mining CS57300 Purdue University September 28, 2010

## Naive Bayes classifier (cont.)

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \propto \prod_{i=1}^{m} P(X_i \mid C)\,P(C)$$

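
As a rough illustration of the formula above, the sketch below scores each class by its prior times the product of per-attribute conditionals, then normalizes so the scores sum to 1. The function name `nb_posterior`, the dictionary layout, and all probability values are hypothetical and chosen only for illustration.

```python
from math import prod

def nb_posterior(x, prior, cond):
    """Return P(C | x) for each class under the naive Bayes factorization.

    x:     dict mapping attribute name -> observed value
    prior: dict mapping class -> P(C)
    cond:  dict mapping (attribute, value, class) -> P(X_i = value | C)
    """
    # Unnormalized score: P(C) * prod_i P(X_i | C)
    scores = {
        c: p_c * prod(cond[(a, v, c)] for a, v in x.items())
        for c, p_c in prior.items()
    }
    # Dividing by the sum plays the role of P(X) in the formula.
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical parameters, not taken from the lecture.
prior = {"yes": 0.6, "no": 0.4}
cond = {
    ("income", "high", "yes"): 0.2, ("income", "high", "no"): 0.5,
    ("student", "yes", "yes"): 0.7, ("student", "yes", "no"): 0.2,
}
post = nb_posterior({"income": "high", "student": "yes"}, prior, cond)
print(post)
```

Because P(X) is the same for every class, it can be skipped entirely when only the most probable class is needed.
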
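## NBC learning

- Estimate the prior P(C) and the conditional probability distributions P(X_i | C) independently with maximum likelihood estimation (MLE).
- With C = buys_computer and I = income: P(C=yes) = 9/14, P(I=high | C=yes) = 2/9, P(I=med | C=yes) = 4/9, P(I=low | C=yes) = 3/9, etc.

| age | income | student | credit_rating | buys_computer |
|---|---|---|---|---|
| <=30 | high | no | fair | no |
| <=30 | high | no | excellent | no |
| 31…40 | high | no | fair | yes |
| >40 | medium | no | fair | yes |
| >40 | low | yes | fair | yes |
| >40 | low | yes | excellent | no |
| 31…40 | low | yes | excellent | yes |
| <=30 | medium | no | fair | no |
| <=30 | low | yes | fair | yes |
| >40 | medium | yes | fair | yes |
| <=30 | medium | yes | excellent | yes |
| 31…40 | medium | no | excellent | yes |
| 31…40 | high | yes | fair | yes |
| >40 | medium | no | excellent | no |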
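
The counting behind these estimates can be sketched as follows. The rows are assumed to be the standard 14-row buys_computer textbook table, which reproduces the estimates quoted above. The names `DATA`, `ATTRIBUTES`, and `fit_mle` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Rows: (age, income, student, credit_rating, buys_computer).
# Assumed to match the standard textbook version of this table.
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]

def fit_mle(rows):
    """MLE by relative frequency: P(C) and P(X_i = v | C)."""
    class_counts = Counter(row[-1] for row in rows)
    prior = {c: Fraction(k, len(rows)) for c, k in class_counts.items()}
    joint = Counter()
    for row in rows:
        label = row[-1]
        for name, value in zip(ATTRIBUTES, row[:-1]):
            joint[(name, value, label)] += 1
    # Each conditional is estimated independently from its own counts.
    cond = {key: Fraction(k, class_counts[key[2]]) for key, k in joint.items()}
    return prior, cond

prior, cond = fit_mle(DATA)
print(prior["yes"])                     # 9/14
print(cond[("income", "high", "yes")])  # 2/9
```

Attribute values that never co-occur with a class are simply absent from `cond` here, which is the zero-count situation that smoothing is meant to address.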

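## Learning CPTs from examples

Counts of training examples by value of X_1 and class f(x):

| f(x) | X_1 = Low | X_1 = Medium | X_1 = High |
|---|---|---|---|
| True | 10 | 13 | 17 |
| False | 2 | 13 | 0 |

$$P[X_1 = \text{Low} \mid f(x) = \text{True}] = \frac{10}{10 + 13 + 17}$$

$$P[f(x) = \text{False}] = \frac{2 + 13}{2 + 13 + 10 + 13 + 17}$$
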
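
The two estimates above can be checked with a short script. This is a minimal sketch that uses the counts from the slide. The names `COUNTS`, `conditional`, and `class_prior` are hypothetical.

```python
from fractions import Fraction

# Counts of training examples, indexed by class f(x) and value of X_1.
COUNTS = {
    "True": {"Low": 10, "Medium": 13, "High": 17},
    "False": {"Low": 2, "Medium": 13, "High": 0},
}

def conditional(counts, value, label):
    """MLE of P[X_1 = value | f(x) = label]."""
    return Fraction(counts[label][value], sum(counts[label].values()))

def class_prior(counts, label):
    """MLE of P[f(x) = label]."""
    total = sum(sum(row.values()) for row in counts.values())
    return Fraction(sum(counts[label].values()), total)

print(conditional(COUNTS, "Low", "True"))  # 1/4
print(class_prior(COUNTS, "False"))        # 3/11
```
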
## Zero counts are a problem

- If an attribute value does not occur in the training examples, we assign zero probability to that value.
- How does that affect the conditional probability P[f(x) | x]? It equals 0!
- Why is this a problem?
- Adjust for zero counts by "smoothing" the probability estimates.
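
To spell out why the posterior collapses, take the naive Bayes factorization with the counts from the CPT example, where X_1 = High never occurs with f(x) = False. The attributes x_2 through x_m are hypothetical extra attributes added for illustration.

$$P[f(x) = \text{False} \mid x] \propto P[f(x) = \text{False}] \prod_{i=1}^{m} P[X_i = x_i \mid f(x) = \text{False}]$$

$$P[X_1 = \text{High} \mid f(x) = \text{False}] = \frac{0}{2 + 13 + 0} = 0 \;\Longrightarrow\; P[f(x) = \text{False} \mid X_1 = \text{High}, x_2, \dots, x_m] = 0$$

A single zero factor wipes out the whole product, so no amount of evidence from the other attributes can ever favor f(x) = False for such an input.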

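## Smoothing: Laplace correction

Add a uniform prior: add 1 to each count, which adds 3 (the number of values of X_1) to the denominator.

| f(x) | X_1 = Low | X_1 = Medium | X_1 = High |
|---|---|---|---|
| True | 10 | 13 | 17 |
| False | 2 | 13 | 0 |

$$P[X_1 = \text{High} \mid f(x) = \text{False}] = \frac{0}{2 + 13 + 0} \;\longrightarrow\; \frac{0 + 1}{(2 + 13 + 0) + 3}$$
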
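
A minimal sketch of the correction, applied to the f(x) = False row of the count table. The function name `laplace_estimate` is hypothetical. The `alpha` parameter is also an assumption: it is a common generalization, and `alpha=1` corresponds to the add-one correction shown on the slide.

```python
from fractions import Fraction

# Counts of X_1 values among examples with f(x) = False.
FALSE_COUNTS = {"Low": 2, "Medium": 13, "High": 0}

def laplace_estimate(counts, value, alpha=1):
    """Smoothed estimate: (count + alpha) / (total + alpha * number_of_values)."""
    k = len(counts)               # number of distinct attribute values (3 here)
    total = sum(counts.values())  # 2 + 13 + 0
    return Fraction(counts[value] + alpha, total + alpha * k)

print(laplace_estimate(FALSE_COUNTS, "High"))  # 1/18
```

The smoothed estimates still sum to 1 over Low, Medium, and High, and none of them is zero.
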
## Is assuming independence a problem?

- What is the effect on probability estimates?
  - Over-counting evidence leads to overly confident probability estimates.
- What is the effect on classification?
  - Less clear…
  - For a given input x, suppose f(x) = True.
  - Naïve Bayes will correctly classify if P[f(x) = True | x] > 0.5.
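
One way to see both effects is to feed naive Bayes the same piece of evidence several times, as happens when attributes are perfectly correlated. The sketch below uses hypothetical numbers: a prior of 1/2, and likelihoods of 4/5 under True and 2/5 under False. The function name `posterior_true` is also hypothetical.

```python
from fractions import Fraction

def posterior_true(prior_true, lik_true, lik_false, copies=1):
    """P[f(x) = True | x] when one attribute is duplicated `copies` times
    and naive Bayes wrongly treats the copies as independent evidence."""
    score_true = prior_true * lik_true ** copies
    score_false = (1 - prior_true) * lik_false ** copies
    return score_true / (score_true + score_false)

half = Fraction(1, 2)
once = posterior_true(half, Fraction(4, 5), Fraction(2, 5), copies=1)
thrice = posterior_true(half, Fraction(4, 5), Fraction(2, 5), copies=3)
print(once, thrice)  # 2/3 8/9
```

The estimate moves from 2/3 to 8/9 although no new information was added, so the probability is overconfident. Both values are above 0.5, so the predicted class does not change.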

## Naive Bayes classifier

- Simplifying (naive) assumption: attributes are conditionally independent given the class
- Strengths:
  - Easy to implement
