
# lecture10 - Data Mining CS57300 Purdue University Naive...


Data Mining CS57300 Purdue University September 28, 2010

Naive Bayes classifier (cont.)

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \propto \prod_{i=1}^{m} P(X_i \mid C)\,P(C)$$
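The product form above can be sketched in a few lines of Python. This is a minimal sketch assuming discrete attributes; the function name `nb_posterior`, the dictionary layout, and all numbers in the toy model are hypothetical placeholders, not taken from the lecture.

```python
from fractions import Fraction
from math import prod

def nb_posterior(priors, cond, x):
    """Return P(C | X = x) for every class under the naive Bayes assumption.

    priors: {class: P(C)}
    cond:   {class: [{value: P(X_i = value | C)} for each attribute i]}
    x:      list of observed attribute values, one per attribute
    """
    # Unnormalized score: P(C) * prod_i P(X_i = x_i | C)
    scores = {
        c: priors[c] * prod(cond[c][i][v] for i, v in enumerate(x))
        for c in priors
    }
    # Dividing by the sum of the scores plays the role of P(X)
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical two-class, two-attribute model (placeholder numbers)
priors = {"yes": Fraction(3, 5), "no": Fraction(2, 5)}
cond = {
    "yes": [{"a": Fraction(1, 2), "b": Fraction(1, 2)},
            {"u": Fraction(1, 4), "v": Fraction(3, 4)}],
    "no":  [{"a": Fraction(1, 4), "b": Fraction(3, 4)},
            {"u": Fraction(1, 2), "v": Fraction(1, 2)}],
}
posterior = nb_posterior(priors, cond, ["a", "v"])
print(posterior)
```

Because P(X) is the same for every class, it only rescales the scores; the predicted class is the one with the largest unnormalized product.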
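NBC learning
• Estimate the prior P(C) and the conditional probability distributions P(X_i | C) independently with maximum likelihood estimation (MLE)
• P(C=yes) = 9/14, P(I=high | C=yes) = 2/9, P(I=med | C=yes) = 4/9, P(I=low | C=yes) = 3/9, etc., where I is income and C is buys_computer

| age | income | student | credit_rating | buys_computer |
|---|---|---|---|---|
| <=30 | high | no | fair | |
| <=30 | | | excellent | |
| 31…40 | | | fair | yes |
| >40 | medium | | fair | yes |
| >40 | low | yes | fair | yes |
| >40 | low | yes | excellent | |
| 31…40 | low | yes | excellent | yes |
| <=30 | | | fair | |
| <=30 | low | yes | fair | yes |
| >40 | | yes | fair | yes |
| <=30 | | yes | excellent | yes |
| 31…40 | | | excellent | yes |
| 31…40 | | yes | fair | yes |
| >40 | | | excellent | |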
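The MLE estimates quoted on this slide are relative frequencies. The sketch below rebuilds the class labels and the income values of the "yes" examples from the stated fractions (9 of 14 examples are "yes"; 2, 4, and 3 of those have high, medium, and low income). The helper name `mle` and the list layout are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

# Class labels: 9 "yes" and 5 "no" examples, as implied by P(C=yes) = 9/14
labels = ["yes"] * 9 + ["no"] * 5
# Income values of the 9 "yes" examples, rebuilt from 2/9, 4/9, 3/9
income_given_yes = ["high"] * 2 + ["med"] * 4 + ["low"] * 3

def mle(values):
    """Relative frequencies: the MLE for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {v: Fraction(c, n) for v, c in counts.items()}

prior = mle(labels)
p_income_yes = mle(income_given_yes)
print(prior["yes"])          # 9/14
print(p_income_yes["high"])  # 2/9
```

Each distribution is estimated from its own slice of the data, which is what "independently" means here: the prior uses only the labels, and each conditional uses only the examples of one class.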

Learning CPTs from examples

| f(x) | X1 = Low | X1 = Medium | X1 = High |
|---|---|---|---|
| True | 10 | 13 | 17 |
| False | 2 | 13 | 0 |

$$P[X_1 = \text{Low} \mid f(x) = \text{True}] = \frac{10}{10 + 13 + 17}$$

$$P[f(x) = \text{False}] = \frac{2 + 13}{2 + 13 + 10 + 13 + 17}$$
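The two estimates on this slide can be reproduced by normalizing the count table. The following is a minimal sketch; the counts come from the slide, while the function names `cpt` and `class_prior` and the nested-dictionary layout are illustrative assumptions.

```python
from fractions import Fraction

# Counts of X1 values per class, read off the slide's table
counts = {
    True:  {"Low": 10, "Medium": 13, "High": 17},
    False: {"Low": 2,  "Medium": 13, "High": 0},
}

def cpt(counts):
    """P[X1 = v | f(x) = c] as count(v, c) / count(c)."""
    return {
        c: {v: Fraction(n, sum(row.values())) for v, n in row.items()}
        for c, row in counts.items()
    }

def class_prior(counts):
    """P[f(x) = c] as count(c) / total number of examples."""
    total = sum(sum(row.values()) for row in counts.values())
    return {c: Fraction(sum(row.values()), total) for c, row in counts.items()}

table = cpt(counts)
prior = class_prior(counts)
print(table[True]["Low"])  # 1/4, i.e. 10 / (10 + 13 + 17)
print(prior[False])        # 3/11, i.e. 15 / 55
```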
Zero counts are a problem
• If an attribute value does not occur with a class in the training examples, we assign zero probability to that value
• How does that affect the conditional probability P[f(x) | x]?
• It equals 0!
• Why is this a problem?
• Adjust for zero counts by "smoothing" the probability estimates
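A short sketch of why a zero count is harmful: one zero factor wipes out the whole naive Bayes product, however strongly the other attributes favour the class. The first two quantities use the lecture's counts for class False (Low 2, Medium 13, High 0, out of 55 examples); the second attribute and its likelihood of 0.99 are hypothetical.

```python
from fractions import Fraction

# MLE from the lecture's counts for class False: Low 2, Medium 13, High 0
p_high_given_false = Fraction(0, 2 + 13 + 0)
p_false = Fraction(15, 55)

# Hypothetical second attribute that strongly favours class False
p_x2_given_false = Fraction(99, 100)

# Unnormalized naive Bayes score for class False when X1 = High
score_false = p_false * p_high_given_false * p_x2_given_false
print(score_false)  # 0
```

Under these estimates, class False can never be predicted for an input with X1 = High, even though the absence of such an example in 15 training cases is weak evidence that the combination is impossible.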

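Smoothing: Laplace correction

| f(x) | X1 = Low | X1 = Medium | X1 = High |
|---|---|---|---|
| True | 10 | 13 | 17 |
| False | 2 | 13 | 0 |

Add a uniform prior:

$$P[X_1 = \text{High} \mid f(x) = \text{False}] = \frac{0}{2 + 13 + 0} \;\longrightarrow\; \frac{0 + 1}{(2 + 13 + 0) + 3}$$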
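The correction above adds one pseudo-count to each of the three values of X1, so the numerator grows by 1 and the denominator by 3. A minimal sketch, assuming a generic pseudo-count parameter `alpha` (the function name `laplace` and the parameter are illustrative; the slide uses a pseudo-count of 1):

```python
from fractions import Fraction

def laplace(counts, alpha=1):
    """Add alpha pseudo-counts to every value before normalizing."""
    k = len(counts)               # number of distinct attribute values
    total = sum(counts.values())
    return {v: Fraction(n + alpha, total + alpha * k) for v, n in counts.items()}

# Counts for class False from the lecture's table
false_counts = {"Low": 2, "Medium": 13, "High": 0}
smoothed = laplace(false_counts)
print(smoothed["High"])        # 1/18, i.e. (0 + 1) / (15 + 3)
print(sum(smoothed.values()))  # 1
```

The smoothed estimates still sum to one, and no value receives probability zero.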
Is assuming independence a problem?
• What is the effect on probability estimates?
• Over-counting evidence leads to overly confident probability estimates
• What is the effect on classification?
• Less clear…
• For a given input x, suppose f(x) = True
• Naïve Bayes will correctly classify if P[f(x) = True | x] > 0.5
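One way to see the over-counting effect is to feed the same evidence to naive Bayes twice, as would happen with two perfectly correlated attributes. The sketch below uses the counts from the CPT table earlier in the lecture (40 True and 15 False examples; X1 = Low occurs 10 and 2 times). The duplicated attribute is a hypothetical construction, not an example from the lecture.

```python
from fractions import Fraction

prior = {True: Fraction(40, 55), False: Fraction(15, 55)}
p_low = {True: Fraction(10, 40), False: Fraction(2, 15)}  # P[X1 = Low | class]

def posterior_true(copies):
    """P[f(x) = True | x] when the X1 = Low evidence is counted `copies` times."""
    score = {c: prior[c] * p_low[c] ** copies for c in prior}
    return score[True] / sum(score.values())

once = posterior_true(1)   # evidence counted once
twice = posterior_true(2)  # same evidence counted twice (duplicated attribute)
print(float(once), float(twice))
```

In this case the posterior moves from 5/6 to 75/83, so the estimate becomes more extreme, yet both values stay above 0.5 and the predicted class does not change. That matches the slide's point that the effect on classification is less clear than the effect on the probability estimates.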

Naive Bayes classifier
• Simplifying (naive) assumption: attributes are conditionally independent given the class
• Strengths:
• Easy to implement

## This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue.
