13_naive_bayes

Naïve Bayes decision rule: yNB = arg max P(X|Y) P(Y) = arg ...


...es classifier

We would like to model P(X | Y), where X is a feature vector and Y is its associated label.

Naïve Bayes decision rule:

$$y_{NB} = \arg\max_{Y} P(X \mid Y)\, P(Y) = \arg\max_{Y} \prod_{i=1}^{d} P(x_i \mid Y)\, P(Y)$$

Simplifying assumption: conditional independence. Given the class label, the features are independent, i.e.

$$P(X \mid Y) = P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)$$

If conditional independence holds, NB is an optimal classifier!

How many parameters now? dk + k - 1 (assuming d binary features and k classes: one parameter per feature per class, plus k - 1 free parameters for the class prior).

(Handout dated 10/29/13)

Training a Naïve Bayes classifier

Training data: feature matrix X (n x d) and labels y_1, ..., y_n.

Maximum likelihood estimates:

Class prior:

$$\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}$$

Likelihood (here i indexes training examples and j indexes features):

$$\hat{P}(x_j \mid y) = \frac{\hat{P}(x_j, y)}{\hat{P}(y)} = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}| / n}{|\{i : y_i = y\}| / n}$$

9. Probabilistic models
9.2 Probabilistic models for categorical data

Example 9.4 (p.276): Prediction using a naive Bayes model I

Email classification. Suppose our vocabulary contains three words a, b and c, and we use a multivariate Bernoulli model for our e-mails, with parameters

$$\theta^{\oplus} = (0.5,\ 0.67,\ 0.33) \qquad \theta^{\ominus} = (0.67,\ 0.33,\ 0.33)$$

This means, for example, that the presence of b is twice as likely in spam (+), compared with ham. The e-mail to be classified contains words a and b but not c, and hence is described by the bit vector x = (1, 1, 0). We obtain likelihoods

$$P(x \mid \oplus) = 0.5 \cdot 0.67 \cdot (1 - 0.33) = 0.222$$
$$P(x \mid \ominus) = 0.67 \cdot 0.33 \cdot (1 - 0.33) = 0.148$$

The ML classification of x is thus spam.

Peter Flach (University of Bristol), Machine Learning: Making Sense of Data, August 25, 2012

Table 9.1 (p.280): Training data for naive Bayes

...
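The likelihood computation in Example 9.4 can be checked with a few lines of Python. This is only a minimal sketch: it assumes the rounded parameter values 0.67 and 0.33 stand for 2/3 and 1/3 (which is what reproduces the quoted 0.222 and 0.148), and the names `theta`, `bernoulli_likelihood` and `ml_class` are illustrative, not taken from the slides.

```python
from fractions import Fraction
from math import prod

# Assumption: the rounded slide values 0.67 and 0.33 stand for 2/3 and 1/3.
# Each tuple holds P(word present | class) for the words a, b, c.
theta = {
    "spam": (Fraction(1, 2), Fraction(2, 3), Fraction(1, 3)),
    "ham": (Fraction(2, 3), Fraction(1, 3), Fraction(1, 3)),
}


def bernoulli_likelihood(x, theta_c):
    """P(x | class) under a multivariate Bernoulli model.

    Conditional independence lets us multiply per-word terms:
    theta if the word is present, (1 - theta) if it is absent.
    """
    return prod(t if xi == 1 else 1 - t for xi, t in zip(x, theta_c))


# E-mail containing a and b but not c.
x = (1, 1, 0)

likelihoods = {c: bernoulli_likelihood(x, t) for c, t in theta.items()}

# Maximum-likelihood classification: the class prior is ignored here,
# matching the "ML classification" wording of the example.
ml_class = max(likelihoods, key=likelihoods.get)

for c, lik in likelihoods.items():
    print(f"P(x | {c}) = {float(lik):.3f}")
print("ML classification:", ml_class)
```

Multiplying each likelihood by an estimated class prior before taking the maximum would turn this into the full decision rule with P(Y); the example itself compares likelihoods only.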

This note was uploaded on 02/10/2014 for the course CS 545 taught by Professor Anderson,c during the Fall '08 term at Colorado State.
