...es classifier

We would like to model $P(X \mid Y)$, where $X$ is a feature vector and $Y$ is its associated label.

Simplifying assumption: conditional independence. Given the class label, the features are independent, i.e.

$$P(X \mid Y) = P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)$$

Naïve Bayes decision rule:

$$y_{NB} = \arg\max_{Y} P(X \mid Y)\, P(Y) = \arg\max_{Y} \prod_{i=1}^{d} P(x_i \mid Y)\, P(Y)$$

If conditional independence holds, NB is an optimal classifier!
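A minimal sketch of the decision rule above, assuming binary features and a hypothetical parameter layout (`priors[c]` for P(Y = c), `cond[c][i]` for P(x_i = 1 | Y = c)); these names are illustrative and not from the slides. It sums logarithms instead of multiplying probabilities, which leaves the arg max unchanged and avoids underflow when d is large.

```python
import math

def nb_predict(x, priors, cond):
    """Return the class maximising P(Y) * prod_i P(x_i | Y) for a binary vector x.

    priors: dict mapping class -> P(Y = class)            (assumed layout)
    cond:   dict mapping class -> list of P(x_i = 1 | Y)  (assumed layout)
    All probabilities are assumed to lie strictly between 0 and 1.
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for xi, p in zip(x, cond[c]):
            # Bernoulli factor: p if the feature is present, 1 - p otherwise.
            score += math.log(p if xi == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

For example, `nb_predict([1, 0], {"a": 0.5, "b": 0.5}, {"a": [0.9, 0.2], "b": [0.1, 0.8]})` compares 0.5·0.9·0.8 against 0.5·0.1·0.2 and picks class `"a"`.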
How many parameters now, under the conditional independence assumption? $dk + k - 1$

10/29/13

Training a Naïve Bayes classifier

Training data: Feature matrix $X$ ($n \times d$) and labels $y_1, \ldots, y_n$

Maximum likelihood estimates (here $i$ indexes training examples and $j$ indexes features):

Class prior:

$$\hat{P}(y) = \frac{|\{i : y_i = y\}|}{n}$$

Likelihood:

$$\hat{P}(x_j \mid y) = \frac{\hat{P}(x_j, y)}{\hat{P}(y)} = \frac{|\{i : X_{ij} = x_j,\ y_i = y\}| / n}{|\{i : y_i = y\}| / n}$$

9. Probabilistic models
9.2 Probabilistic models for categorical data

Example 9.4: Prediction using a naive Bayes model I (p.276)

Email classification

Suppose our vocabulary contains three words a, b and c, and we use a multivariate Bernoulli model for our emails, with parameters

$$\theta^{\oplus} = (0.5, 0.67, 0.33) \qquad \theta^{\ominus} = (0.67, 0.33, 0.33)$$

This means, for example, that the presence of b is twice as likely in spam (+), compared with ham.

The email to be classified contains words a and b but not c, and hence is described by the bit vector $x = (1, 1, 0)$. We obtain likelihoods

$$P(x \mid \oplus) = 0.5 \cdot 0.67 \cdot (1 - 0.33) = 0.222 \qquad P(x \mid \ominus) = 0.67 \cdot 0.33 \cdot (1 - 0.33) = 0.148$$

The ML classification of x is thus spam.

Peter Flach (University of Bristol), Machine Learning: Making Sense of Data, August 25, 2012

Table 9.1: Training data for naive Bayes (p.280) ...
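The maximum likelihood estimates and the worked email example can be sketched in a few lines of Python. This is an illustrative sketch, not code from the course or the book: the function names are hypothetical, and the exact fractions 1/2, 2/3 and 1/3 are an assumption about what the rounded values 0.5, 0.67 and 0.33 stand for (they reproduce the quoted 0.222 and 0.148, whereas the rounded decimals give about 0.224 for spam).

```python
from fractions import Fraction

def fit_bernoulli_nb(X, y):
    """Maximum likelihood estimates from counts, for binary features.

    Returns (prior, theta) with
      prior[c]    = |{i : y_i = c}| / n
      theta[c][j] = |{i : X_ij = 1, y_i = c}| / |{i : y_i = c}|
    """
    n = len(y)
    prior, theta = {}, {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        prior[c] = Fraction(len(rows), n)
        # zip(*rows) iterates over feature columns within class c.
        theta[c] = [Fraction(sum(col), len(rows)) for col in zip(*rows)]
    return prior, theta

def bernoulli_likelihood(x, theta_c):
    """P(x | class) under the multivariate Bernoulli model with parameters theta_c."""
    p = Fraction(1)
    for xj, t in zip(x, theta_c):
        p *= t if xj == 1 else 1 - t
    return p

# Parameters from the example, written as exact fractions (an assumption).
theta = {
    "spam": [Fraction(1, 2), Fraction(2, 3), Fraction(1, 3)],
    "ham":  [Fraction(2, 3), Fraction(1, 3), Fraction(1, 3)],
}
x = (1, 1, 0)  # the email contains a and b but not c
likelihoods = {c: bernoulli_likelihood(x, t) for c, t in theta.items()}
ml_class = max(likelihoods, key=likelihoods.get)
```

Under these assumptions the likelihoods are 2/9 for spam and 4/27 for ham, so `ml_class` is `"spam"`. `fit_bernoulli_nb` accepts any list of binary rows with matching labels and returns the count-based estimates as exact fractions.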
This note was uploaded on 02/10/2014 for the course CS 545 taught by Professor Anderson,c during the Fall '08 term at Colorado State.