Unformatted text preview: h resulted in heads.
The likelihood of the data:
Log likelihood: P (X✓) = ph (1 ln P (X✓) = h ln p + (n p) n P(Y, X) = P(Y  X) P(X)
and: h h) ln(1 P(Y, X) = P(X  Y) P(Y) p) Therefore: Taking a derivative and setting to 0: @ ln P (X✓)
h
=
@p
p ) p= P (Y X ) = (n
(1 h)
=0
p) P (X Y )P (Y )
P (X ) This is known as Bayes’ rule h
n
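As a quick numerical check of this derivation, the following sketch (not from the original notes; the counts n and h are made-up example values) maximizes the log likelihood over a grid of candidate p values and confirms the maximum sits at h/n:

    import math

    n, h = 100, 37  # hypothetical data: 100 flips, 37 heads

    def log_likelihood(p):
        # h*ln(p) + (n-h)*ln(1-p), the log likelihood derived above
        return h * math.log(p) + (n - h) * math.log(1 - p)

    # Evaluate on a fine grid of candidate p values and keep the best one.
    grid = [i / 1000 for i in range(1, 1000)]
    p_best = max(grid, key=log_likelihood)

    print(p_best)  # 0.37, matching the closed-form MLE h / n
    print(h / n)   # 0.37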
Bayes' rule

The joint distribution can be factored in two ways:

    P(Y, X) = P(Y \mid X)\, P(X)
    P(Y, X) = P(X \mid Y)\, P(Y)

Therefore:

    P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}

This is known as Bayes' rule. Here P(Y | X) is the posterior, P(X | Y) is the likelihood, and P(Y) is the prior, so:

    posterior ∝ likelihood × prior

Maximum a posteriori and maximum likelihood

The maximum a posteriori (MAP) rule:

    y_{MAP} = \arg\max_Y P(Y \mid X) = \arg\max_Y \frac{P(X \mid Y)\, P(Y)}{P(X)} = \arg\max_Y P(X \mid Y)\, P(Y)

If we ignore the prior distribution or assume it is uniform, we obtain the maximum likelihood rule:

    y_{ML} = \arg\max_Y P(X \mid Y)

P(X) can be computed by marginalization:

    P(X) = \sum_Y P(X \mid Y)\, P(Y)

but since it does not depend on Y, it is not important for inferring a label. A classifier that has access to P(Y | X) is a Bayes optimal classifier.
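To make the difference between the two rules concrete, here is a minimal sketch with made-up numbers (a single observed binary feature and two hypothetical labels; none of these values come from the notes):

    # MAP vs. ML prediction after observing X = 1. All numbers hypothetical.
    prior = {"spam": 0.1, "ham": 0.9}        # P(Y)
    likelihood = {"spam": 0.7, "ham": 0.2}   # P(X = 1 | Y)

    # y_MAP = argmax_Y P(X|Y) P(Y); the constant P(X) can be dropped.
    y_map = max(prior, key=lambda y: likelihood[y] * prior[y])

    # y_ML = argmax_Y P(X|Y): the same rule with the prior ignored.
    y_ml = max(prior, key=lambda y: likelihood[y])

    # P(X = 1) by marginalization: sum over Y of P(X = 1 | Y) P(Y).
    p_x = sum(likelihood[y] * prior[y] for y in prior)

    print(y_map)  # 'ham'  (0.9 * 0.2 = 0.18 beats 0.1 * 0.7 = 0.07)
    print(y_ml)   # 'spam' (0.7 beats 0.2): ignoring the prior flips the answer
    print(p_x)    # 0.25

With these numbers the strong prior on "ham" overrides the higher likelihood of "spam", so the MAP and ML rules disagree; this is exactly the effect of the prior term in the MAP rule.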
Learning the Optimal Classifier

Task: predict whether or not a picnic spot is enjoyable.

Training data: n rows of the form X = (X_1, X_2, X_3, ..., X_d) with label Y.

Let's learn P(Y | X) directly. How many parameters does that take?

Prior: P(Y = y) for all y: K - 1 parameters if there are K labels.
Likelihood: P(X = x | Y = y) for all x, y: (2^d - 1)K parameters if the d features are binary.

Naïve Bayes classifier

We would like to model P(X | Y), where X is a feature vector and Y is its associated label.

Simplifying assumption: conditional independence. Given the class label, the features are independent, i.e.

    P(X \mid Y) = P(x_1 \mid Y)\, P(x_2 \mid Y) \cdots P(x_d \mid Y)

How many parameters now? Only dK likelihood parameters (one per feature per class, for binary features), plus the K - 1 prior parameters. For example, with d = 4 binary features and K = 2 labels, the full likelihood table needs (2^4 - 1) * 2 = 30 parameters, while the Naïve Bayes factorization needs only 4 * 2 = 8.
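A minimal end-to-end sketch, on made-up picnic-style rows rather than the notes' actual dataset, showing how the factorization reduces learning to per-feature counting (the per-feature estimates reuse the MLE p = h/n derived earlier):

    # Naive Bayes over d binary features: P(X|Y) = prod_i P(x_i|Y).
    # Toy training set: each row is (features, label). All values hypothetical.
    data = [
        ((1, 1, 0), "enjoyable"),
        ((1, 0, 0), "enjoyable"),
        ((1, 1, 1), "enjoyable"),
        ((0, 0, 1), "not enjoyable"),
        ((0, 1, 1), "not enjoyable"),
    ]
    labels = {y for _, y in data}
    d = len(data[0][0])

    # Estimate the prior P(Y = y) and the likelihoods P(x_i = 1 | Y = y):
    # d * K likelihood parameters plus K - 1 for the prior, as counted above.
    prior = {y: sum(1 for _, c in data if c == y) / len(data) for y in labels}
    theta = {}
    for y in labels:
        rows = [x for x, c in data if c == y]
        theta[y] = [sum(x[i] for x in rows) / len(rows) for i in range(d)]

    def predict(x):
        # y_MAP = argmax_y P(y) * prod_i P(x_i | y)
        def score(y):
            s = prior[y]
            for i in range(d):
                s *= theta[y][i] if x[i] == 1 else 1 - theta[y][i]
            return s
        return max(labels, key=score)

    print(predict((1, 0, 1)))  # -> 'enjoyable' on this toy data

In practice the counts are usually smoothed (for example with Laplace smoothing) so that a feature value never seen in one class does not drive that class's score to zero.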