Maximum likelihood for a coin: the coin is tossed n times, h of which resulted in heads.

The likelihood of the data:

    P(X|θ) = p^h (1 - p)^(n-h)

Log likelihood:

    ln P(X|θ) = h ln p + (n - h) ln(1 - p)

Taking a derivative and setting it to 0:

    ∂/∂p ln P(X|θ) = h/p - (n - h)/(1 - p) = 0

Therefore:

    p = h/n
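To make this concrete, here is a minimal Python sketch (the counts n and h below are made up for illustration) that checks the closed-form estimate p = h/n against a grid search over the log likelihood:

    import numpy as np

    # Made-up data: n coin tosses, h of them heads.
    n, h = 100, 62

    # Closed-form maximum likelihood estimate derived above: p = h / n.
    p_mle = h / n

    # Log likelihood: ln P(X|theta) = h ln p + (n - h) ln(1 - p).
    def log_likelihood(p):
        return h * np.log(p) + (n - h) * np.log(1 - p)

    # Numerical check: a grid search should agree with the closed form
    # (up to the grid resolution).
    grid = np.linspace(0.01, 0.99, 999)
    p_grid = grid[np.argmax(log_likelihood(grid))]
    print(f"closed form: {p_mle:.3f}, grid search: {p_grid:.3f}")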
The joint distribution can be factored in two ways:

    P(Y, X) = P(Y|X) P(X)

and:

    P(Y, X) = P(X|Y) P(Y)

Therefore:

    P(Y|X) = P(X|Y) P(Y) / P(X)

This is known as Bayes' rule.
Bayes' rule

    P(Y|X) = P(X|Y) P(Y) / P(X)

Here P(Y|X) is the posterior, P(X|Y) is the likelihood, and P(Y) is the prior:

    posterior ∝ likelihood × prior
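As a quick numerical illustration of Bayes' rule (the prior and likelihood values below are made up, not taken from the notes), the posterior is obtained by multiplying likelihood and prior and normalizing:

    import numpy as np

    # Made-up prior P(Y) over two labels and likelihood P(X = x | Y) for one observed x.
    prior = np.array([0.9, 0.1])
    likelihood = np.array([0.2, 0.7])

    # Bayes' rule: the posterior is proportional to likelihood * prior;
    # P(X) is just the normalizing constant.
    unnormalized = likelihood * prior
    posterior = unnormalized / unnormalized.sum()
    print(posterior)   # [0.72 0.28]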
Maximum a-posteriori and maximum likelihood

The maximum a posteriori (MAP) rule:

    y_MAP = arg max_Y P(Y|X) = arg max_Y P(X|Y) P(Y) / P(X) = arg max_Y P(X|Y) P(Y)
If we ignore the prior distribution or assume it is uniform, we obtain the maximum likelihood rule:

    y_ML = arg max_Y P(X|Y)

P(X) can be computed as:

    P(X) = Σ_Y P(X|Y) P(Y)

but it is not important for inferring a label, since it does not depend on Y. A classifier that has access to P(Y|X) is a Bayes optimal classifier.
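A minimal sketch of the MAP and ML rules, again with made-up numbers (here for three candidate labels); note that dividing by P(X) would not change the arg max, which is why it can be ignored when inferring a label:

    import numpy as np

    # Made-up prior P(Y) and likelihood P(X = x | Y) for three candidate labels.
    prior = np.array([0.5, 0.3, 0.2])
    likelihood = np.array([0.1, 0.5, 0.6])

    joint = likelihood * prior      # P(X|Y) P(Y) for each Y
    evidence = joint.sum()          # P(X) = sum_Y P(X|Y) P(Y); not needed for the arg max

    y_map = np.argmax(joint)        # MAP rule
    y_ml = np.argmax(likelihood)    # ML rule: prior ignored / assumed uniform
    print(y_map, y_ml)              # 1 2 -- the prior changes the decision here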
Learning the optimal classifier

Task: predict whether or not a picnic spot is enjoyable. The training data consists of n rows, each of the form X = (X1 X2 X3 ... Xd) with a label Y.

How many parameters?
Prior: P(Y = y) for all y: k - 1 if there are k classes.
Likelihood: P(X = x | Y = y) for all x, y: (2^d - 1)k for d binary features.

Naïve Bayes classifier

We would like to model P(X|Y), where X is a feature vector and Y is its associated label. Simplifying assumption: conditional independence, i.e. given the class label the features are independent:

    P(X|Y) = P(x1|Y) P(x2|Y) ... P(xd|Y)

How many parameters now? Only d per class for binary features, i.e. dk in total.
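Below is a minimal sketch (not from the notes) of a Naïve Bayes classifier for binary features: it estimates the dk per-class feature probabilities and the class prior by counting, and predicts with the MAP rule. The add-one (Laplace) smoothing is an extra assumption, used here only to avoid zero probabilities:

    import numpy as np

    def fit_naive_bayes(X, y, n_classes):
        """Estimate P(Y = c) and theta[c, j] = P(x_j = 1 | Y = c) from 0/1 data by counting."""
        n, d = X.shape
        prior = np.zeros(n_classes)
        theta = np.zeros((n_classes, d))
        for c in range(n_classes):
            Xc = X[y == c]
            prior[c] = len(Xc) / n
            # Add-one (Laplace) smoothing keeps probabilities away from 0 and 1.
            theta[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)
        return prior, theta

    def predict(x, prior, theta):
        """MAP prediction: arg max_c log P(Y = c) + sum_j log P(x_j | Y = c)."""
        log_post = np.log(prior) + (
            x * np.log(theta) + (1 - x) * np.log(1 - theta)
        ).sum(axis=1)
        return np.argmax(log_post)

    # Toy picnic-style data, made up for illustration: 3 binary features, 2 classes.
    X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]])
    y = np.array([1, 1, 0, 0, 1, 0])
    prior, theta = fit_naive_bayes(X, y, n_classes=2)
    print(predict(np.array([1, 1, 0]), prior, theta))   # expected: 1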