The likelihood of the data and the log likelihood


... h resulted in heads.

The likelihood of the data:

$$P(X \mid \theta) = p^{h} (1 - p)^{n - h}$$

Log likelihood:

$$\ln P(X \mid \theta) = h \ln p + (n - h) \ln(1 - p)$$

Taking a derivative and setting to 0:

$$\frac{\partial \ln P(X \mid \theta)}{\partial p} = \frac{h}{p} - \frac{n - h}{1 - p} = 0 \quad\Rightarrow\quad p = \frac{h}{n}$$

Bayes' rule

$$P(Y, X) = P(Y \mid X) P(X)$$

and:

$$P(Y, X) = P(X \mid Y) P(Y)$$

Therefore:

$$P(Y \mid X) = \frac{P(X \mid Y) P(Y)}{P(X)}$$

This is known as Bayes' rule. Here $P(Y \mid X)$ is the posterior, $P(X \mid Y)$ is the likelihood, and $P(Y)$ is the prior:

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

$P(X)$ can be computed as:

$$P(X) = \sum_{Y} P(X \mid Y) P(Y)$$

but it is not important for inferring a label.

Maximum a-posteriori and maximum likelihood

The maximum a posteriori (MAP) rule:

$$y_{MAP} = \arg\max_{Y} P(Y \mid X) = \arg\max_{Y} \frac{P(X \mid Y) P(Y)}{P(X)} = \arg\max_{Y} P(X \mid Y) P(Y)$$

If we ignore the prior distribution or assume it is uniform, we obtain the maximum likelihood rule:

$$y_{ML} = \arg\max_{Y} P(X \mid Y)$$

A classifier that has access to $P(Y \mid X)$ is a Bayes optimal classifier.

10/29/13

Naïve Bayes classifier

We would like to model $P(X \mid Y)$, where $X$ is a feature vector and $Y$ is its associated label.

Learning the Optimal Classifier

Task: Predict whether or not a picnic spot is enjoyable.

Training data: $n$ rows, each a feature vector $X = (X_1, X_2, X_3, \ldots, X_d)$ with a label $Y$.

Let's learn $P(Y \mid X)$. How many parameters?

Prior: $P(Y = y)$ for all $y$: $k - 1$ if there are $k$ classes.

Likelihood: $P(X = x \mid Y = y)$ for all $x, y$: $(2^{d} - 1)k$ if there are $d$ binary features.

Simplifying assumption: conditional independence. Given the class label, the features are independent, i.e.

$$P(X \mid Y) = P(x_1 \mid Y) P(x_2 \mid Y) \cdots P(x_d \mid Y)$$

How many parameters now?

...
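The coin-toss result p = h/n can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the counts n = 10, h = 7 are hypothetical, chosen only for illustration. It evaluates the log likelihood h ln p + (n - h) ln(1 - p) on a grid of candidate values and compares the best grid point with the closed-form estimate.

```python
import math


def log_likelihood(p, h, n):
    """ln P(X|theta) = h ln p + (n - h) ln(1 - p), valid for 0 < p < 1."""
    return h * math.log(p) + (n - h) * math.log(1.0 - p)


def mle(h, n):
    """Closed-form maximum likelihood estimate of the heads probability."""
    return h / n


# Hypothetical data: 10 tosses, 7 of which came up heads.
n, h = 10, 7
p_hat = mle(h, n)

# Search a grid of candidate values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, h, n))

print(p_hat, best_on_grid)
```

The grid search is only a sanity check; the derivative argument is what establishes the estimate.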
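To see how the MAP and maximum likelihood rules can disagree, here is a small sketch for a single observation x and two labels. The probabilities and label names are made up for illustration and do not come from the slides; the helper names are likewise hypothetical. The posterior is computed with Bayes' rule, with P(X) obtained by summing P(X|Y)P(Y) over the labels.

```python
def posterior(likelihood, prior):
    """Return P(Y | X = x) for every label using Bayes' rule."""
    joint = {y: likelihood[y] * prior[y] for y in prior}
    evidence = sum(joint.values())  # P(X) = sum over Y of P(X|Y) P(Y)
    return {y: joint[y] / evidence for y in joint}


def map_label(likelihood, prior):
    """MAP rule: argmax over Y of P(X|Y) P(Y); P(X) is not needed."""
    return max(prior, key=lambda y: likelihood[y] * prior[y])


def ml_label(likelihood):
    """ML rule: argmax over Y of P(X|Y), i.e. MAP with a uniform prior."""
    return max(likelihood, key=lambda y: likelihood[y])


# Hypothetical numbers for one observed feature vector x.
prior = {"enjoyable": 0.2, "not_enjoyable": 0.8}       # P(Y)
likelihood = {"enjoyable": 0.6, "not_enjoyable": 0.3}  # P(X = x | Y)

post = posterior(likelihood, prior)
print(post)
print("MAP:", map_label(likelihood, prior))
print("ML: ", ml_label(likelihood))
```

With these numbers the likelihood favors one label while the prior tips the posterior toward the other, so the two rules pick different labels. Dividing by P(X) rescales both candidates equally, which is why the MAP rule can skip it.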
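The parameter counts can be compared directly. The preview cuts off before answering "How many parameters now?", so the naive Bayes count below is a standard result supplied here rather than taken from the slides: with d binary features and k classes, conditional independence needs one Bernoulli parameter per feature per class, d·k in total, plus the k - 1 prior parameters. The function names and the example sizes are hypothetical.

```python
def full_model_params(d, k):
    """Prior (k - 1) plus an unrestricted P(X|Y) over d binary features."""
    return (k - 1) + (2 ** d - 1) * k


def naive_bayes_params(d, k):
    """Prior (k - 1) plus one Bernoulli parameter per feature per class."""
    return (k - 1) + d * k


# Hypothetical problem sizes: k = 2 classes, increasing numbers of features.
for d in (3, 10, 20):
    print(d, full_model_params(d, 2), naive_bayes_params(d, 2))
```

The unrestricted count grows exponentially in d, while the naive Bayes count grows linearly, which is the practical motivation for the independence assumption.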

This note was uploaded on 02/10/2014 for the course CS 545 taught by Professor Anderson,c during the Fall '08 term at Colorado State.
