Lecture 5: Probabilistic classifiers
- Maximum likelihood learning
- Logistic regression
- Learning probabilistic classifiers with neural networks
- Generative models
- Gaussian discriminant analysis

September 19, 2007, COMP 652 Lecture 5

Classification so far
Given a set of training data (x_i, y_i), find a hypothesis h such that h(x_i) = y_i for as many examples as possible. Example classifiers that achieve this goal so far:
- Perceptrons (trained to minimize 0-1 loss)
- Sigmoid neurons and neural nets trained to minimize mean-squared error
Another, even better, goal is to predict the probability of an instance having one of the labels. This also gives information about the uncertainty in the output of the hypothesis.

Probabilistic classifiers
We want a class of hypotheses H such that each h_w in H outputs a number in [0, 1]. Suppose that Y = {0, 1}. We will interpret h_w(x) = P(y = 1 | x; w).
- The probability of an example having label 1 is h_w(x).
- The probability of an example having label 0 is 1 - h_w(x).
The goal in this case is to maximize the log likelihood of the data.

Bayes' theorem in learning
Let h be a hypothesis and D be the set of training data. Using Bayes' theorem, we have:

    P(h | D) = P(D | h) P(h) / P(D)

where:
- P(h) = prior probability of hypothesis h
- P(D) = prior probability of the training data D
- P(h | D) = probability of h given D
- P(D | h) = probability of D given h

Choosing hypotheses

    P(h | D) = P(D | h) P(h) / P(D)

What is the most probable hypothesis given the training data?
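To make the probabilistic-classifier idea concrete, here is a minimal sketch (not from the lecture) of a sigmoid hypothesis h_w(x) = P(y = 1 | x; w) and the log likelihood it is trained to maximize. The toy dataset and weight vector are invented for illustration.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), so the output can be read as a probability.
    return 1.0 / (1.0 + math.exp(-z))

def h(w, x):
    # Hypothesis h_w(x) = P(y = 1 | x; w): the sigmoid of the dot product w . x.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def log_likelihood(w, data):
    # Sum over examples of log P(y_i | x_i; w):
    # log h_w(x) when y = 1, and log(1 - h_w(x)) when y = 0.
    total = 0.0
    for x, y in data:
        p = h(w, x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Toy dataset: (features, label); the first feature is a constant bias term.
data = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 3.0], 1)]
w = [0.0, 1.0]  # hypothetical weights, not learned

print(h(w, [1.0, 2.0]))        # P(y = 1 | x) for one example
print(log_likelihood(w, data))
```

Learning then amounts to searching for the w that makes log_likelihood(w, data) as large as possible, e.g. by gradient ascent.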
Maximum a posteriori (MAP) hypothesis h_MAP:

    h_MAP = argmax_{h in H} P(h | D)
          = argmax_{h in H} P(D | h) P(h) / P(D)    (using Bayes' theorem)
          = argmax_{h in H} P(D | h) P(h)

(P(D) does not depend on h, so it can be dropped from the maximization.)

Maximum likelihood estimation

    h_MAP = argmax_{h in H} P(D | h) P(h)

If we assume P(h_i) = P(h_j) for all i, j (all hypotheses are equally likely a priori), then we can simplify further and choose the maximum likelihood (ML) hypothesis:

    h_ML = argmax_{h in H} P(D | h) = argmax ...
View
Full
Document
This note was uploaded on 09/04/2008 for the course COMP 652 taught by Professor Preicup during the Fall '07 term at McGill.
 Fall '07
 PREICUP

Click to edit the document details