ml-lecture05

Lecture 5: Probabilistic classifiers
(COMP-652, September 19, 2007)

- Maximum likelihood learning
- Logistic regression
- Learning probabilistic classifiers with neural networks
- Generative models
- Gaussian discriminant analysis

Classification so far

Given a set of training data (x_i, y_i), find a hypothesis h such that h(x_i) = y_i for as many examples as possible. Example classifiers that achieve this goal so far:
- Perceptrons (trained to minimize 0-1 loss)
- Sigmoid neurons and neural nets trained to minimize mean-squared error

Another, even better, goal is to predict the probability of an instance having one of the labels. This also gives information about the uncertainty in the output of the hypothesis.

Probabilistic classifiers

We want a class of hypotheses H such that every h_w in H outputs a number in [0, 1]. Suppose that Y = {0, 1}. We will interpret h_w(x) = P(y = 1 | x; w). Then:
- the probability of an example having label 1 is h_w(x), and
- the probability of an example having label 0 is 1 - h_w(x).

The goal in this case is to maximize the log likelihood of the data.

Bayes theorem in learning

Let h be a hypothesis and D be the set of training data. Using Bayes theorem, we have

    P(h \mid D) = \frac{P(D \mid h) P(h)}{P(D)}

where:
- P(h) = prior probability of hypothesis h
- P(D) = prior probability of the training data D
- P(h | D) = probability of h given D
- P(D | h) = probability of D given h

Choosing hypotheses

What is the most probable hypothesis given the training data? It is the maximum a posteriori (MAP) hypothesis h_MAP:

    h_{MAP} = \arg\max_{h \in H} P(h \mid D)
            = \arg\max_{h \in H} \frac{P(D \mid h) P(h)}{P(D)}    (using Bayes theorem)
            = \arg\max_{h \in H} P(D \mid h) P(h)

Maximum likelihood estimation

    h_{MAP} = \arg\max_{h \in H} P(D \mid h) P(h)

If we assume P(h_i) = P(h_j) for all i, j (all hypotheses are equally likely a priori), then we can simplify further and choose the maximum likelihood (ML) hypothesis:

    h_{ML} = \arg\max_{h \in H} P(D \mid h) = \arg\max ...
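The last two slides define h_MAP and h_ML abstractly. As a concrete illustration (not from the lecture; the hypothesis values, prior, and coin-flip data below are made up), the following minimal sketch compares the two on a finite hypothesis space: three candidate values for a coin's probability of heads.

```python
import numpy as np

# Finite hypothesis space: candidate values of theta = P(heads) for a coin.
# The prior and the observed data are made up for illustration only.
hypotheses = np.array([0.2, 0.5, 0.8])
prior = np.array([0.1, 0.8, 0.1])        # P(h): strongly favours a fair coin

# Observed data D: 7 heads out of 10 flips.
heads, flips = 7, 10

# Likelihood P(D | h) for each hypothesis (binomial, up to a constant factor).
likelihood = hypotheses**heads * (1 - hypotheses)**(flips - heads)

h_ml = hypotheses[np.argmax(likelihood)]            # argmax_h P(D | h)
h_map = hypotheses[np.argmax(likelihood * prior)]   # argmax_h P(D | h) P(h)

print("h_ML  =", h_ml)    # 0.8: the best fit to the data alone
print("h_MAP =", h_map)   # 0.5: the strong prior keeps the fair-coin hypothesis
```

With a uniform prior the two coincide, which is exactly the simplification made on the "Maximum likelihood estimation" slide.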
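Tying this back to the "Probabilistic classifiers" slide: for a logistic (sigmoid) hypothesis h_w(x) = sigma(w . x), the log likelihood can be maximized by gradient ascent. The sketch below assumes that model; the helper names, learning rate, and toy data are illustrative and not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    # h_w(x) = P(y = 1 | x; w) for a logistic hypothesis
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # sum_i [ y_i log h_w(x_i) + (1 - y_i) log(1 - h_w(x_i)) ]
    p = sigmoid(X @ w)
    eps = 1e-12  # avoid log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_ml(X, y, lr=0.1, n_steps=1000):
    # Maximum likelihood by gradient ascent on the log likelihood.
    # For the logistic model the gradient is X^T (y - h_w(X)).
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (y - sigmoid(X @ w))
        w += lr * grad / len(y)
    return w

# Tiny usage example with made-up data: two 1-D clusters plus a bias feature.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-1, 1, 50), rng.normal(1, 1, 50)])
    X = np.column_stack([np.ones_like(x), x])   # bias term + feature
    y = np.concatenate([np.zeros(50), np.ones(50)])
    w = fit_ml(X, y)
    print("learned weights:", w)
    print("log-likelihood:", log_likelihood(w, X, y))
```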
This note is from the course COMP 652, taught by Professor Precup during the Fall 2007 term at McGill.
