Machine Learning 10-701/15-781, Fall 2008

Slide 1: Logistic Regression (generative versus discriminative classifiers)
Eric Xing, Lecture 5, September 22, 2008
Reading: Chap. 4.3 CB
Eric Xing @ CMU, 2006-2008

Slide 2: Announcements
- Hw 1 due at the end of the class today
- My office hour is 11:50-12:30 today (catching a flight in the afternoon)
- Steve Hanneke will give the lecture on Wed

Slide 3: Generative vs. Discriminative Classifiers
- Goal: we wish to learn f: X -> Y, e.g., P(Y|X)
- Generative classifiers (e.g., Naive Bayes):
  - Assume some functional form for P(X|Y) and P(Y). This is a generative model of the data!
  - Estimate the parameters of P(X|Y) and P(Y) directly from the training data
  - Use Bayes' rule to calculate P(Y|X = x)
- Discriminative classifiers:
  - Directly assume some functional form for P(Y|X). This is a discriminative model of the data!
  - Estimate the parameters of P(Y|X) directly from the training data

Slide 4: Discussion: Generative and discriminative classifiers
- Generative: modeling the joint distribution of all the data
- Discriminative: modeling only the points at the boundary
- How? Regression!
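The generative recipe on slide 3 (assume forms for P(X|Y) and P(Y), estimate their parameters, then apply Bayes' rule) can be sketched in a few lines of Python. This is not from the lecture: the 1-D Gaussian class-conditional, the equal priors, and all names here are my own illustrative choices.

```python
import math

# A minimal generative classifier, assuming P(X|Y=k) is a 1-D Gaussian
# with class-specific mean/variance and P(Y) is a known prior.
# Bayes' rule: P(Y=k|X=x) = P(x|Y=k) P(Y=k) / sum_j P(x|Y=j) P(Y=j).

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, priors, means, variances):
    """Return P(Y=k | X=x) for each class k via Bayes' rule."""
    joint = [priors[k] * gaussian_pdf(x, means[k], variances[k])
             for k in range(len(priors))]
    evidence = sum(joint)          # P(X=x)
    return [j / evidence for j in joint]

# Toy example: class 0 centered at -1, class 1 centered at +1, equal priors.
p = posterior(0.9, priors=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0])
```

A point at x = 0.9 sits much closer to the class-1 mean, so its posterior mass concentrates on class 1; the discriminative approach of the following slides skips the joint model and parameterizes P(Y|X) directly.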
Slide 5: Linear regression
- The data: {(x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_N, y_N)}
- Both nodes are observed:
  - X is an input vector
  - Y is a response vector (we first consider y as a generic continuous response vector, then we consider the special case of classification, where y is a discrete indicator)
- A regression scheme can be used to model p(y|x) directly, rather than p(x, y)

Slide 6: Classification and logistic regression

Slide 7: The logistic function

Slide 8: Logistic regression (sigmoid classifier)
- The conditional distribution is a Bernoulli:
    p(y|x) = mu(x)^y (1 - mu(x))^(1 - y)
  where mu(x) is a logistic function:
    mu(x) = 1 / (1 + e^(-theta^T x))
- We can use the brute-force gradient method as in LR
- But we can also apply generic laws by observing that p(y|x) is an exponential-family function, more specifically, a generalized linear model (see future lectures)

Slide 9: Training Logistic Regression: MCLE
- Estimate the parameters theta = <theta_0, theta_1, ..., theta_m> to maximize the conditional likelihood of the training data
- Training data: {(x_1, y_1), ..., (x_N, y_N)}
- Data likelihood = prod_n p(x_n, y_n)
- Data conditional likelihood = prod_n p(y_n | x_n)

Slide 10: Expressing Conditional Log Likelihood
- Recall the logistic function mu(x) = 1 / (1 + e^(-theta^T x)); together with the Bernoulli conditional likelihood above, the conditional log likelihood is
    l(theta) = sum_n [ y_n log mu(x_n) + (1 - y_n) log(1 - mu(x_n)) ]

Slide 11: Maximizing Conditional Log Likelihood
- The objective: maximize l(theta) above
- Good news: l(theta) is a concave function of theta
- Bad news: no closed-form solution maximizes l(theta)

Slide 12: Gradient Ascent
- Property of the sigmoid function: d mu/dz = mu(z) (1 - mu(z))
- The gradient: d l(theta) / d theta_i = sum_n (y_n - mu(x_n)) x_{n,i}
- The gradient ascent algorithm iterates until the change < epsilon: for all i, repeat
    theta_i <- theta_i + eta * sum_n (y_n - mu(x_n)) x_{n,i}

[preview truncated]
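The training loop of slides 8-12 can be sketched directly: compute mu(x_n) = sigmoid(theta^T x_n), accumulate the gradient sum_n (y_n - mu(x_n)) x_n, and step uphill. The learning rate `eta`, iteration count, and toy data below are my own illustrative choices, not values from the lecture.

```python
import math

def sigmoid(z):
    """The logistic function mu(z) = 1 / (1 + e^(-z)) from slide 7."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, eta=0.1, iters=1000):
    """Maximize the conditional log likelihood l(theta) by gradient ascent.

    xs: feature vectors (each with a leading 1.0 for the bias theta_0),
    ys: 0/1 labels.  Update: theta_i += eta * sum_n (y_n - mu_n) x_{n,i}.
    """
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            mu = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            for i in range(len(theta)):
                grad[i] += (y - mu) * x[i]
        theta = [t + eta * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    """Classify as 1 when P(y=1|x) >= 0.5."""
    return 1 if sigmoid(sum(t * xi for t, xi in zip(theta, x))) >= 0.5 else 0

# Toy separable data: label 1 iff the (single) feature is positive.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
ys = [0, 0, 1, 1]
theta = train_logistic(xs, ys)
```

Because l(theta) is concave (slide 11), plain gradient ascent with a small enough step converges toward the maximum conditional likelihood; on separable data like this the weights simply keep growing, so a real implementation would add a stopping criterion (the slide's "change < epsilon") or regularization.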
This note was uploaded on 01/26/2010 for the course Machine Learning 10-701, taught by Professor Eric P. Xing during the Fall '08 term at Carnegie Mellon.