lecture5-annotated

# Machine Learning 10-701/15-781, Fall 2008
## Lecture 5: Logistic Regression, generative versus discriminative classifiers
Eric Xing @ CMU, 2006-2008. Lecture 5, September 22, 2008. Reading: Chap. 4.3, C. Bishop.

## Announcements
- HW 1 is due at the end of class today.
- My office hour is 11:50-12:30 today (catching a flight in the afternoon).
- Steve Hanneke will give the lecture on Wednesday.

## Generative vs. discriminative classifiers
Goal: we wish to learn $f: X \rightarrow Y$, e.g., $P(Y|X)$.

- Generative classifiers (e.g., Naïve Bayes):
  - Assume some functional form for $P(X|Y)$ and $P(Y)$. This is a generative model of the data.
  - Estimate the parameters of $P(X|Y)$ and $P(Y)$ directly from the training data.
  - Use Bayes' rule to calculate $P(Y \mid X = x)$.
- Discriminative classifiers:
  - Directly assume some functional form for $P(Y|X)$. This is a discriminative model of the data.
  - Estimate the parameters of $P(Y|X)$ directly from the training data.

## Discussion: generative and discriminative classifiers
- Generative: model the joint distribution of all the data.
- Discriminative: model only the points at the decision boundary. How? Regression!
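As a concrete illustration of the generative route just described (a sketch, not from the lecture): assume a Gaussian form for $P(X|Y)$ and a categorical $P(Y)$, estimate both from training data, then apply Bayes' rule for $P(Y \mid X = x)$. The function names and toy data here are invented for illustration.

```python
import numpy as np

def fit_gaussian_generative(X, y):
    # Generative step: assume P(X|Y=c) is Gaussian and P(Y=c) is categorical,
    # and estimate (mean, std, class prior) for each class from the training data.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(), Xc.std() + 1e-9, len(Xc) / len(y))
    return params

def predict_posterior(params, x):
    # Bayes' rule: P(Y=c | X=x) is proportional to P(X=x | Y=c) * P(Y=c).
    def gauss(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    joint = {c: gauss(x, mu, s) * prior for c, (mu, s, prior) in params.items()}
    z = sum(joint.values())  # normalize over classes
    return {c: v / z for c, v in joint.items()}

# Toy 1-D data: two well-separated classes.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])
params = fit_gaussian_generative(X, y)
post = predict_posterior(params, 2.0)  # posterior P(Y | X = 2)
```

A discriminative classifier would instead skip modeling $P(X|Y)$ entirely and parameterize $P(Y|X)$ directly, which is exactly where logistic regression enters below.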
## Linear regression
- The data: $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)\}$, with both nodes observed:
  - $X$ is an input vector.
  - $Y$ is a response vector (we first consider $y$ as a generic continuous response vector, then the special case of classification where $y$ is a discrete indicator).
- A regression scheme can be used to model $p(y|x)$ directly, rather than $p(x, y)$.

## Classification and logistic regression

## The logistic function
(Slide figure: plot of the S-shaped logistic curve $\mu(z) = 1/(1 + e^{-z})$, rising from 0 to 1.)

## Logistic regression (the sigmoid classifier)
- The conditional distribution is a Bernoulli:
$$p(y \mid x) = \mu(x)^{y}\,(1 - \mu(x))^{1-y}$$
  where $\mu$ is a logistic function:
$$\mu(x) = \frac{1}{1 + e^{-\theta^{T}x}}$$
- We could use the brute-force gradient method as in linear regression, but we can also apply generic results by observing that $p(y|x)$ is an exponential-family distribution, more specifically a generalized linear model (see future lectures).

## Training logistic regression: MCLE
- Estimate the parameters $\theta = \langle \theta_0, \theta_1, \ldots, \theta_m \rangle$ to maximize the conditional likelihood of the training data.
- Training data: $\{(x_1, y_1), \ldots, (x_N, y_N)\}$.
- Data likelihood: $\prod_n p(x_n, y_n)$.
- Data conditional likelihood: $\prod_n p(y_n \mid x_n)$.

## Expressing the conditional log-likelihood
Recall the logistic function $\mu(x) = 1/(1 + e^{-\theta^T x})$ and the conditional likelihood above. Taking logs of the Bernoulli,
$$\ell(\theta) = \sum_n \big[\, y_n \log \mu(x_n) + (1 - y_n)\log(1 - \mu(x_n)) \,\big]$$

## Maximizing the conditional log-likelihood
- The objective: maximize $\ell(\theta)$.
- Good news: $\ell(\theta)$ is a concave function of $\theta$.
- Bad news: there is no closed-form solution maximizing $\ell(\theta)$.

## Gradient ascent
- Property of the sigmoid function: $\frac{d\mu}{dz} = \mu(z)\,(1 - \mu(z))$.
- The gradient:
$$\frac{\partial \ell(\theta)}{\partial \theta_j} = \sum_n x_{n,j}\,\big(y_n - \mu(x_n)\big)$$
- The gradient ascent algorithm iterates until the change falls below $\epsilon$: for all $j$, repeat
$$\theta_j \leftarrow \theta_j + \eta \sum_n x_{n,j}\,\big(y_n - \mu(x_n)\big)$$
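The gradient ascent update above can be sketched as follows (a minimal illustration, not the lecture's code; the learning rate, tolerance, and toy data are invented for the example):

```python
import numpy as np

def sigmoid(z):
    # The logistic function mu(z) = 1 / (1 + e^{-z}) from the slides.
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gradient_ascent(X, y, lr=0.5, tol=1e-6, max_iter=10000):
    """Maximize the conditional log-likelihood l(theta) by gradient ascent.

    The gradient for logistic regression is X^T (y - mu), with
    mu_n = sigmoid(theta^T x_n); we iterate until the parameter
    change drops below tol, the "iterate until change < epsilon"
    rule on the slide.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_iter):
        mu = sigmoid(X @ theta)
        grad = X.T @ (y - mu)
        new_theta = theta + lr * grad / n  # averaged gradient step
        if np.max(np.abs(new_theta - theta)) < tol:
            return new_theta
        theta = new_theta
    return theta

# Toy 1-D problem with an intercept column: label is 1 exactly when x > 0.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
theta = fit_logistic_gradient_ascent(X, y)
preds = (sigmoid(X @ theta) > 0.5).astype(float)
accuracy = (preds == y).mean()
```

Because $\ell(\theta)$ is concave, any such ascent converges toward a global maximizer; the learning rate $\eta$ only affects how fast.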

This note was uploaded on 01/26/2010 for the course 10-701 Machine Learning, taught by Professor Eric P. Xing during the Fall '08 term at Carnegie Mellon.
