Chap4-Part3: Logistic Regression (Sargur N. Srihari)

Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York, USA
Topics in Linear Classification using Probabilistic Discriminative Models

1. Generative vs discriminative models
2. Nonlinear basis functions in linear classification
3. Logistic regression: two-class and multi-class; parameters by maximum likelihood; iterative reweighted least squares
4. Probit regression
5. Canonical link functions
Generative vs Discriminative

• Probabilistic generative models (linear)
  – Two-class: p(C1|x) is written as the logistic sigmoid σ applied to a linear function of x, i.e., σ(wᵀx + w₀), for a wide choice of forms of p(x|Ck)
  – Multiclass: p(Ck|x) is given by a softmax of linear functions of x
  – Maximum likelihood is used to estimate the parameters of p(x|Ck) and the priors p(Ck)
  – Can generate synthetic data from the marginal p(x)
• Probabilistic discriminative models
  – Direct approach: maximize the likelihood of the conditional distribution p(Ck|x)
  – Advantages: fewer adaptive parameters, and improved performance when the assumed forms of p(x|Ck) are poor approximations

(A sketch of the generative route follows below.)
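To make the generative route concrete, here is a minimal sketch (the toy data, means, and covariance are assumptions for illustration, not from the slides): fit two Gaussian class-conditionals with a shared covariance by maximum likelihood, then read off the posterior p(C1|x) = σ(wᵀx + w₀), a sigmoid of a linear function of x.

```python
import numpy as np

# Sketch: two-class Gaussian generative model with shared covariance,
# fitted by MLE; the posterior is then a sigmoid of a linear function.
rng = np.random.default_rng(0)
cov = [[1.0, 0.3], [0.3, 1.0]]
X1 = rng.multivariate_normal([1.0, 1.0], cov, 100)    # samples from p(x|C1)
X2 = rng.multivariate_normal([-1.0, -1.0], cov, 100)  # samples from p(x|C2)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)  # MLE class means
pi1 = len(X1) / (len(X1) + len(X2))          # MLE prior p(C1)
# Shared covariance: sample-size-weighted average of per-class MLE covariances
S = (np.cov(X1.T, bias=True) * len(X1) +
     np.cov(X2.T, bias=True) * len(X2)) / (len(X1) + len(X2))

Sinv = np.linalg.inv(S)
w = Sinv @ (mu1 - mu2)                                  # linear weights
w0 = (-0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2  # bias term
      + np.log(pi1 / (1.0 - pi1)))

def posterior_c1(x):
    """p(C1|x) = sigma(w^T x + w0) under the fitted generative model."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

print(posterior_c1(np.array([1.0, 1.0])))    # close to 1 (deep in C1)
print(posterior_c1(np.array([-1.0, -1.0])))  # close to 0 (deep in C2)
```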
Nonlinear Basis Functions in Linear Models

Although we use linear classification models, the inputs can first be transformed nonlinearly using a vector of basis functions φ(x).

[Figure: data in the original input space (x1, x2), where the classes are not linearly separable, mapped to the feature space (φ1, φ2), where they are linearly separable.]

Linear separability in feature space does not imply linear separability in input space. (See the sketch below.)
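A short sketch of this point, on assumed toy data (two concentric classes): the classes are not linearly separable in the input space (x1, x2), but become separable after the nonlinear basis map φ(x) = (x1², x2²).

```python
import numpy as np

# Sketch: concentric classes are not linearly separable in (x1, x2),
# but are separable in the feature space (phi1, phi2) = (x1^2, x2^2).
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
r_in = rng.uniform(0.0, 1.0, 100)   # class 0: inside the unit circle
r_out = rng.uniform(1.5, 2.5, 100)  # class 1: an annulus around it

X = np.vstack([np.c_[r_in * np.cos(theta), r_in * np.sin(theta)],
               np.c_[r_out * np.cos(theta), r_out * np.sin(theta)]])
t = np.r_[np.zeros(100), np.ones(100)]

def phi(X):
    """Nonlinear basis functions: phi(x) = (x1^2, x2^2)."""
    return X ** 2

# In feature space, phi1 + phi2 = r^2, so a single line separates the
# classes perfectly, even though no line in input space can.
pred = (phi(X).sum(axis=1) > 1.25 ** 2).astype(float)
print("feature-space accuracy:", (pred == t).mean())  # 1.0
```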
2. Logistic Regression

• Feature vector φ, two classes C1 and C2
• The posterior probability p(C1|φ) can be written as
    p(C1|φ) = y(φ) = σ(wᵀφ)
  where φ is an M-dimensional feature vector and σ(·) is the logistic sigmoid function
• Goal is to determine the M parameters w
• Known as logistic regression in statistics, although it is a model for classification rather than regression

Properties of the logistic sigmoid σ(a):
A. Symmetry: σ(−a) = 1 − σ(a)
B. Inverse: a = ln(σ/(1 − σ)), known as the logit; also known as the log odds, since it equals ln[p(C1|x)/p(C2|x)]
C. Derivative: dσ/da = σ(1 − σ)
(These three properties are verified numerically below.)

[Figure: plot of the logistic sigmoid σ(a) against the activation a.]
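A short numerical check of the three properties above (the grid of activations is an arbitrary assumption for the sketch):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-5.0, 5.0, 11)
s = sigmoid(a)

# A. Symmetry: sigma(-a) = 1 - sigma(a)
assert np.allclose(sigmoid(-a), 1.0 - s)

# B. Inverse (logit / log odds): a = ln(sigma / (1 - sigma))
assert np.allclose(np.log(s / (1.0 - s)), a)

# C. Derivative: d sigma/da = sigma(a) * (1 - sigma(a)),
#    compared against a central finite difference
eps = 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2.0 * eps)
assert np.allclose(numeric, s * (1.0 - s), atol=1e-8)

print("all three sigmoid properties hold")
```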
Fewer Parameters in the Linear Discriminative Model

• Discriminative approach (logistic regression)
  – For an M-dimensional feature space: M adjustable parameters
• Generative approach based on Gaussians (Bayes/naïve Bayes)
  – 2M parameters for the two class means
  – M(M+1)/2 parameters for the shared covariance matrix
  – One parameter for the two class priors
  – Total of M(M+5)/2 + 1 parameters, which grows quadratically with M (checked below)
• If the features are assumed independent (naïve Bayes), M + 3 parameters are still needed
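A quick arithmetic check of the counts above: the generative total 2M + M(M+1)/2 + 1 indeed simplifies to M(M+5)/2 + 1, and it quickly dwarfs the discriminative model's M parameters.

```python
# Check: 2M (means) + M(M+1)/2 (shared covariance) + 1 (prior)
# equals M(M+5)/2 + 1, versus just M for logistic regression.
for M in (2, 10, 100):
    generative = 2 * M + M * (M + 1) // 2 + 1
    assert generative == M * (M + 5) // 2 + 1
    print(f"M={M:4d}  discriminative={M:6d}  generative={generative:6d}")
```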
Determining Logistic Regression Parameters

Maximum likelihood approach for two classes:
• Data set {(φn, tn)}, where tn ∈ {0, 1}, φn = φ(xn), and n = 1, …, N
• Since tn is binary, we can use the Bernoulli distribution
• Let yn be the probability that tn = 1, i.e., yn = p(C1|φn) = σ(wᵀφn)
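Where this leads (stated here as a hedged sketch, since the preview cuts off): with yn = σ(wᵀφn), the Bernoulli likelihood gives the cross-entropy error E(w) = −Σn [tn ln yn + (1 − tn) ln(1 − yn)], whose gradient is Σn (yn − tn)φn. The toy data and w_true below are assumptions for illustration; simple gradient descent then recovers the maximum likelihood weights.

```python
import numpy as np

# Sketch: maximum likelihood for two-class logistic regression on toy data.
rng = np.random.default_rng(2)
N, M = 200, 3
Phi = np.c_[np.ones(N), rng.normal(size=(N, M - 1))]  # design matrix (with bias)
w_true = np.array([0.5, 2.0, -1.0])                   # assumed true weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Binary targets t_n drawn from Bernoulli(y_n), y_n = sigma(w_true^T phi_n)
t = (rng.uniform(size=N) < sigmoid(Phi @ w_true)).astype(float)

w = np.zeros(M)
for _ in range(5000):
    y = sigmoid(Phi @ w)    # y_n = p(t_n = 1 | phi_n)
    grad = Phi.T @ (y - t)  # gradient of the cross-entropy error E(w)
    w -= 0.5 * grad / N     # averaged gradient step

y = sigmoid(Phi @ w)
E = -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
print("w_ML ~", np.round(w, 2), " E(w) =", round(float(E), 2))
```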