# Chap4-Part3: Logistic Regression


Sargur N. Srihari, University at Buffalo, State University of New York, USA

## Topics in Linear Classification Using Probabilistic Discriminative Models

1. Generative vs discriminative models
2. Nonlinear basis functions in linear classification
3. Logistic regression
   - Two-class and multi-class
   - Parameters via maximum likelihood
   - Iterative reweighted least squares
4. Probit regression
5. Canonical link functions

*Srihari: Machine Learning*
## Generative vs Discriminative

- Probabilistic generative models (linear)
  - Two-class: p(C₁|x) is written as σ applied to a linear function of x, i.e. wᵀx + w₀, for a wide choice of forms for p(x|Cₖ)
  - Multiclass: p(Cₖ|x) is given by a softmax of a linear function of x
  - MLE is used to get the parameters of p(x|Cₖ) and the priors p(Cₖ)
  - Can generate synthetic data from the marginal p(x)
- Probabilistic discriminative models
  - Direct approach: maximize the likelihood function of the conditional distribution p(Cₖ|x)
  - Advantages:
    - Fewer adaptive parameters
    - Improved performance when the p(x|Cₖ) assumptions are poor approximations
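The last generative-model point above — that a fitted generative model can synthesize new inputs from the marginal p(x), which a discriminative model cannot — can be sketched as follows. The toy data, class means, and shared covariance below are illustrative assumptions, not taken from the slides:

```python
# Sketch: a two-class Gaussian generative model with a shared covariance,
# fit by MLE, can sample synthetic inputs from the marginal
#   p(x) = p(x|C1) p(C1) + p(x|C2) p(C2).
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy training data: two Gaussian-ish classes in 2-D.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))

# MLE for the generative model's parameters.
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
prior1 = len(X1) / (len(X1) + len(X2))      # class prior p(C1)
centered = np.vstack([X1 - mu1, X2 - mu2])  # pooled, class-centered data
Sigma = centered.T @ centered / len(centered)  # shared covariance

def sample_marginal(n):
    """Sample from p(x): pick a class from the prior, then draw from
    that class-conditional Gaussian."""
    ks = rng.random(n) < prior1
    mus = np.where(ks[:, None], mu1, mu2)
    return mus + rng.multivariate_normal(np.zeros(2), Sigma, size=n)

synthetic = sample_marginal(500)
print(synthetic.shape)  # (500, 2)
```

A discriminative model such as logistic regression only parameterizes p(C₁|x), so there is no marginal p(x) to sample from.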

## Nonlinear Basis Functions in Linear Models

Although we use linear classification models, a fixed nonlinear transformation of the inputs by a vector of basis functions φ(x) maps the original input space (x₁, x₂), where the classes are not linearly separable, to a feature space (φ₁, φ₂), where they are. Note that linear separability in feature space does not imply linear separability in the original input space.

*(Figure: data that is not linearly separable in input space becomes linearly separable in feature space.)*
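As a concrete instance of this idea (an assumed radial toy problem, not the slides' exact figure): two concentric classes are not linearly separable in the input space (x₁, x₂), but under the quadratic basis map φ(x) = (x₁², x₂²) a single linear boundary in feature space separates them exactly:

```python
# Concentric classes: inner ring (radius < 1) vs outer ring (radius in (2, 3)).
# No line in (x1, x2) separates them, but in the feature space
# (phi1, phi2) = (x1^2, x2^2) the linear boundary phi1 + phi2 = 1.5^2 does.
import numpy as np

rng = np.random.default_rng(1)

theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100),   # class 0: inner
                    rng.uniform(2.0, 3.0, 100)])  # class 1: outer
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
t = np.concatenate([np.zeros(100), np.ones(100)])

# Apply the fixed nonlinear basis map and a single linear decision rule.
Phi = X ** 2                                   # phi(x) = (x1^2, x2^2)
pred = (Phi[:, 0] + Phi[:, 1] > 1.5 ** 2).astype(float)
print((pred == t).mean())  # 1.0 -- perfectly separable in feature space
```

The model is still linear — only the features it is given are nonlinear in the original inputs.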
## Logistic Regression

- Feature vector φ, two classes C₁ and C₂
- The posterior probability p(C₁|φ) can be written as

  p(C₁|φ) = y(φ) = σ(wᵀφ)

  where φ is an M-dimensional feature vector and σ(·) is the logistic sigmoid function
- The goal is to determine the M parameters of w
- Known as logistic regression in statistics, although it is a model for classification rather than regression

Properties of the logistic sigmoid σ(a):

- A. Symmetry: σ(−a) = 1 − σ(a)
- B. Inverse: a = ln(σ/(1−σ)), known as the logit; also called the log odds, since it equals ln[p(C₁|x)/p(C₂|x)]
- C. Derivative: dσ/da = σ(1 − σ)
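The three properties A–C can be verified numerically. This is a minimal check added for illustration, not part of the original slides:

```python
# Numerical check of the sigmoid's symmetry, inverse (logit), and derivative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(s):
    return np.log(s / (1.0 - s))  # property B: the inverse of the sigmoid

a = np.linspace(-5, 5, 101)
s = sigmoid(a)

# A. Symmetry: sigma(-a) = 1 - sigma(a)
assert np.allclose(sigmoid(-a), 1.0 - s)

# B. Inverse (logit / log odds): logit(sigma(a)) recovers a
assert np.allclose(logit(s), a)

# C. Derivative: d sigma/da = sigma (1 - sigma), checked by central differences
h = 1e-6
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
assert np.allclose(numeric, s * (1.0 - s), atol=1e-8)

print("properties A, B, C all hold")
```

Property C is what makes the maximum-likelihood gradient for logistic regression take such a simple form later on.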

## Fewer Parameters in the Linear Discriminative Model

- Discriminative approach (logistic regression)
  - For an M-dimensional feature space: M adjustable parameters
- Generative approach based on Gaussians (Bayes/naïve Bayes)
  - 2M parameters for the means
  - M(M+1)/2 parameters for the shared covariance matrix
  - 1 parameter for the two class priors (p(C₂) = 1 − p(C₁))
  - Total: M(M+5)/2 + 1 parameters, which grows quadratically with M
- If the features are assumed independent (naïve Bayes), the model still needs M + 3 parameters
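The counts above can be sanity-checked with a few lines of arithmetic. The helper functions below are illustrative, not from the slides; they just tally 2M mean parameters, M(M+1)/2 shared-covariance parameters, and 1 prior, and confirm the closed form M(M+5)/2 + 1:

```python
# Parameter counts: discriminative (logistic regression) vs shared-covariance
# Gaussian generative model, as quoted in the slide above.
def discriminative_params(M):
    return M  # one weight per feature dimension

def generative_params(M):
    # 2M means + M(M+1)/2 shared covariance entries + 1 class prior
    return 2 * M + M * (M + 1) // 2 + 1

for M in (10, 100, 1000):
    # Matches the slide's closed form M(M+5)/2 + 1.
    assert generative_params(M) == M * (M + 5) // 2 + 1
    print(M, discriminative_params(M), generative_params(M))
```

For M = 100 the generative count is already 5251 parameters versus 100 for logistic regression, illustrating the quadratic vs linear growth.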
## Determining Logistic Regression Parameters

Maximum likelihood approach for two classes:

- Data set {(φₙ, tₙ)}, where tₙ ∈ {0, 1} and φₙ = φ(xₙ), for n = 1, …, N
- Since tₙ is binary, we can use the Bernoulli distribution
- Let yₙ be the probability that tₙ = 1
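The slide stops at this setup. Under the standard continuation, the Bernoulli likelihood ∏ₙ yₙᵗⁿ(1 − yₙ)¹⁻ᵗⁿ gives the cross-entropy error E(w) = −Σₙ[tₙ ln yₙ + (1 − tₙ) ln(1 − yₙ)], whose gradient is ∇E(w) = Σₙ(yₙ − tₙ)φₙ. A minimal sketch on an assumed toy dataset, using plain gradient descent rather than the iterative reweighted least squares method the slides cover later:

```python
# Maximum-likelihood fit of two-class logistic regression:
#   y_n = sigma(w^T phi_n),  gradient of cross-entropy = sum_n (y_n - t_n) phi_n
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy data: 1-D inputs with a bias basis function phi_0(x) = 1.
N = 200
x = rng.normal(size=N)
t = (x > 0.3).astype(float)               # targets t_n in {0, 1}
Phi = np.column_stack([np.ones(N), x])    # phi_n = (1, x_n), so M = 2

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.zeros(2)
for _ in range(2000):
    y = sigmoid(Phi @ w)        # y_n = p(t_n = 1 | phi_n)
    grad = Phi.T @ (y - t)      # gradient of the cross-entropy error E(w)
    w -= 0.01 * grad            # simple gradient descent step

accuracy = ((sigmoid(Phi @ w) > 0.5) == (t == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Note how property C of the sigmoid (dσ/da = σ(1 − σ)) is what collapses the gradient into the simple (yₙ − tₙ)φₙ form, with no sigmoid derivative appearing explicitly.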


