# Chap4-Part5: Bayesian Logistic Regression (Sargur N. Srihari)


## Bayesian Logistic Regression

Sargur N. Srihari, University at Buffalo, State University of New York, USA

## Topics

- Recap of logistic regression
- Roadmap of Bayesian logistic regression
- Laplace approximation
- Evaluation of the posterior distribution
  - Gaussian approximation
- Predictive distribution
  - Convolution of sigmoid and Gaussian
  - Approximating the sigmoid with a probit
- Variational Bayesian logistic regression

## Recap of Logistic Regression

- Feature vector $\phi$, two classes $C_1$ and $C_2$.
- The posterior probability $p(C_1 \mid \phi)$ can be written as
  $$p(C_1 \mid \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)$$
  where $\phi$ is an $M$-dimensional feature vector and $\sigma(\cdot)$ is the logistic sigmoid function.
- The goal is to determine the $M$ parameters $\mathbf{w}$.
- The model is known as logistic regression in statistics, although it is a model for classification rather than regression.

## Determining the Logistic Regression Parameters

Maximum likelihood approach for two classes. The data set is $\{\phi_n, t_n\}$, where $t_n \in \{0, 1\}$ and $\phi_n = \phi(\mathbf{x}_n)$, $n = 1, \ldots, N$. Since $t_n$ is binary we can use a Bernoulli distribution; let $y_n = p(C_1 \mid \phi_n)$ be the probability that $t_n = 1$.

The likelihood function associated with the $N$ observations, where $\mathbf{t} = (t_1, \ldots, t_N)^T$, is

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}$$

## Simple Sequential Solution

The error function is the negative log-likelihood, known as the cross-entropy error function:

$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right\}$$

There is no closed-form maximum likelihood solution for $\mathbf{w}$. Given the gradient of the error function

$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \phi_n$$

(each term has the form "error $\times$ feature vector"), we solve with an iterative approach:

$$\mathbf{w}^{\tau + 1} = \mathbf{w}^{\tau} - \eta \nabla E_n, \qquad \nabla E_n = (y_n - t_n)\, \phi_n$$

This solution has severe over-fitting problems for linearly separable data, so the IRLS algorithm is used instead.

## IRLS for Logistic Regression

The posterior probability of class $C_1$ is $p(C_1 \mid \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)$, with the likelihood function for the data set $\{\phi_n, t_n\}$, $t_n \in \{0, 1\}$, $\phi_n = \phi(\mathbf{x}_n)$, as given above.

1. Error function: the log-likelihood yields the cross-entropy
   $$E(\mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \right\}$$
2. Gradient of the error function:
   $$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \phi_n$$
3. Hessian:
   $$H = \nabla \nabla E(\mathbf{w}) = \Phi^T R \Phi$$
   where $R$ is the diagonal matrix with $R_{nn} = y_n(1 - y_n)$. The Hessian is not constant and depends on $\mathbf{w}$ through $R$. Since $H$ is positive definite (i.e., $\mathbf{u}^T H \mathbf{u} > 0$ for arbitrary $\mathbf{u}$), the error function is a convex function of $\mathbf{w}$ and so has a unique minimum.
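The quantities above can be sketched in code. The snippet below is a minimal NumPy sketch (not from the original slides; the function names are illustrative) of the sigmoid model, the cross-entropy error $E(\mathbf{w})$, its gradient $\sum_n (y_n - t_n)\phi_n$, and plain gradient descent on $E(\mathbf{w})$:

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t):
    # E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
    y = sigmoid(Phi @ w)
    eps = 1e-12  # guard against log(0) when y saturates
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

def grad_E(w, Phi, t):
    # grad E(w) = sum_n (y_n - t_n) phi_n  ("error x feature vector")
    y = sigmoid(Phi @ w)
    return Phi.T @ (y - t)

def fit_gradient_descent(Phi, t, eta=0.1, n_iters=500):
    # Iterative update w <- w - eta * grad E(w); batch variant of the
    # sequential rule from the slides, starting from w = 0.
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        w -= eta * grad_E(w, Phi, t)
    return w
```

Note that on linearly separable data the maximum-likelihood weights grow without bound as the sigmoid saturates toward 0 and 1, which is the severe over-fitting problem the slides mention.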
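The IRLS step itself is a Newton-Raphson update built from the gradient and the Hessian $H = \Phi^T R \Phi$, $R_{nn} = y_n(1 - y_n)$. Below is a minimal sketch under one added assumption: the small ridge term `reg` that keeps $H$ invertible when the $y_n$ saturate is a numerical-stability detail of this sketch, not part of the algorithm in the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(Phi, t, n_iters=20, reg=1e-6):
    # Iterative reweighted least squares for logistic regression.
    # Newton update: w_new = w - H^{-1} grad E(w), with
    #   grad E(w) = Phi^T (y - t)  and  H = Phi^T R Phi,  R_nn = y_n (1 - y_n).
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        R = np.diag(y * (1 - y))
        H = Phi.T @ R @ Phi + reg * np.eye(M)  # small ridge for invertibility
        w = w - np.linalg.solve(H, Phi.T @ (y - t))
    return w
```

Because the cross-entropy error is convex with a unique minimum, a handful of Newton iterations typically suffices; this is why IRLS is preferred over plain gradient descent here.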

*This document was uploaded on 02/25/2012.*
