**Bayesian Logistic Regression**

Sargur N. Srihari, University at Buffalo, State University of New York, USA

**Topics**

- Recap of logistic regression
- Roadmap of Bayesian logistic regression
- Laplace approximation
  - Evaluation of the posterior distribution
  - Gaussian approximation
- Predictive distribution
  - Convolution of sigmoid and Gaussian
  - Approximating the sigmoid with a probit
- Variational Bayesian logistic regression

**Recap of Logistic Regression**

For a feature vector $\phi$ and two classes $C_1$ and $C_2$, the posterior probability $p(C_1 \mid \phi)$ can be written as

$$p(C_1 \mid \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)$$

where $\phi$ is an $M$-dimensional feature vector and $\sigma(\cdot)$ is the logistic sigmoid function. The goal is to determine the $M$ parameters $\mathbf{w}$. This model is known as logistic regression in statistics, although it is a model for classification rather than regression.

**Determining the Logistic Regression Parameters**

Maximum likelihood approach for two classes. The data set is $\{\phi_n, t_n\}$, where $t_n \in \{0, 1\}$ and $\phi_n = \phi(x_n)$, $n = 1, \ldots, N$. Since $t_n$ is binary we can use a Bernoulli distribution. Let $y_n$ be the probability that $t_n = 1$. The likelihood function associated with the $N$ observations is

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}$$

where $\mathbf{t} = (t_1, \ldots, t_N)^T$ and $y_n = p(C_1 \mid \phi_n)$.

**Error Function**

The error function is the negative of the log-likelihood,

$$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}$$

known as the cross-entropy error function. There is no closed-form maximum likelihood solution for determining $\mathbf{w}$. The gradient of the error function is

$$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\phi_n$$

i.e., error $\times$ feature vector.

**Simple Sequential Solution**

Solve using an iterative approach,

$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n, \qquad \nabla E_n = (y_n - t_n)\,\phi_n$$

This solution has severe over-fitting problems for linearly separable data, so the IRLS algorithm is used.
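As an illustration, the sequential (stochastic-gradient) update above can be sketched in NumPy. The learning rate `eta`, the synthetic data, and all variable names are assumptions for the sketch, not part of the slides:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def sequential_update(w, phi_n, t_n, eta=0.1):
    """One sequential step: w <- w - eta * (y_n - t_n) * phi_n."""
    y_n = sigmoid(w @ phi_n)            # y_n = p(C1 | phi_n)
    return w - eta * (y_n - t_n) * phi_n

# Toy synthetic data (illustrative only): 2-D features, separable labels
rng = np.random.default_rng(0)
N, M = 100, 2
Phi = rng.normal(size=(N, M))
t = (Phi @ np.array([1.5, -2.0]) > 0).astype(float)

w = np.zeros(M)
for epoch in range(20):                 # several passes over the data
    for n in range(N):
        w = sequential_update(w, Phi[n], t[n])

accuracy = np.mean((sigmoid(Phi @ w) > 0.5) == t)
```

Note that on linearly separable data like this toy set, the weights keep growing without bound as training continues, which is the over-fitting problem the slides mention.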
**IRLS for Logistic Regression**

1. The posterior probability of class $C_1$ is $p(C_1 \mid \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)$, and the likelihood function for the data set $\{\phi_n, t_n\}$, $t_n \in \{0,1\}$, $\phi_n = \phi(x_n)$, is $p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}$. The negative log-likelihood yields the cross-entropy error function $E(\mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}$.

2. Gradient of the error function: $\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\phi_n = \Phi^T(\mathbf{y} - \mathbf{t})$.

3. Hessian: $\mathbf{H} = \nabla\nabla E(\mathbf{w}) = \Phi^T \mathbf{R} \Phi$, where $\mathbf{R}$ is the diagonal matrix with elements $R_{nn} = y_n(1 - y_n)$. The Hessian is not constant: it depends on $\mathbf{w}$ through $\mathbf{R}$. Since $\mathbf{H}$ is positive definite (i.e., $\mathbf{u}^T \mathbf{H} \mathbf{u} > 0$ for arbitrary $\mathbf{u} \neq 0$), the error function is a convex function of $\mathbf{w}$ and so has a unique minimum.
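The IRLS summary above can be sketched as a Newton-Raphson loop using the gradient $\Phi^T(\mathbf{y}-\mathbf{t})$ and Hessian $\Phi^T\mathbf{R}\Phi$; the toy data, iteration count, and variable names are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(Phi, t, n_iters=8):
    """IRLS for logistic regression: Newton steps w <- w - H^{-1} grad."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        R = np.diag(y * (1.0 - y))        # R_nn = y_n (1 - y_n)
        grad = Phi.T @ (y - t)            # gradient of cross-entropy error
        H = Phi.T @ R @ Phi               # Hessian, positive definite
        w = w - np.linalg.solve(H, grad)  # Newton-Raphson update
    return w

# Toy data with label noise, so it is NOT linearly separable and the
# maximum likelihood solution stays finite (illustrative values only)
rng = np.random.default_rng(1)
Phi = np.column_stack([np.ones(200), rng.normal(size=200)])  # bias + 1 feature
t = (rng.uniform(size=200) < sigmoid(2.0 * Phi[:, 1])).astype(float)

w_hat = irls(Phi, t)
```

On linearly separable data the weights diverge and $\mathbf{R}$ (hence $\mathbf{H}$) becomes nearly singular, which is another view of the over-fitting problem noted earlier.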