Chap4-Part5 - Bayesian Logistic Regression
Bayesian Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York, USA

Topics
- Recap of logistic regression
- Roadmap of Bayesian logistic regression
- Laplace approximation
  - Evaluation of the posterior distribution
  - Gaussian approximation
- Predictive distribution
  - Convolution of sigmoid and Gaussian
  - Approximating the sigmoid with a probit
- Variational Bayesian logistic regression

Recap of Logistic Regression
- Feature vector \phi, two classes C_1 and C_2.
- The posterior probability of class C_1 can be written as
  p(C_1 | \phi) = y(\phi) = \sigma(w^T \phi)
  where \phi is an M-dimensional feature vector and \sigma(\cdot) is the logistic sigmoid function.
- The goal is to determine the M parameters w.
- This model is known as logistic regression in statistics, although it is a model for classification rather than regression.

Determining Logistic Regression Parameters
- Maximum likelihood approach for two classes.
- Data set \{\phi_n, t_n\}, where t_n \in \{0, 1\} and \phi_n = \phi(x_n), n = 1, \ldots, N.
- Since t_n is binary we can use a Bernoulli distribution; let y_n be the probability that t_n = 1.
- Likelihood function associated with the N observations:
  p(\mathbf{t} | w) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}
  where \mathbf{t} = (t_1, \ldots, t_N)^T and y_n = p(C_1 | \phi_n).

Simple Sequential Solution
- The error function is the negative log-likelihood, the cross-entropy error function:
  E(w) = -\ln p(\mathbf{t} | w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}
- There is no closed-form maximum likelihood solution for w.
- The gradient of the error function is a sum of error times feature vector:
  \nabla E(w) = \sum_{n=1}^{N} (y_n - t_n) \phi_n
- Solve using an iterative approach,
  w^{(\tau + 1)} = w^{(\tau)} - \eta \nabla E_n, where \nabla E_n = (y_n - t_n) \phi_n.
- This solution has severe over-fitting problems for linearly separable data, so the IRLS algorithm is used instead.

IRLS for Logistic Regression
- Posterior probability of class C_1: p(C_1 | \phi) = y(\phi) = \sigma(w^T \phi).
- Likelihood function for the data set \{\phi_n, t_n\}, t_n \in \{0, 1\}, \phi_n = \phi(x_n):
  p(\mathbf{t} | w) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}
1. Error function: the negative log-likelihood yields the cross-entropy
   E(w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}
2. Gradient of the error function:
   \nabla E(w) = \sum_{n=1}^{N} (y_n - t_n) \phi_n
3. Hessian:
   H = \nabla \nabla E(w) = \Phi^T R \Phi, where R is a diagonal matrix with elements R_{nn} = y_n (1 - y_n).
- The Hessian is not constant: it depends on w through R.
- Since H is positive-definite (i.e., for arbitrary u, u^T H u > 0), the error function is a convex function of w and so has a unique minimum.
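To make the recap concrete, below is a minimal NumPy sketch of logistic regression fitted by IRLS, i.e., Newton-Raphson steps on the cross-entropy error. It uses only the quantities defined above: the sigmoid y_n = \sigma(w^T \phi_n), the gradient \Phi^T(y - t), and the Hessian \Phi^T R \Phi. The Newton update w \leftarrow w - H^{-1} \nabla E(w), the toy data, the small ridge term added for numerical stability, and all function names are illustrative assumptions, not taken from the slides.

import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t):
    """E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }."""
    y = sigmoid(Phi @ w)
    eps = 1e-12  # guard against log(0)
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

def irls_logistic_regression(Phi, t, n_iters=20, tol=1e-6):
    """Fit w by IRLS (Newton-Raphson) on the cross-entropy error.

    Phi : (N, M) design matrix whose rows are the feature vectors phi_n
    t   : (N,)  binary targets in {0, 1}
    """
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)                    # y_n = sigma(w^T phi_n)
        grad = Phi.T @ (y - t)                  # sum_n (y_n - t_n) phi_n
        R = np.diag(y * (1.0 - y))              # R_nn = y_n (1 - y_n)
        H = Phi.T @ R @ Phi                     # positive-definite Hessian
        # Small ridge term keeps H invertible when some y_n saturate (assumption).
        step = np.linalg.solve(H + 1e-8 * np.eye(M), grad)
        w = w - step                            # Newton update: w <- w - H^{-1} grad
        if np.linalg.norm(step) < tol:
            break
    return w

if __name__ == "__main__":
    # Toy two-class data; phi_n includes a constant bias feature of 1.
    rng = np.random.default_rng(0)
    N = 200
    x = rng.normal(size=(N, 2))
    t = (x[:, 0] + 0.5 * x[:, 1] + 0.3 * rng.normal(size=N) > 0).astype(float)
    Phi = np.hstack([np.ones((N, 1)), x])       # M = 3 features
    w_hat = irls_logistic_regression(Phi, t)
    print("estimated w:", w_hat)
    print("final cross-entropy:", cross_entropy(w_hat, Phi, t))

Because H is positive-definite, each Newton step moves downhill on a convex error surface toward its unique minimum, which is why IRLS typically converges in only a few iterations on data that is not linearly separable.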