Bayesian Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York, USA

Topics
• Recap of logistic regression
• Roadmap of Bayesian logistic regression
• Laplace approximation
• Evaluation of the posterior distribution
  – Gaussian approximation
• Predictive distribution
  – Convolution of sigmoid and Gaussian
  – Approximating the sigmoid with a probit
• Variational Bayesian logistic regression

Recap of Logistic Regression
• Feature vector \phi, two classes C_1 and C_2
• The posterior probability p(C_1 | \phi) can be written as

    p(C_1 | \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)

  where \phi is an M-dimensional feature vector and \sigma(\cdot) is the logistic sigmoid function
• The goal is to determine the M parameters \mathbf{w}
• Known as logistic regression in statistics
  – Although it is a model for classification rather than regression

Determining the Logistic Regression Parameters
• Maximum likelihood approach for two classes
• Data set \{\phi_n, t_n\}, where t_n \in \{0, 1\} and \phi_n = \phi(x_n), n = 1, \ldots, N
• Since t_n is binary we can use a Bernoulli distribution; let y_n = p(C_1 | \phi_n) be the probability that t_n = 1
• The likelihood function associated with the N observations, where \mathbf{t} = (t_1, \ldots, t_N)^T, is

    p(\mathbf{t} | \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}

Simple Sequential Solution
• The error function is the negative of the log-likelihood, the cross-entropy error function:

    E(\mathbf{w}) = -\ln p(\mathbf{t} | \mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

• There is no closed-form maximum likelihood solution for \mathbf{w}
• Given the gradient of the error function

    \nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n) \phi_n

  solve with the iterative update (sketched in code after these notes)

    \mathbf{w}^{(\tau + 1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n,  where  \nabla E_n = (y_n - t_n) \phi_n

  – Each term \nabla E_n is an error (y_n - t_n) times the feature vector \phi_n
• This solution has severe over-fitting problems for linearly separable data, so the IRLS algorithm is used instead

IRLS for Logistic Regression
• Posterior probability of class C_1: p(C_1 | \phi) = y(\phi) = \sigma(\mathbf{w}^T \phi)
• Likelihood function for the data set \{\phi_n, t_n\}, t_n \in \{0, 1\}, \phi_n = \phi(x_n):

    p(\mathbf{t} | \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}

1. Error function: the negative log-likelihood yields the cross-entropy

    E(\mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

2. Gradient of the error function:

    \nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n) \phi_n

3. Hessian:

    H = \nabla \nabla E(\mathbf{w}) = \sum_{n=1}^{N} y_n (1 - y_n) \phi_n \phi_n^T = \Phi^T R \Phi

  where R is the N \times N diagonal matrix with elements R_{nn} = y_n (1 - y_n)
• The Hessian is not constant and depends on \mathbf{w} through R
• Since H is positive definite (i.e., \mathbf{u}^T H \mathbf{u} > 0 for arbitrary \mathbf{u}), the error function is a convex function of \mathbf{w} and so has a unique minimum (an IRLS sketch also follows these notes)
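To make the quantities above concrete, here is a minimal NumPy sketch (not from the slides) of the sigmoid, the cross-entropy error E(\mathbf{w}), and its gradient. The names Phi (the N \times M design matrix \Phi whose n-th row is \phi_n^T), t, and w are assumptions, as are the clipping and eps safeguards against overflow and log(0).

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a); the argument is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

def cross_entropy(w, Phi, t, eps=1e-12):
    """E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }."""
    y = sigmoid(Phi @ w)                          # y_n = sigma(w^T phi_n)
    return -np.sum(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

def grad_E(w, Phi, t):
    """grad E(w) = sum_n (y_n - t_n) phi_n = Phi^T (y - t)."""
    return Phi.T @ (sigmoid(Phi @ w) - t)
```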
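The simple sequential update \mathbf{w}^{(\tau + 1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n might then look as follows. This is a sketch under assumptions: a fixed learning rate eta, a fixed number of passes over the data, and a random visiting order; it reuses sigmoid from the block above.

```python
def sgd_logistic(Phi, t, eta=0.1, n_epochs=100, seed=0):
    """Sequential solution: w <- w - eta * (y_n - t_n) * phi_n, one pattern at a time."""
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_epochs):
        for n in rng.permutation(N):              # visit patterns in random order
            y_n = sigmoid(Phi[n] @ w)             # y_n = sigma(w^T phi_n)
            w -= eta * (y_n - t[n]) * Phi[n]      # grad E_n = (y_n - t_n) phi_n
    return w
```

For linearly separable data this update exhibits the over-fitting problem noted on the slide: the sigmoid can be pushed arbitrarily close to 0 or 1 by letting \|\mathbf{w}\| grow without bound, so the iterates never settle.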
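Finally, a sketch of the IRLS (Newton-Raphson) update \mathbf{w} \leftarrow \mathbf{w} - H^{-1} \nabla E(\mathbf{w}) with H = \Phi^T R \Phi. The convergence tolerance and the small ridge added to H before solving are assumptions for numerical robustness (H becomes nearly singular when the y_n saturate), not part of the slides.

```python
def irls_logistic(Phi, t, n_iters=20, tol=1e-8):
    """IRLS / Newton-Raphson for logistic regression:
    w <- w - H^{-1} grad E,  grad E = Phi^T (y - t),  H = Phi^T R Phi."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        r = y * (1.0 - y)                         # diagonal entries of R
        grad = Phi.T @ (y - t)
        H = Phi.T @ (r[:, None] * Phi)            # Phi^T R Phi
        step = np.linalg.solve(H + 1e-10 * np.eye(M), grad)
        w = w - step
        if np.max(np.abs(step)) < tol:            # Newton step negligible: converged
            break
    return w
```

On a toy problem (a hypothetical example, with a bias column prepended to Phi):

```python
rng = np.random.default_rng(1)
Phi = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
w_true = np.array([-0.5, 2.0, -1.0])
t = (rng.random(200) < sigmoid(Phi @ w_true)).astype(float)
w_hat = irls_logistic(Phi, t)                     # typically converges in a few iterations
```

Because the Hessian is re-evaluated at every step through R, each iteration amounts to a weighted least-squares solve, which is where the name iteratively reweighted least squares comes from.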
