Bayesian Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York, USA

Topics
• Recap of Logistic Regression
• Roadmap of Bayesian Logistic Regression
• Laplace Approximation
• Evaluation of the posterior distribution
  – Gaussian approximation
• Predictive Distribution
  – Convolution of sigmoid and Gaussian
  – Approximating the sigmoid with a probit
• Variational Bayesian Logistic Regression

Recap of Logistic Regression
• Feature vector φ, two classes C_1 and C_2
• The posterior probability p(C_1 | φ) can be written as
    p(C_1 | φ) = y(φ) = σ(w^T φ)
  where φ is an M-dimensional feature vector and σ(·) is the logistic sigmoid function
• The goal is to determine the M parameters w
• Known as logistic regression in statistics
  – Although it is a model for classification rather than regression

Determining Logistic Regression Parameters
• Maximum likelihood approach for two classes
• Data set {φ_n, t_n}, where t_n ∈ {0, 1} and φ_n = φ(x_n), n = 1, …, N
• Since t_n is binary we can use the Bernoulli distribution; let y_n be the probability that t_n = 1
• Likelihood function associated with the N observations, where t = (t_1, …, t_N)^T and y_n = p(C_1 | φ_n):
    p(t | w) = ∏_{n=1}^{N} y_n^{t_n} (1 − y_n)^{1 − t_n}

Simple Sequential Solution
• The error function is the negative log-likelihood, known as the cross-entropy error function:
    E(w) = −ln p(t | w) = −∑_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) }
• There is no closed-form maximum likelihood solution for w
• Given the gradient of the error function,
    ∇E(w) = ∑_{n=1}^{N} (y_n − t_n) φ_n
  solve using an iterative approach,
    w^{τ+1} = w^τ − η ∇E_n,  where ∇E_n = (y_n − t_n) φ_n
  i.e., each update is "error × feature vector"
• This solution has severe overfitting problems for linearly separable data, so use the IRLS algorithm instead

IRLS for Logistic Regression
• Posterior probability of class C_1: p(C_1 | φ) = y(φ) = σ(w^T φ)
• Likelihood function for the data set {φ_n, t_n}, with t_n ∈ {0, 1} and φ_n = φ(x_n).
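The sequential update rule above can be sketched in code. This is an illustrative sketch, not the lecture's own code: NumPy, the toy one-dimensional data, the feature map φ(x) = (1, x)^T, the learning rate η, and the epoch count are all assumptions.

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def sgd_logistic(Phi, t, eta=0.1, epochs=200):
    """Sequential gradient descent for two-class logistic regression.

    Phi : (N, M) design matrix whose rows are feature vectors phi_n
    t   : (N,) binary targets in {0, 1}
    Update rule (from the slides): w <- w - eta * (y_n - t_n) * phi_n
    """
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(epochs):
        for n in range(N):
            y_n = sigmoid(w @ Phi[n])            # y_n = sigma(w^T phi_n)
            w = w - eta * (y_n - t[n]) * Phi[n]  # "error x feature vector"
    return w

# Hypothetical toy data: 1-D inputs with a bias feature, phi(x) = (1, x)^T
X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
Phi = np.column_stack([np.ones_like(X), X])
t = np.array([0, 0, 0, 1, 1, 1])
w = sgd_logistic(Phi, t)
y = sigmoid(Phi @ w)       # fitted probabilities p(C_1 | phi_n)
```

Note that this toy set is linearly separable, so with enough epochs ‖w‖ would keep growing without bound — the overfitting problem the slide mentions as motivation for moving beyond plain maximum likelihood.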
1. Error function: the negative log-likelihood of
     p(t | w) = ∏_{n=1}^{N} y_n^{t_n} (1 − y_n)^{1 − t_n}
   yields the cross-entropy error
     E(w) = −∑_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) }
2. Gradient of the error function:
     ∇E(w) = ∑_{n=1}^{N} (y_n − t_n) φ_n
3. Hessian:
     H = ∇∇E(w) = ∑_{n=1}^{N} y_n (1 − y_n) φ_n φ_n^T = Φ^T R Φ
   where R is the diagonal matrix with elements R_nn = y_n (1 − y_n)
• The Hessian is not constant: it depends on w through R
• Since H is positive definite (i.e., u^T H u > 0 for arbitrary nonzero u), the error function is a convex function of w and so has a unique minimum
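The gradient, Hessian, and Newton step above can be put together as a minimal IRLS sketch. Assumptions not from the slides: NumPy, the toy non-separable 1-D data with a bias feature, the fixed iteration count, and a tiny ridge term added to H purely as a numerical safeguard.

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, iters=10, ridge=1e-8):
    """IRLS (Newton-Raphson) for two-class logistic regression:
        grad = Phi^T (y - t)
        H    = Phi^T R Phi, with R = diag(y_n (1 - y_n))
        w   <- w - H^{-1} grad
    The ridge term is an added safeguard to keep H invertible.
    """
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(iters):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t)                    # gradient of E(w)
        R = np.diag(y * (1.0 - y))                # weighting matrix R
        H = Phi.T @ R @ Phi + ridge * np.eye(M)   # positive-definite Hessian
        w = w - np.linalg.solve(H, grad)          # Newton step
    return w

# Hypothetical overlapping (non-separable) toy data, phi(x) = (1, x)^T
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
t = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0])
Phi = np.column_stack([np.ones_like(X), X])
w = irls_logistic(Phi, t)
y = sigmoid(Phi @ w)       # fitted probabilities
```

Because the model includes a bias feature, the fitted probabilities satisfy ∑_n (y_n − t_n) = 0 at the maximum likelihood solution, which makes a quick sanity check; on tame data like this, Newton's method reaches it in a handful of iterations.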