Evidence Approximation

Machine Learning Srihari
Evidence Approximation: Determining Hyper-parameters
Sargur Srihari
srihari@cedar.buffalo.edu

Topics
• Linear Regression with Basis Functions
• Fully Bayesian Treatment
  – Hyper-parameters for noise and weights
  – Predictive distribution
• Marginalize over hyper-parameters and weights
• Need for approximation
• Called evidence approximation or empirical Bayes
  – Evaluation of evidence function
• Maximizing the evidence function
• Interpretation: effective number of parameters
Linear Regression with Basis Functions
• Polynomial regression is extended by considering nonlinear functions of the input variables:
  $y(\mathbf{x},\mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x})$
  – where $\phi_j(\mathbf{x})$ are called basis functions
  – There are M parameters instead of d parameters
  – Can be written as $y(\mathbf{x},\mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
  – where $\mathbf{w} = (w_0, w_1, \ldots, w_{M-1})^T$ and $\boldsymbol{\phi} = (\phi_0, \phi_1, \ldots, \phi_{M-1})^T$
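As a minimal sketch of the model above, the following assumes a polynomial basis $\phi_j(x) = x^j$ and a scalar input. The slides do not fix a particular basis, so this choice and the names `design_matrix` and `predict` are illustrative assumptions only.

```python
import numpy as np

def design_matrix(x, M):
    """Return the N x M matrix whose (n, j) entry is phi_j(x_n).

    Assumption: polynomial basis phi_j(x) = x**j for j = 0..M-1.
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, M, increasing=True)

def predict(x, w):
    """Evaluate y(x, w) = w^T phi(x) for every input in x."""
    w = np.asarray(w, dtype=float)
    return design_matrix(x, w.size) @ w

# With w = (1, 2, 3) the model is y(x) = 1 + 2x + 3x^2.
y = predict([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The model is linear in the M weights even though it is nonlinear in x, which is what keeps the later Bayesian treatment in closed form.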

Noise Model
• Target variable is a scalar t given by a deterministic function y(x,w) with additive Gaussian noise:
  $t = y(\mathbf{x},\mathbf{w}) + \epsilon$
  where ε is zero-mean Gaussian with precision β
• Thus the distribution of t is univariate normal with mean $y(\mathbf{x},\mathbf{w})$ and variance $\beta^{-1}$:
  $p(t \mid \mathbf{x},\mathbf{w},\beta) = \mathcal{N}(t \mid y(\mathbf{x},\mathbf{w}), \beta^{-1})$
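The noise model can be illustrated with a short sketch that draws targets around given function values and evaluates the Gaussian log-likelihood. The function names and the demo values (β = 4, a fixed random seed) are assumptions made here for illustration.

```python
import numpy as np

def sample_targets(y, beta, rng):
    """Draw t = y + eps, where eps ~ N(0, 1/beta)."""
    y = np.asarray(y, dtype=float)
    sigma = 1.0 / np.sqrt(beta)  # precision beta -> standard deviation
    return y + rng.normal(0.0, sigma, size=y.shape)

def log_likelihood(t, y, beta):
    """Sum over n of ln N(t_n | y_n, 1/beta)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    return (0.5 * n * np.log(beta)
            - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * beta * np.sum((t - y) ** 2))

rng = np.random.default_rng(0)
t = sample_targets(np.zeros(100_000), beta=4.0, rng=rng)
# The sample variance of t should be close to 1/beta = 0.25.
```

Working with the precision β rather than the variance keeps the later posterior formulas free of reciprocals.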
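Fully Bayesian treatment
• Prior distribution of the parameters: p(w)
• Since the likelihood p(t|w) with Gaussian noise has an exponential form
  – Conjugate prior is chosen to be Gaussian: $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
• Posterior is a Gaussian: $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
  where $\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^T\mathbf{t})$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$
  – $\boldsymbol{\Phi}$ is the design matrix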
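The posterior formulas above translate directly into a few lines of linear algebra. This is a sketch under the assumption that the noise precision β is known. The function name `posterior` and the toy data are hypothetical.

```python
import numpy as np

def posterior(Phi, t, beta, m0, S0):
    """Posterior N(w | mN, SN) for a Gaussian prior N(w | m0, S0).

    SN^{-1} = S0^{-1} + beta * Phi^T Phi
    mN      = SN (S0^{-1} m0 + beta * Phi^T t)
    """
    S0_inv = np.linalg.inv(S0)
    SN_inv = S0_inv + beta * Phi.T @ Phi
    SN = np.linalg.inv(SN_inv)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)
    return mN, SN

# Toy problem: one constant basis function and two observations.
Phi = np.array([[1.0], [1.0]])
t = np.array([1.0, 3.0])
mN, SN = posterior(Phi, t, beta=1.0, m0=np.zeros(1), S0=np.eye(1))
```

In this toy case the posterior precision is 1 + 2 = 3, so the posterior mean 4/3 is pulled from the sample mean 2 toward the prior mean 0.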

Hyper-parameters
• There are two hyper-parameters in this Bayesian treatment
  – β is the precision of the noise
  – α is the precision of the weights
• Zero-mean isotropic Gaussian prior with a single precision parameter:
  $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$
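Substituting the isotropic prior ($\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$) into the general Gaussian posterior gives $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$. The sketch below computes this special case for fixed α and β and checks that the posterior mean matches a ridge-regression solution with regularizer λ = α/β. The function name and the demo values are assumptions for illustration.

```python
import numpy as np

def posterior_isotropic(Phi, t, alpha, beta):
    """Posterior under the prior p(w | alpha) = N(w | 0, alpha^{-1} I)."""
    M = Phi.shape[1]
    SN_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    mN = beta * np.linalg.solve(SN_inv, Phi.T @ t)
    SN = np.linalg.inv(SN_inv)
    return mN, SN

Phi = np.array([[1.0], [1.0]])
t = np.array([1.0, 3.0])
alpha, beta = 2.0, 4.0
mN, SN = posterior_isotropic(Phi, t, alpha, beta)

# Ridge regression with lambda = alpha / beta yields the same weights.
lam = alpha / beta
w_ridge = np.linalg.solve(lam * np.eye(1) + Phi.T @ Phi, Phi.T @ t)
```

Only the ratio α/β affects the posterior mean, while both values affect the posterior covariance.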

This document was uploaded on 02/25/2012.
