Machine Learning (Srihari)
Evidence Approximation: Determining hyper-parameters
Sargur Srihari

Topics
• Linear Regression with Basis Functions
• Fully Bayesian Treatment
  – Hyper-parameters for noise and weights
  – Predictive distribution
• Marginalize over hyper-parameters and weights
• Need for Approximation
• Called evidence approximation or empirical Bayes
  – Evaluation of evidence function
• Maximizing the evidence function
• Interpretation: Effective no. of parameters

Linear Regression with Basis Functions
• Polynomial regression is extended by considering nonlinear functions $\phi_j(\mathbf{x})$ of the input variables
  – The $\phi_j(\mathbf{x})$ are called basis functions
  – There are M parameters instead of d parameters
  – Can be written as
    $y(\mathbf{x},\mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$
  – where $\mathbf{w} = (w_0, w_1, \ldots, w_{M-1})^T$ and $\boldsymbol{\phi} = (\phi_0, \phi_1, \ldots, \phi_{M-1})^T$
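
As a minimal sketch of the model y(x,w) = w^T φ(x), the snippet below assumes scalar inputs and a polynomial basis φ_j(x) = x^j. The slides do not fix a particular basis, so this choice and the helper names `polynomial_basis` and `predict` are illustrative only.

```python
import numpy as np

def polynomial_basis(x, M):
    """Return the N x M matrix whose column j holds phi_j(x) = x**j, j = 0..M-1.

    The polynomial basis is an assumption made for illustration.
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=M, increasing=True)

def predict(x, w):
    """Evaluate y(x, w) = w^T phi(x) for every input in x."""
    Phi = polynomial_basis(x, len(w))
    return Phi @ w

w = np.array([1.0, 2.0, 3.0])   # w_0, w_1, w_2, so M = 3
x = np.array([0.0, 1.0, 2.0])
y = predict(x, w)               # evaluates 1 + 2x + 3x^2 at each x
```

The model stays linear in w even though it is nonlinear in x, which is what keeps the later Bayesian treatment in closed form.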

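Noise Model
• Target variable t is a scalar given by the deterministic function y(x,w) with additive Gaussian noise
    $t = y(\mathbf{x},\mathbf{w}) + \varepsilon$
  – where $\varepsilon$ is zero-mean Gaussian with precision $\beta$
• Thus the distribution of t is univariate normal with mean $y(\mathbf{x},\mathbf{w})$ and variance $\beta^{-1}$:
    $p(t \mid \mathbf{x},\mathbf{w},\beta) = \mathcal{N}(t \mid y(\mathbf{x},\mathbf{w}), \beta^{-1})$
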
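
The noise model can be sketched as a log-likelihood computation. This assumes the N targets are drawn independently, which the slide does not state explicitly; `log_likelihood` is a hypothetical helper name.

```python
import numpy as np

def log_likelihood(t, y, beta):
    """Sum over data points of ln N(t_n | y_n, 1/beta).

    Assumes independent noise on each target (an assumption for this sketch).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    sq_err = np.sum((t - y) ** 2)
    return 0.5 * n * np.log(beta) - 0.5 * n * np.log(2 * np.pi) - 0.5 * beta * sq_err

# One perfectly predicted target with unit precision: only the
# normalising constant -0.5 * ln(2*pi) remains.
ll = log_likelihood(t=[0.5], y=[0.5], beta=1.0)
```

A larger β (smaller noise variance) rewards small residuals more strongly but also penalises large ones more heavily.
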
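Fully Bayesian treatment
• Prior distribution of the parameters: $p(\mathbf{w})$
• Since the likelihood $p(t \mid \mathbf{w})$ with Gaussian noise has an exponential form
  – Conjugate prior is chosen to be Gaussian: $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
• Posterior is a Gaussian: $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
  – where $\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t})$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$
  – $\boldsymbol{\Phi}$ is the design matrix, with elements $\Phi_{nj} = \phi_j(\mathbf{x}_n)$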
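
The posterior update above can be sketched directly in NumPy. The function name `posterior` and the toy data are assumptions for illustration; explicit matrix inverses are used for readability rather than numerical robustness.

```python
import numpy as np

def posterior(Phi, t, beta, m0, S0):
    """Return (m_N, S_N) for the Gaussian posterior p(w | t).

    S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
    m_N      = S_N (S_0^{-1} m_0 + beta * Phi^T t)
    """
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)
    return mN, SN

# Toy data generated exactly by t = 1 + 2x, with basis (1, x).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([1.0, 3.0, 5.0])

# A very broad prior, so the posterior mean is close to least squares.
mN, SN = posterior(Phi, t, beta=1.0, m0=np.zeros(2), S0=1e6 * np.eye(2))
```

With a broad prior the posterior mean approaches the least-squares solution. A tighter prior pulls it toward m_0.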

Hyper-parameters
• There are two hyper-parameters in this Bayesian treatment
  – $\beta$ is the precision of the noise
  – $\alpha$ is the precision of the weights
• Zero-mean isotropic Gaussian prior, governed by a single precision parameter:
    $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$
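
As a worked step, substituting the isotropic prior ($\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$) into the general posterior formulas from the fully Bayesian treatment gives a form that depends only on $\alpha$ and $\beta$. This is a sketch of the substitution, using the same symbols as the posterior above:

```latex
\begin{align}
\mathbf{S}_N^{-1} &= \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}
                   = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi} \\
\mathbf{m}_N &= \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t} \right)
              = \beta \mathbf{S}_N \boldsymbol{\Phi}^T \mathbf{t}
\end{align}
```

The posterior is therefore fully determined once values for the two hyper-parameters are chosen.
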