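**Machine Learning (Srihari)**

**Bayesian Linear Regression**

Sargur Srihari, srihari@cedar.buffalo.edu

**Motivation**

In maximum likelihood, model complexity (the number of basis functions) needs to be controlled according to the size of the data set. Adding a regularization term helps control model complexity, but the number and choice of basis functions is still important. The Bayesian treatment avoids this problem.

**Parameter Distribution**

A prior probability distribution $p(\mathbf{w})$ is placed over the model parameters $\mathbf{w}$. The noise precision parameter $\beta$ is assumed known. Since the likelihood function $p(\mathbf{t} \mid \mathbf{w})$ with Gaussian noise has an exponential form, the conjugate prior is a Gaussian with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$:

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$$

**Posterior Distribution of Parameters**

The posterior is given by the product of the likelihood function and the prior:

$$p(\mathbf{w} \mid D) = \frac{p(D \mid \mathbf{w})\, p(\mathbf{w})}{p(D)}$$

Due to the choice of a conjugate Gaussian prior, the posterior is also Gaussian. It can be written directly in the form

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$

where

$$\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta\, \boldsymbol{\Phi}^{T} \mathbf{t} \right), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\, \boldsymbol{\Phi}^{T} \boldsymbol{\Phi}$$

**Gaussian Prior for Linear Regression**

Consider a zero-mean isotropic Gaussian prior with a single precision parameter $\alpha$:

$$p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$$

The corresponding posterior distribution is $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$, where

$$\mathbf{m}_N = \beta\, \mathbf{S}_N \boldsymbol{\Phi}^{T} \mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta\, \boldsymbol{\Phi}^{T} \boldsymbol{\Phi}$$

Note: $\beta$ is the noise precision and $\alpha$ is the precision of the distribution of the parameter $\mathbf{w}$.

**Log Posterior Distribution**

$\ln p(\mathbf{w} \mid \mathbf{t}) =$ ...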
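The posterior update for the mean m_N and covariance S_N can be illustrated numerically. The following is a minimal NumPy sketch, not code from the slides. It assumes that `Phi` is the N×M design matrix of basis-function values, `beta` is the known noise precision, and `alpha` is the precision of the zero-mean isotropic prior. The function name `posterior`, the straight-line basis, the synthetic data, and the particular values of `alpha`, `beta`, and `true_w` are all illustrative choices.

```python
import numpy as np


def posterior(Phi, t, beta, m0, S0):
    """Gaussian posterior N(w | m_N, S_N) for a Gaussian prior N(w | m0, S0).

    Assumes a linear-Gaussian model with known noise precision beta:
        S_N^{-1} = S0^{-1} + beta * Phi^T Phi
        m_N      = S_N (S0^{-1} m0 + beta * Phi^T t)
    """
    S0_inv = np.linalg.inv(S0)
    S_N = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
    m_N = S_N @ (S0_inv @ m0 + beta * Phi.T @ t)
    return m_N, S_N


# Synthetic data (illustrative values, not from the slides).
rng = np.random.default_rng(0)
true_w = np.array([-0.3, 0.5])   # hypothetical "true" weights
alpha, beta = 2.0, 25.0          # prior precision, noise precision
x = rng.uniform(-1.0, 1.0, size=200)
Phi = np.column_stack([np.ones_like(x), x])   # basis functions: 1 and x
t = Phi @ true_w + rng.normal(scale=beta ** -0.5, size=x.shape)

# General formula with the zero-mean isotropic prior N(w | 0, alpha^{-1} I).
M = Phi.shape[1]
m_N, S_N = posterior(Phi, t, beta, m0=np.zeros(M), S0=np.eye(M) / alpha)

# Special-case formulas for the isotropic prior, computed directly.
S_N_iso = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
m_N_iso = beta * S_N_iso @ Phi.T @ t

print("posterior mean:", m_N)
print("posterior covariance:\n", S_N)
```

Both routes should give the same mean and covariance, since the isotropic case is the general formula with the prior mean set to zero and the prior covariance set to the identity divided by `alpha`. With 200 points the posterior mean should sit close to `true_w`, because the data term `beta * Phi.T @ Phi` dominates the prior term `alpha * I`.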
