Chap5.5-Regularization

Regularization in Neural Networks
Sargur Srihari
Topics in Neural Network Regularization
What is regularization?
Methods:
1. Determining the optimal number of hidden units
2. Use of a regularizer in the error function
   - Linear transformations and consistent Gaussian priors
3. Early stopping
Invariances:
- Tangent propagation
- Training with transformed data
- Convolutional networks
- Soft weight sharing
What is Regularization?
In machine learning (as in statistics and inverse problems), regularization means introducing additional information to prevent over-fitting or to solve an ill-posed problem. This information usually takes the form of a penalty for complexity, e.g., restrictions for smoothness or bounds on the vector-space norm. The theoretical justification for regularization is that it attempts to impose Occam's razor on the solution. From a Bayesian point of view, regularization corresponds to the imposition of prior distributions on the model parameters.
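To make the Bayesian view concrete, here is a short derivation sketch following the standard treatment (the prior precision α is an assumed symbol, not from the slides). With a zero-mean Gaussian prior over the weights, maximizing the posterior is the same as minimizing a penalized error:

```latex
% Sketch: MAP estimation with a zero-mean Gaussian prior
% p(w) = N(w | 0, alpha^{-1} I) over the weight vector w.
\begin{align*}
  -\ln p(\mathbf{w}\mid\mathcal{D})
    &= -\ln p(\mathcal{D}\mid\mathbf{w}) - \ln p(\mathbf{w}) + \text{const} \\
    &= E(\mathbf{w}) + \frac{\alpha}{2}\,\mathbf{w}^{\mathsf{T}}\mathbf{w} + \text{const},
\end{align*}
% so the Gaussian prior reappears as a quadratic (weight-decay) penalty,
% where E(w) = -ln p(D|w) is the data-dependent error.
```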
1. Regularization by Determining the Number of Hidden Units
The numbers of input and output units are determined by the dimensionality of the data set, but the number of hidden units M is a free parameter, adjusted to obtain the best predictive performance. One possible approach is to pick the value of M giving the best maximum-likelihood fit, balancing under-fitting against over-fitting.
Effect of Varying the Number of Hidden Units
Sinusoidal regression problem: a two-layer network is trained on 10 data points, with M = 1, 3, and 10 hidden units, minimizing a sum-of-squares error function using conjugate gradient descent. The generalization error is not a simple function of M, owing to the presence of local minima in the error function. A sketch of this experiment follows.
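The following is a minimal, hypothetical reconstruction of this experiment (not code from the slides): a hand-rolled two-layer tanh network, fit to 10 noisy samples of a sinusoid by conjugate gradients via scipy.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical reconstruction of the slide's experiment: 10 noisy samples
# of a sinusoid, fit by a two-layer network with M hidden tanh units.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 10)

def predict(theta, x, M):
    """Forward pass; theta packs both weight layers (biases included)."""
    w1 = theta[:2 * M].reshape(M, 2)           # hidden weights and biases
    w2 = theta[2 * M:].reshape(1, M + 1)       # output weights and bias
    X = np.column_stack([x, np.ones_like(x)])  # append bias input
    Z = np.tanh(X @ w1.T)                      # hidden activations
    Z = np.column_stack([Z, np.ones(len(x))])  # append bias unit
    return (Z @ w2.T).ravel()

def sse(theta, x, t, M):
    """Sum-of-squares error E(w) = 1/2 * sum_n (y(x_n) - t_n)^2."""
    return 0.5 * np.sum((predict(theta, x, M) - t) ** 2)

for M in (1, 3, 10):  # hidden-unit counts from the slide
    theta0 = rng.normal(0, 1, 2 * M + (M + 1))
    res = minimize(sse, theta0, args=(x, t, M), method='CG')
    print(f'M={M:2d}  training error {res.fun:.4f}')
```

Because the error surface has local minima, different random initializations of theta0 can give noticeably different solutions for the same M, which is one reason the generalization error is not a simple function of M.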
Using a Validation Set to Determine the Number of Hidden Units
Plot validation performance for different numbers of hidden units M, using multiple random starts for each M.
[Figure: sum-of-squares test error vs. number of hidden units M, for polynomial data, with 30 random starts for each value of M.]
In this experiment the overall best validation-set performance occurred at M = 8. A sketch of the procedure follows.
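A hedged sketch of this selection procedure (hypothetical code: scikit-learn's lbfgs solver stands in for the conjugate-gradient optimizer used on the slides, and sinusoidal data stand in for the polynomial data of the figure):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical sketch of the model-selection loop: for each candidate M,
# train with 30 random starts and keep the best validation-set error.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, (10, 1))
t_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.1, 10)
x_val = rng.uniform(0, 1, (100, 1))
t_val = np.sin(2 * np.pi * x_val).ravel() + rng.normal(0, 0.1, 100)

best = None
for M in range(1, 11):          # candidate hidden-unit counts
    for start in range(30):     # 30 random starts per M
        net = MLPRegressor(hidden_layer_sizes=(M,), activation='tanh',
                           solver='lbfgs', alpha=0.0, max_iter=2000,
                           random_state=start)
        net.fit(x_train, t_train)
        err = mean_squared_error(t_val, net.predict(x_val))
        if best is None or err < best[0]:
            best = (err, M)

print('best validation error %.4f at M=%d' % best)
```

Evaluating on held-out data rather than the training set is what lets this loop penalize over-fitting: the best of the 30 restarts is kept for each M, and the M with the lowest validation error wins.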
2. Regularization Using Simple Weight Decay
The generalization error is not a simple function of M, due to the presence of local minima in the error function. We therefore need another way to control network complexity and avoid over-fitting: choose a relatively large M and control the complexity by adding a regularization term to the error function. The simplest regularizer is weight decay:

\tilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2}\,\mathbf{w}^{\mathsf{T}}\mathbf{w}

The effective model complexity is then determined by the choice of the regularization coefficient λ. This regularizer is equivalent to a zero-mean Gaussian prior over the weight vector w. Simple weight decay has certain shortcomings, taken up under consistent Gaussian priors below.
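A minimal sketch of weight decay in code (error_fn, grad_fn, and lam are illustrative names, not from the slides): wrap any error function and its gradient with the quadratic penalty.

```python
import numpy as np

# Hypothetical sketch: weight decay as a wrapper around an arbitrary
# error function E(w) and its gradient. `error_fn`, `grad_fn`, and `lam`
# are illustrative names, not from the slides.
def regularized_error(error_fn, w, lam):
    """E~(w) = E(w) + (lam/2) * w^T w."""
    return error_fn(w) + 0.5 * lam * np.dot(w, w)

def regularized_grad(grad_fn, w, lam):
    """Gradient of E~(w): dE/dw + lam * w."""
    return grad_fn(w) + lam * w

# Example: a quadratic toy error, so the effect of lam is easy to see.
w = np.array([3.0, -2.0])
toy_error = lambda w: 0.5 * np.sum((w - 1.0) ** 2)
print(regularized_error(toy_error, w, lam=0.1))
```

During gradient-based training the extra term lam * w shrinks each weight toward zero at every step, which is where the name "weight decay" comes from.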
Consistent Gaussian Priors
A shortcoming of simple weight decay is that it treats all weights alike, so it is not consistent with linear transformations (rescalings) of the network's inputs and outputs; a consistent regularizer uses a separate coefficient for each layer's weights, as sketched below.
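A sketch of the standard form such a regularizer takes (following the textbook treatment these slides track, e.g., Bishop's PRML §5.5.1; W1 and W2 denote the first- and second-layer weight sets, biases excluded):

```latex
% Consistent weight-decay regularizer: a separate coefficient for each
% layer's weights (biases unregularized), which behaves consistently
% under linear rescaling of the network's inputs and outputs.
\[
  \frac{\lambda_1}{2} \sum_{w \in \mathcal{W}_1} w^2
  \;+\; \frac{\lambda_2}{2} \sum_{w \in \mathcal{W}_2} w^2 ,
\]
% corresponding to a prior of the form
\[
  p(\mathbf{w} \mid \alpha_1, \alpha_2) \;\propto\;
  \exp\!\Big( -\frac{\alpha_1}{2} \sum_{w \in \mathcal{W}_1} w^2
              \;-\; \frac{\alpha_2}{2} \sum_{w \in \mathcal{W}_2} w^2 \Big),
\]
% which is improper (not normalizable) because the biases are
% left unconstrained.
```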