Chapter 5.5: Regularization in Neural Networks
Machine Learning, Sargur Srihari

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full Document Right Arrow Icon
Topics in Neural Network Regularization

What is regularization?

Methods:
1. Determining the optimal number of hidden units
2. Use of a regularizer in the error function
   - Linear transformations and consistent Gaussian priors
3. Early stopping

Invariances:
- Tangent propagation
- Training with transformed data
- Convolutional networks
- Soft weight sharing
What is Regularization?

In machine learning (also in statistics and inverse problems), regularization means introducing additional information to prevent over-fitting (or to solve an ill-posed problem).

This information is usually a penalty for complexity, e.g.:
- restrictions for smoothness
- bounds on the vector space norm

Theoretical justification for regularization: it attempts to impose Occam's razor on the solution.

From a Bayesian point of view, regularization corresponds to the imposition of prior distributions on the model parameters.
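The Bayesian correspondence can be made concrete with a short, standard derivation (not spelled out on the slides): a zero-mean Gaussian prior over the weights turns the penalized error into a maximum a posteriori (MAP) objective.

```latex
% Zero-mean isotropic Gaussian prior with precision \alpha:
p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\mid\mathbf{0},\,\alpha^{-1}\mathbf{I})
\quad\Rightarrow\quad
-\ln p(\mathbf{w}) = \tfrac{\alpha}{2}\,\mathbf{w}^{\mathsf{T}}\mathbf{w} + \text{const}.
% MAP estimation maximizes \ln p(\mathbf{w}\mid\mathcal{D})
%   = \ln p(\mathcal{D}\mid\mathbf{w}) + \ln p(\mathbf{w}) + \text{const},
% i.e. it minimizes
E(\mathbf{w}) + \tfrac{\alpha}{2}\,\mathbf{w}^{\mathsf{T}}\mathbf{w},
```

which is exactly simple weight decay, with the regularization coefficient λ playing the role of the prior precision α (up to the scaling of the likelihood term).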
1. Regularization by Determining the Number of Hidden Units

- The numbers of input and output units are determined by the dimensionality of the data set.
- The number of hidden units M is a free parameter, adjusted to get the best predictive performance.
- A possible approach is to find the maximum likelihood estimate of M that balances under-fitting and over-fitting.
Effect of Varying the Number of Hidden Units

Sinusoidal regression problem:
- Two-layer network trained on 10 data points
- M = 1, 3 and 10 hidden units
- Minimizing the sum-of-squares error function using conjugate gradient descent

Generalization error is not a simple function of M, due to the presence of local minima in the error function.
Using a Validation Set to Determine the Number of Hidden Units

[Figure: sum-of-squares test error plotted against the number of hidden units M, for polynomial data, with 30 random starts for each M.]

Plot a graph over random starts and different numbers of hidden units M; the overall best validation-set performance occurred at M = 8.
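The validation-based selection of M described above can be sketched in code. This is a minimal NumPy illustration, not the slides' actual experiment: the sinusoidal data set, the candidate values of M, the plain-gradient-descent optimizer, learning rate, restart count and iteration budget are all assumptions for the example (the slides use conjugate gradients and 30 restarts per M).

```python
import numpy as np

def train_two_layer(X, t, M, rng, lr=0.05, iters=2000):
    """Train a two-layer net (tanh hidden units, linear output)
    by gradient descent on the sum-of-squares error."""
    W1 = rng.normal(scale=0.5, size=(M, 1)); b1 = np.zeros(M)
    W2 = rng.normal(scale=0.5, size=(1, M)); b2 = np.zeros(1)
    for _ in range(iters):
        Z = np.tanh(X @ W1.T + b1)        # hidden activations, shape (N, M)
        y = Z @ W2.T + b2                 # network output, shape (N, 1)
        d = y - t                         # output-layer error
        gW2 = d.T @ Z / len(X); gb2 = d.mean(0)
        dh = (d @ W2) * (1 - Z**2)        # back-propagated hidden error
        gW1 = dh.T @ X / len(X); gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1.T + b1) @ W2.T + b2

def sse(params, X, t):
    return 0.5 * np.sum((predict(params, X) - t) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(10, 1))       # 10 training points, as on the slide
t = np.sin(2 * np.pi * X) + rng.normal(scale=0.1, size=X.shape)
Xv = rng.uniform(0, 1, size=(50, 1))      # held-out validation set (assumed)
tv = np.sin(2 * np.pi * Xv)

best = None
for M in (1, 3, 10):                      # candidate hidden-unit counts
    for start in range(5):                # several random restarts per M
        params = train_two_layer(X, t, M, rng)
        err = sse(params, Xv, tv)
        if best is None or err < best[0]:
            best = (err, M)
print("best M by validation error:", best[1])
```

The restarts matter because, as noted above, the error surface has local minima: two runs with the same M can reach very different validation errors, so M is chosen by the best run rather than a single one.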
2. Regularization Using Simple Weight Decay

- Generalization error is not a simple function of M, due to the presence of local minima.
- We need to control network complexity to avoid over-fitting.
- Approach: choose a relatively large M and control complexity by the addition of a regularization term.
- The simplest regularizer is weight decay:

      Ẽ(w) = E(w) + (λ/2) wᵀw

- The effective model complexity is then determined by the choice of the regularization coefficient λ.
- This regularizer is equivalent to a zero-mean Gaussian prior over the weight vector w.
- Simple weight decay has certain shortcomings.
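The objective above can be illustrated directly: adding (λ/2) wᵀw to the error adds λw to the gradient, which shrinks ("decays") the weights at every gradient step. A minimal sketch using polynomial regression rather than a neural network, so the effect is easy to see; the features, λ value and learning rate are illustrative assumptions, not values from the slides.

```python
import numpy as np

def fit(Phi, t, lam, lr=0.01, iters=5000):
    """Gradient descent on E~(w) = 0.5*||Phi w - t||^2 + (lam/2)*w^T w."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ w - t) + lam * w   # grad E(w) plus lam*w
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=20)
Phi = np.vander(x, 8, increasing=True)           # degree-7 polynomial features

w_unreg = fit(Phi, t, lam=0.0)                   # no regularization
w_reg = fit(Phi, t, lam=10.0)                    # with weight decay
# weight decay shrinks the norm of the weight vector
print(np.linalg.norm(w_reg), "<", np.linalg.norm(w_unreg))
```

For a model that is linear in its parameters, the same minimum is available in closed form as w = (ΦᵀΦ + λI)⁻¹ Φᵀt; gradient descent is used here only to show where the name "weight decay" comes from.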
Consistent Gaussian Priors