train the model $K$ times, once for each of the $K$ folds. This deficiency is even more obvious in leave-one-out cross validation, where we must train the model $n$ times, where $n$ is the number of data points in the data set.
Fortunately, when adding data points to the classifier is reversible, calculating the difference between two classifiers is computationally more efficient than calculating the two classifiers separately. So, if the classifier trained on all the data points is known, we simply undo the changes from a single data point, once for each data point in the data set, to calculate the leave-one-out cross-validation error rate.
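As a minimal sketch of this idea, consider a nearest-class-mean classifier, chosen here only because adding a point to it is trivially reversible; the notes do not name a specific classifier. Its state is a per-class sum and count, so removing a point means subtracting it from one sum. The function names and the toy data below are hypothetical.

```python
def fit_sums(X, y):
    """Per-class sums and counts: the full state of a nearest-mean classifier."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return sums, counts


def predict(sums, counts, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        if counts[label] == 0:
            continue  # this class has no remaining points
        mean = [s / counts[label] for s in total]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error_downdate(X, y):
    """Leave-one-out error: fit once, then undo one data point at a time."""
    sums, counts = fit_sums(X, y)
    errors = 0
    for x, label in zip(X, y):
        # Undo the contribution of (x, label) to the fitted classifier.
        sums[label] = [s - v for s, v in zip(sums[label], x)]
        counts[label] -= 1
        errors += predict(sums, counts, x) != label
        # Restore the point before moving on to the next one.
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return errors / len(X)


def loo_error_naive(X, y):
    """Reference version: retrain from scratch once per data point."""
    errors = 0
    for i in range(len(X)):
        sums, counts = fit_sums(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict(sums, counts, X[i]) != y[i]
    return errors / len(X)


# Hypothetical toy data: two clusters plus one point lying between them.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [2.0, 2.0]]
y = [0, 0, 0, 1, 1, 1, 1]

fast = loo_error_downdate(X, y)
slow = loo_error_naive(X, y)
```

Both functions return the same error rate on this data, but the first one fits the classifier only once and does a constant amount of work per held-out point.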
How do we decide the number of folds? With a large number of folds, the bias of the true error rate estimator is small, but its variance and the computing time are large. With a small number of folds, the opposite holds: the bias is larger, while the variance and the computing time are smaller. When the data set is large, 3-fold cross validation is usually enough, but if the data set is very sparse we prefer to use leave-one-out.

Regularization for Neural Network: Weight Decay
Weight decay training is suggested as a way to obtain a robust neural network that is insensitive to noise. Since the number of hidden layers in a NN is usually chosen from domain knowledge, the network can easily run into the problem of overfitting.
It can be seen from Figure 1 that when the weights are in the vicinity of zero, the operative part of the activation function shows linear behavior, and the NN collapses to an approximately linear model. Since a linear model is the simplest model, we can avoid overfitting by constraining the weights to be small. This also suggests initializing the random weights close to zero.
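The near-linear behavior around zero can be checked numerically. The sketch below assumes the activation is the logistic sigmoid (the notes only refer to Figure 1, so this choice is an assumption) and compares it with its first-order Taylor expansion at zero, 0.5 + z/4, for a small and a large weight. The helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, assumed here as the activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_approx(z):
    """First-order Taylor expansion of the sigmoid around z = 0."""
    return 0.5 + z / 4.0


def max_gap(weight, inputs):
    """Largest deviation from linearity over the inputs for a given weight."""
    return max(abs(sigmoid(weight * x) - linear_approx(weight * x))
               for x in inputs)


# Inputs on a grid in [-1, 1].
inputs = [i / 10.0 for i in range(-10, 11)]

small_gap = max_gap(0.1, inputs)  # weight near zero: almost exactly linear
large_gap = max_gap(5.0, inputs)  # large weight: strongly nonlinear
```

With the small weight the unit is indistinguishable from a linear function for practical purposes, while the large weight pushes the inputs into the saturated part of the curve.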
Formally, we penalize nonlinear weights by adding a penalty term to the error function. The regularized error function becomes:

$REG = err + \rho\left(\sum_{k} w_k^2 + \sum_{i,j} u_{ij}^2\right)$,

where $w_k$ are the weights of the output layer; $err$ is the original error in back propagation; $u_{ij}$ are the weights of the hidden layers. Usually, too large a $\rho$ will make the weights $w_k$ and $u_{ij}$ too small. We can use cross validation to estimate $\rho$. Another approach to choosing...
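The regularized error described above can be sketched in a few lines. This assumes a squared-weight penalty with a single coefficient, here called `rho`, applied to both the output-layer weights `w` and the hidden-layer weights `u`; these names, and the plain gradient-descent update, are illustrative assumptions rather than the notes' exact formulation.

```python
def regularized_error(err, w, u, rho):
    """Original error plus rho times the sum of squared weights.

    err : original (unpenalized) back-propagation error
    w   : list of output-layer weights (assumed name)
    u   : list of rows of hidden-layer weights (assumed name)
    rho : penalty coefficient (assumed name)
    """
    penalty = sum(wk ** 2 for wk in w)
    penalty += sum(uij ** 2 for row in u for uij in row)
    return err + rho * penalty


def decay_step(weight, grad_err, rho, lr):
    """One gradient-descent step on the regularized error for a single weight.

    The penalty rho * weight**2 contributes 2 * rho * weight to the gradient,
    which pulls the weight toward zero on every step.
    """
    return weight - lr * (grad_err + 2.0 * rho * weight)


# Hypothetical values for illustration.
total = regularized_error(1.0, [1.0, 2.0], [[1.0], [1.0]], rho=0.5)
shrunk = decay_step(1.0, grad_err=0.0, rho=0.5, lr=0.1)
```

Even when the gradient of the original error is zero, the update still shrinks the weight, which is why too large a penalty coefficient drives all weights toward zero.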
This document was uploaded on 03/07/2014.
 Winter '13
