
... train the model $K$ times. This deficiency is even more obvious in leave-one-out cross-validation, where we must train the model $N$ times, where $N$ is the number of data points in the data set. Fortunately, when adding data points to the classifier is reversible, calculating the difference between two classifiers is computationally cheaper than calculating the two classifiers separately. So, if the classifier trained on all the data points is known, we simply undo the changes from one data point at a time, $N$ times in total, to calculate the leave-one-out cross-validation error rate.

How to decide the number of folds?

For a large number of folds, the bias of the true error rate estimator will be small, but its variance and the computing time will be large. For a small number of folds, the opposite holds. When the data set is large, 3-fold cross-validation is usually enough, but if the data set is very sparse we prefer to use leave-one-out. (A minimal code sketch of this trade-off appears at the end of this section.)

Regularization for Neural Networks: Weight Decay

Weight decay training is suggested as an implementation for achieving a robust neural network that is insensitive to noise. Since the number of hidden layers in a neural network is usually decided by certain domain knowledge, the network may easily run into the problem of overfitting. It can be seen from Figure 1 that when the weights are in the vicinity of zero, the operative part of the activation function behaves almost linearly, so the network collapses to an approximately linear model. Since a linear model is the simplest model, we can avoid overfitting by constraining the weights to be small. This also gives us a hint to initialize the random weights close to zero. Formally, we penalize the nonlinear weights by adding a penalty term to the error function, so the regularized error function becomes

$\tilde{E} = E + \lambda \left( \sum \beta^2 + \sum \alpha^2 \right)$,

where $\beta$ denotes the weights of the output layer, $E$ is the original error in back-propagation, and $\alpha$ denotes the weights of the hidden layers. Too large a $\lambda$ will make the weights $\alpha$ and $\beta$ too small. We can use cross-validation to estimate $\lambda$. Another approach to choosing...
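To make the fold-count trade-off concrete, here is a minimal sketch of $K$-fold cross-validation in Python. It assumes NumPy, data arrays X and y, and a hypothetical factory make_model() returning a classifier with fit/predict methods; none of these names appear in the notes, and setting k equal to the number of data points gives leave-one-out.

```python
import numpy as np

def k_fold_error(X, y, k, make_model, seed=0):
    """Estimate the true error rate with k-fold cross-validation.

    A large k trains on almost the full data set each time (small bias),
    but the k estimates overlap heavily (larger variance) and the model
    must be trained k times (more computation). k = len(y) is leave-one-out.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)           # k disjoint validation folds
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)      # all points outside this fold
        model = make_model()                 # fresh, untrained classifier
        model.fit(X[train], y[train])
        errors.append(np.mean(model.predict(X[fold]) != y[fold]))
    return float(np.mean(errors))

# err_3   = k_fold_error(X, y, 3, make_model)        # usually enough for large data
# err_loo = k_fold_error(X, y, len(y), make_model)   # preferred when data is sparse
```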
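The weight-decay penalty itself is easy to state in code. The sketch below uses the reconstructed symbols from the section above ($\alpha$ for hidden-layer weights, $\beta$ for output-layer weights, $\lambda$ for the decay parameter); these names are inferred from the surrounding text rather than spelled out in the preview.

```python
import numpy as np

def regularized_error(E, alpha, beta, lam):
    """Weight-decay error: E_tilde = E + lam * (sum(beta**2) + sum(alpha**2)).

    E     -- original back-propagation error on the training data
    alpha -- hidden-layer weight matrix
    beta  -- output-layer weight matrix
    lam   -- decay parameter; too large a value shrinks alpha and beta
             toward zero, leaving an almost linear, underfit network
    """
    penalty = np.sum(beta ** 2) + np.sum(alpha ** 2)
    return E + lam * penalty

# In gradient descent the penalty adds 2 * lam * w to each weight's gradient,
# so every update also shrinks ("decays") the weights toward zero:
#   alpha -= eta * (dE_dalpha + 2 * lam * alpha)
#   beta  -= eta * (dE_dbeta  + 2 * lam * beta)
```

In practice $\lambda$ would be chosen by cross-validation, for example by evaluating a grid of candidate values with a routine like k_fold_error above and keeping the value with the smallest estimated error.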