Stat841f09 - Wiki Course Notes

# Initializing weights and setting learning rates


Random initialization does not find the optimal solution in every case, but it is simple to implement. More specifically, random values near zero (usually drawn from [-1, 1]) are a good choice for the initial weights. With such weights the model evolves from a nearly linear one to a nonlinear one, as desired. An alternative is to use an orthogonal least squares method to find the initial weights [11]: regression is performed on the weights and outputs using a linear approximation of the network, which yields the optimal weights for the linear model. Back-propagation is then used to find the optimal solution, since the network itself is nonlinear.

Why should all initial weights be small and randomized? The error back-propagated through the network is proportional to the values of the weights. If all the weights are identical, the back-propagated errors are identical as well, so all of the weights are updated by the same amount. Identical initial weights therefore prevent the network from learning, because the symmetry between units is never broken.

Since the weight updates in the back-propagation algorithm are proportional to the derivative of the activation function, it is important to consider how the net input affects that derivative. The derivative is maximal when the activation equals 0.5 and approaches its minimum as the activation approaches 0 or 1, so the weights feeding a saturated unit change very little. If we choose small initial weights, the net input stays near zero and the activation stays near 0.5, where the weight changes are largest.

## How to set learning rates

The learning rate is usually a constant. If we use on-line learning, which is a form of stochastic approximation, the learning rate should decrease as the iterations increase. In typical feedforward neural networks with hidden units, the objective function has many local and global optima, so the optimal learning rate often changes dramatically during the training process.
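The points above about small random weights and the sigmoid derivative can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the notes; the function names and the choice of a logistic sigmoid are assumptions.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    """Derivative of the sigmoid, s(x) * (1 - s(x)); maximal when s(x) = 0.5."""
    s = sigmoid(x)
    return s * (1.0 - s)

def init_weights(n_in, n_out, seed=0):
    """Small random initial weights drawn uniformly from [-1, 1]."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(n_in, n_out))

# Small weights keep the net input near zero, where the activation is 0.5
# and the derivative (hence the weight update) is at its maximum.
W = init_weights(3, 2)
print(sigmoid(0.0))        # 0.5
print(sigmoid_deriv(0.0))  # 0.25 -- maximum of the derivative
print(sigmoid_deriv(6.0))  # near 0 -- a saturated unit learns very slowly
```

Comparing the derivative at a net input of 0 with the derivative at 6 makes the saturation argument concrete: the near-saturated unit receives updates two orders of magnitude smaller.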
The larger the learning rate, the larger the weight changes on each epoch and the quicker the network learns. However, the size of the learning rate also influences whether the network reaches a stable solution: too large a rate can cause the weights to oscillate or diverge rather than converge.
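One common way to realize a rate that decreases with the iteration count, as the stochastic-approximation argument above requires, is a "search-then-converge" schedule. This is a hypothetical sketch, not a schedule prescribed by the notes; the parameter names `rho0` and `tau` are assumptions.

```python
def learning_rate(t, rho0=0.5, tau=100.0):
    """Search-then-converge schedule: roughly constant (about rho0) while
    t << tau, then decaying like 1/t for large t, satisfying the usual
    stochastic-approximation conditions on the step size."""
    return rho0 * tau / (tau + t)

# The rate starts near rho0 and halves by iteration tau.
for t in (0, 100, 900):
    print(t, learning_rate(t))
```

The constant `tau` controls how long the "search" phase lasts before the rate begins its 1/t decay; a plain constant rate corresponds to the limit of a very large `tau`.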

This document was uploaded on 03/07/2014.
