Unformatted text preview: s multiplied by a different weight value.
Sum up these weighted inputs and passed through the activation function which scales the output to a fixed range of values. The output of the limiter is then broadcast to all
of the neurons in the next layer i.e. we apply the input values to the inputs of the first layer, allow the signals to propagate through the network, and read the output values.
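The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the notes; the layer sizes, the choice of a logistic activation, and the function names are assumptions.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation: squashes the weighted sum into a fixed range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, layers):
    """Propagate input x through each layer: weighted sum, then activation."""
    z = x
    for W in layers:
        z = sigmoid(W @ z)  # each unit sums its weighted inputs, then activates
    return z

rng = np.random.default_rng(0)
# Illustrative architecture: 2 inputs -> 3 hidden units -> 1 output
layers = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
y_hat = forward(np.array([0.5, -1.0]), layers)
print(y_hat.shape)  # (1,)
```

Each weight matrix row holds the incoming weights of one unit, so a single matrix-vector product computes all of that layer's weighted sums at once.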
When we discussed perceptrons, we applied a gradient descent algorithm to optimize the weights. Backpropagation uses this same idea of gradient descent to train a neural network, based on the chain rule from calculus.
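As a reminder of the gradient descent idea before applying it to a full network, here is a one-weight sketch (the example values, learning rate, and number of steps are illustrative assumptions):

```python
# Gradient descent on E(w) = (w*x - y)^2 for a single weight.
x, y = 2.0, 6.0          # one training example; the minimizing weight is 3
w, lr = 0.0, 0.05        # initial weight, learning rate
for _ in range(200):
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # dE/dw by the chain rule
    w -= lr * grad               # step against the gradient
print(round(w, 3))  # converges toward 3.0
```

Backpropagation is exactly this update applied to every weight in the network; the chain rule is what lets us compute each weight's gradient efficiently.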
Assume that the output layer has only one unit, so we are working with a regression problem. Later we will see how this can be extended to multiple output units and thus turned into a classification problem.
Note that we make a distinction between the weights into the final output unit (the output weights, $w_i$) and the weights within the hidden layers (the hidden weights, $u_{ij}$). Within each unit $i$ we have an activation function $\sigma$ that takes the input $a_i$ (the linear sum of the previous level's outputs) and outputs $z_i = \sigma(a_i)$. The outputs $z_i$ of the last hidden layer are the inputs into the final output of the model.
We can find the error of the neural network output by evaluating the squared difference between the true value $y$ and the network output $\hat{y}$:

$E = \frac{1}{2}(\hat{y} - y)^2$

(The factor of $\frac{1}{2}$ is conventional; it cancels when we differentiate.)

First, find the derivative of the model error with respect to the output weights. With $\hat{y} = \sum_i w_i z_i$, where $z_i$ is the output of hidden unit $i$, the chain rule gives

$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_i} = (\hat{y} - y)\, z_i$

Now we need to find the derivative of the model error with respect to the hidden weights.
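The output-weight derivative $\partial E / \partial w_i = (\hat{y} - y) z_i$ can be checked numerically against a finite-difference estimate. The values below are illustrative, not from the notes:

```python
import numpy as np

# Output layer of the regression network: y_hat = w . z,
# with error E = 0.5 * (y_hat - y)^2.
z = np.array([0.2, 0.7, -0.4])   # hidden-unit outputs (illustrative)
w = np.array([0.5, -1.0, 0.3])   # output weights (illustrative)
y = 1.0                          # true target

def error(w):
    return 0.5 * (w @ z - y) ** 2

analytic = (w @ z - y) * z       # the derivative derived in the text
eps = 1e-6
numeric = np.array([
    (error(w + eps * np.eye(3)[i]) - error(w - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

A central-difference check like this is a standard way to validate hand-derived gradients before trusting them in training.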
Consider the following diagram that opens up the hidden layers of the neural network. [Figure: the hidden layers of the network; note that the indices $i$ and $j$ are reversed in the diagram relative to the text.]
Notice that the weighted sums of the outputs of the perceptrons at one layer are the inputs into the perceptrons at the next layer, and so on for all hidden layers. Write $z_j = \sigma(a_j)$ for the output of unit $j$, where $a_j = \sum_k u_{jk} z_k$ is its weighted input and $u_{jk}$ is the weight from unit $k$ to unit $j$. So, using the chain rule,

$\frac{\partial E}{\partial u_{jk}} = \frac{\partial E}{\partial a_j} \frac{\partial a_j}{\partial u_{jk}} = \delta_j\, z_k, \quad \text{where } \delta_j = \frac{\partial E}{\partial a_j}$

Note that a change in $a_j$ causes changes in all of the $a_i$ in the next layer, on which the error is based, so we need to sum over $i$ in the chain rule:

$\delta_j = \frac{\partial E}{\partial a_j} = \sum_i \frac{\partial E}{\partial a_i} \frac{\partial a_i}{\partial a_j} = \sum_i \delta_i \frac{\partial a_i}{\partial a_j}$

Using the activation function, $a_i = \sum_j u_{ij}\, \sigma(a_j)$, so $\frac{\partial a_i}{\partial a_j} = u_{ij}\, \sigma'(a_j)$. So

$\delta_j = \sigma'(a_j) \sum_i u_{ij}\, \delta_i$

We can propagate the error calculated at the output back through the previous layers, computing each layer's $\delta$ from the $\delta$ of the layer above; this is why the algorithm is called backpropagation.
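The derivation above can be sketched for a one-hidden-layer regression network. The network sizes, sigmoid activation, and function names are assumptions for illustration; the gradient for one hidden weight is checked against a finite difference.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, U, w):
    """One hidden layer: a = U x, z = sigma(a), y_hat = w . z."""
    a = U @ x
    z = sigmoid(a)
    return a, z, w @ z

def backprop(x, y, U, w):
    """Return dE/dU and dE/dw for E = 0.5 * (y_hat - y)^2."""
    a, z, y_hat = forward(x, U, w)
    delta_out = y_hat - y                 # dE/d(y_hat) at the output
    grad_w = delta_out * z                # output weights: (y_hat - y) z_i
    s = sigmoid(a)
    delta = s * (1 - s) * (w * delta_out)  # sigma'(a_j) * sum_i u_ij delta_i
    grad_U = np.outer(delta, x)           # hidden weights: delta_j z_k
    return grad_U, grad_w

rng = np.random.default_rng(1)
x, y = rng.normal(size=2), 0.5
U, w = rng.normal(size=(3, 2)), rng.normal(size=3)
gU, gw = backprop(x, y, U, w)

# Finite-difference check on one hidden weight
eps = 1e-6
Up = U.copy(); Up[1, 0] += eps
Um = U.copy(); Um[1, 0] -= eps
num = (0.5 * (forward(x, Up, w)[2] - y) ** 2
       - 0.5 * (forward(x, Um, w)[2] - y) ** 2) / (2 * eps)
print(np.isclose(gU[1, 0], num))  # True
```

Because there is a single output unit, the sum over $i$ in the $\delta$ recursion collapses to one term, $w_j \delta_{\text{out}}$; with more layers the same recursion is applied layer by layer, working backwards from the output.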
This document was uploaded on 03/07/2014.
 Winter '13
