Each input is multiplied by a different weight value. These weighted inputs are summed and passed through the activation function, which scales the output to a fixed range of values. The output of the limiter is then broadcast to all of the neurons in the next layer; i.e. we apply the input values to the inputs of the first layer, allow the signals to propagate through the network, and read the output values.

When we were talking about perceptrons, we applied a gradient descent algorithm for optimizing weights. Back-propagation uses this idea of gradient descent to train a neural network based on the chain rule in calculus.

Assume that the output layer has only one unit, so we are working with a regression problem; later we will see how this can be extended to more output units and thus turn into a classification problem. For simplicity, then, there is a single unit at the end and for the moment we are doing regression.

Note that we make a distinction between the output weights (those feeding the final output of the model) and the hidden weights (those connecting one layer to the next inside the network). Within each unit $j$ we have a function that takes the input
\[ a_j = \sum_k u_{kj}\, z_k \]
(a linear sum of the outputs of the previous level, with hidden weights $u_{kj}$) and outputs
\[ z_j = \sigma(a_j) . \]
The outputs $z_i$ of the last hidden layer are the inputs into the final output of the model,
\[ \hat{y} = \sum_i w_i\, z_i . \]

We can find the error of the neural network output by evaluating the squared difference between the true value and the resulting output:
\[ \mathrm{err} = | y - \hat{y} |^2 . \]

First, find the derivative of the model error with respect to the output weights:
\[ \frac{\partial\, \mathrm{err}}{\partial w_i} = \frac{\partial\, \mathrm{err}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_i} = -2\,(y - \hat{y})\, z_i . \]

Now we need to find the derivative of the model error with respect to the hidden weights.

Consider the following diagram that opens up the hidden layers of the neural network (figure not reproduced here; note that in the figure the indices $i$ and $j$ are reversed relative to the text below). Notice that the weighted sums on the outputs of the perceptrons at layer $j$ are the inputs into the perceptrons at layer $i$, and so on for all hidden layers. So, using the chain rule, for a hidden weight $u_{kj}$ feeding unit $j$,
\[ \frac{\partial\, \mathrm{err}}{\partial u_{kj}} = \frac{\partial\, \mathrm{err}}{\partial a_j} \cdot \frac{\partial a_j}{\partial u_{kj}} = \delta_j\, z_k , \qquad \text{where } \delta_j = \frac{\partial\, \mathrm{err}}{\partial a_j} . \]
Note that a change in $a_j$ causes changes in all $a_i$ in the next layer, on which the error is based, so we need to sum over $i$ in the chain:
\[ \delta_j = \frac{\partial\, \mathrm{err}}{\partial a_j} = \sum_i \frac{\partial\, \mathrm{err}}{\partial a_i} \cdot \frac{\partial a_i}{\partial a_j} = \sum_i \delta_i\, \frac{\partial a_i}{\partial a_j} . \]
Using the activation function,
\[ a_i = \sum_j u_{ji}\, z_j = \sum_j u_{ji}\, \sigma(a_j) \quad \Rightarrow \quad \frac{\partial a_i}{\partial a_j} = u_{ji}\, \sigma'(a_j) . \]
So
\[ \delta_j = \sigma'(a_j) \sum_i u_{ji}\, \delta_i . \]
We can propagate the error calculated at the output back through the previous layers, computing each layer's $\delta$ from the $\delta$ of the layer that follows it.
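To make the update rules above concrete, here is a minimal sketch in Python of one back-propagation step for the simple case in the derivation: one hidden layer with a sigmoid activation and a single linear output unit (regression). The network sizes, learning rate, and variable names (u, w, a, z, delta) are illustrative choices that mirror the notation used here, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_prime(a):
    s = sigmoid(a)
    return s * (1.0 - s)

# Hidden weights u (inputs -> hidden units) and output weights w (hidden -> output).
n_inputs, n_hidden = 3, 4
u = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
w = rng.normal(scale=0.1, size=n_hidden)

def forward(x):
    a = x @ u          # a_j = sum_k u_kj * x_k  (linear sum of the previous level)
    z = sigmoid(a)     # z_j = sigma(a_j)
    y_hat = z @ w      # single linear output unit
    return a, z, y_hat

def backprop_step(x, y, lr=0.1):
    a, z, y_hat = forward(x)
    err = (y - y_hat) ** 2                # squared error |y - y_hat|^2

    # Derivative of the error with respect to the output weights:
    # d err / d w_j = -2 (y - y_hat) * z_j
    d_out = -2.0 * (y - y_hat)
    grad_w = d_out * z

    # Propagate the error back to the hidden units:
    # delta_j = sigma'(a_j) * w_j * d_out, and d err / d u_kj = delta_j * x_k
    delta = sigmoid_prime(a) * w * d_out
    grad_u = np.outer(x, delta)

    # Gradient-descent update of both weight sets.
    return err, u - lr * grad_u, w - lr * grad_w

# One training step on a toy example.
x = np.array([0.5, -1.0, 2.0])
err, u, w = backprop_step(x, y=1.0)
print(err)
```

With more hidden layers, the same recursion applies: each layer's delta is computed from the deltas of the layer after it, exactly as in the final equation above.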