Machine Learning Srihari

Error Backpropagation
Sargur Srihari

Topics
• Neural Network Learning Problem
• Need for computing derivatives of the error function
• Forward propagation of activations
• Backward propagation of errors
• Statement of the Backprop algorithm
• Use of backprop in computing the Jacobian matrix

Neural Network Learning Problem
• Goal is to learn the weights w from a labelled set of training samples
• Learning procedure has two stages
  1. Evaluate derivatives of the error function with respect to the weights $w_1, \ldots, w_T$
  2. Use the derivatives to compute adjustments to the weights:
     $w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E_n(w^{(\tau)})$
• $T = (D+1)M + (M+1)K = M(D+K+1) + K$

Backpropagation Terminology
• Goal: efficient technique for evaluating the gradient of an error function $E(w)$ for a feed-forward neural network
• Backpropagation is the term used for the derivative computation only
• In a subsequent stage the derivatives are used to make adjustments to the weights
• Achieved using a local message passing scheme
• Information is sent forwards and backwards alternately

Overview of Backprop algorithm
• Choose random weights for the network
• Feed in an example and obtain a result
• Calculate the error for each node (starting from the last stage and propagating the error backwards)
• Update the weights
• Repeat with other examples until the network converges on the target output
• How to divide up the errors needs a little calculus

Wide use of Backpropagation
• Can be applied to error functions other than the sum of squared errors
• Used to evaluate other matrices such as the Jacobian and Hessian matrices
• The second stage, weight adjustment using the calculated derivatives, can be tackled using a variety of optimization schemes substantially more powerful than gradient descent

Evaluation of Error Function Derivatives
• Derivation of the backpropagation algorithm for
  • Arbitrary feedforward topology
  • Arbitrary differentiable nonlinear activation function
  • Broad class of error functions
• Error functions of practical interest are sums of errors associated with each training data point:
  $E(w) = \sum_{n=1}^{N} E_n(w)$
• We consider the problem of evaluating $\nabla E_n(w)$
  • For the n-th term in the error function
  • Derivatives are with respect to the weights $w_1, \ldots, w_T$
• Can be used directly for sequential optimization or accumulated over the training set (for batch)

...
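As a rough illustration of the two stages described in the slides (evaluate the derivatives of $E_n$ by backpropagation, then use them to adjust the weights), here is a minimal sketch in plain Python. It rests on assumptions that are not stated in the preview: a two-layer network with D inputs, M tanh hidden units and K linear outputs, one bias per hidden and output unit, and a sum-of-squares error for a single training sample. Reading D, M and K this way is itself an assumption, chosen because it reproduces the weight count T = (D+1)M + (M+1)K. All function names are hypothetical.

```python
import math
import random


def count_weights(D, M, K):
    """Total number of weights T = (D+1)M + (M+1)K (biases included)."""
    return (D + 1) * M + (M + 1) * K


def init_weights(D, M, K, rng):
    """Small random weights; each row stores its bias as the last entry."""
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(D + 1)] for _ in range(M)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(M + 1)] for _ in range(K)]
    return W1, W2


def forward(W1, W2, x):
    """Forward propagation of activations for one input vector x."""
    xb = list(x) + [1.0]  # append constant 1 for the bias weight
    z = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in W1]
    zb = z + [1.0]
    y = [sum(w * v for w, v in zip(row, zb)) for row in W2]  # linear outputs
    return xb, zb, y


def error(W1, W2, x, t):
    """Assumed per-sample error: E_n = 1/2 * sum_k (y_k - t_k)^2."""
    _, _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop(W1, W2, x, t):
    """Gradient of E_n with respect to every weight, via backward propagation."""
    xb, zb, y = forward(W1, W2, x)
    # Output errors for linear outputs with sum-of-squares error.
    delta_out = [yk - tk for yk, tk in zip(y, t)]
    # Hidden errors: tanh'(a_j) = 1 - z_j^2, times the back-propagated sum.
    delta_hid = [
        (1.0 - zb[j] ** 2) * sum(W2[k][j] * delta_out[k] for k in range(len(W2)))
        for j in range(len(W1))
    ]
    # Derivative for each weight = (error at its output end) * (input it carries).
    g1 = [[dj * xi for xi in xb] for dj in delta_hid]
    g2 = [[dk * zj for zj in zb] for dk in delta_out]
    return g1, g2


def sgd_step(W1, W2, g1, g2, eta):
    """In-place update w <- w - eta * grad E_n(w) for both layers."""
    for W, G in ((W1, g1), (W2, g2)):
        for row, grad_row in zip(W, G):
            for i, g in enumerate(grad_row):
                row[i] -= eta * g
```

For sequential optimization, `backprop` followed by `sgd_step` would be called once per training sample; for batch optimization, the per-sample gradients would be summed over the training set before a single update.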
 Fall '09
 Derivative, Machine Learning, Jacobian matrix
