ann-4up: Machine Learning CS6375 (Fall 2010), Neural Networks

1 Machine Learning, CS6375, Fall 2010: Neural Networks
Reading: Section 20.5, R&N; Section 4.4, Mitchell

2-3 Linear Regression
We have training data $X = \{x_1^k\}$, $k = 1, \ldots, N$, with corresponding outputs $Y = \{y^k\}$, $k = 1, \ldots, N$. We want to find the parameters that predict the output $Y$ from the data $X$ in a linear fashion:
$$y^k \approx w_0 + w_1 x_1^k$$
Notation: a superscript is the index of the data point in the training set ($k$ denotes the $k$th training data point); a subscript is the coordinate of the data point ($x_1^k$ is coordinate 1 of data point $k$).

4 Linear Regression
It is convenient to define an additional "fake" attribute for the input data, $x_0 = 1$, so that the prediction becomes
$$y^k \approx w_0 x_0^k + w_1 x_1^k$$

5 More Convenient Notations
Collect the attributes of each training data point into a vector $x^k = [x_0^k, \ldots, x_M^k]$. We seek a vector of parameters $w = [w_0, \ldots, w_M]$ such that we have a linear relation between the prediction $Y$ and the attributes $X$:
$$y^k \approx \sum_{i=0}^{M} w_i x_i^k = w \cdot x^k$$

6-7 Neural Network: Linear Perceptron
[Figure: input units connected to a single output unit, each connection carrying a weight $w_i$. The input unit corresponding to the fake attribute $x_0 = 1$ is called the bias.]
Learning problem: adjust the connection weights so that the network generates the correct prediction on the training data.

8 A Perceptron: The General Case
[Figure: the output unit applies an activation function $g$ to the weighted sum of its inputs, $y = g(w \cdot x)$.]

9 Commonly-Used Activation Functions
[Figure: plots of the activation functions; not preserved in this extraction.]

10 The Perceptron Training Algorithm
But how do we update the weights?

11-12 Linear Regression: Gradient Descent
We seek a vector of parameters $w = [w_0, \ldots, w_M]$ that minimizes the error between the prediction $Y$ and the data $X$:
$$E = \frac{1}{2} \sum_{k=1}^{N} (\delta^k)^2, \qquad \delta^k = y^k - w \cdot x^k$$
Here $\delta^k$ is the error between the input $x$ and the prediction $y$ at data point $k$. Graphically, it is the vertical distance between data point $k$ and the prediction calculated by using the vector of linear parameters $w$.

13-14 Gradient Descent
The minimum of $E$ is reached when the derivative with respect to each of the parameters $w_i$ is zero:
$$\frac{\partial E}{\partial w_i} = -\sum_{k=1}^{N} \delta^k x_i^k = 0$$
Note that the contribution of training data element number $k$ to the overall gradient is $-\delta^k x_i^k$.

15 Gradient Descent Update Rule
Update rule: move in the direction opposite to the gradient direction, with learning rate $\alpha$:
$$w_i \leftarrow w_i + \alpha \sum_{k=1}^{N} \delta^k x_i^k$$

16 Perceptron Training
Given input training data $x^k$ with corresponding value $y^k$:
1. Compute the error: $\delta^k = y^k - w \cdot x^k$
2. Update the NN weights: $w_i \leftarrow w_i + \alpha \, \delta^k x_i^k$
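To make slides 5-9 concrete, here is a minimal NumPy sketch of the linear perceptron and of the general case with an activation function. The helper names (add_bias, linear_perceptron, step, sigmoid, perceptron) are illustrative, not from the slides, and step and sigmoid stand in for the commonly used activation functions that slide 9 plots:

import numpy as np

def add_bias(x):
    # Prepend the fake attribute x_0 = 1 so the bias w_0 is an ordinary weight.
    return np.concatenate(([1.0], np.asarray(x, dtype=float)))

def linear_perceptron(w, x):
    # Linear perceptron (slides 6-7): prediction y = w . x, identity activation.
    return float(np.dot(w, add_bias(x)))

def step(a):
    # Hard-threshold activation, one common choice.
    return 1.0 if a >= 0.0 else 0.0

def sigmoid(a):
    # Smooth, differentiable activation, another common choice.
    return 1.0 / (1.0 + np.exp(-a))

def perceptron(w, x, g=step):
    # General case (slide 8): y = g(w . x) for an activation function g.
    return g(np.dot(w, add_bias(x)))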
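Slides 11-15 translate directly into a batch gradient-descent loop. The following sketch assumes NumPy and uses the slides' definitions ($\delta^k = y^k - w \cdot x^k$, per-point gradient contribution $-\delta^k x_i^k$, update opposite the gradient); the function name and the default step size are mine:

import numpy as np

def gradient_descent(X, Y, alpha=0.05, iters=1000):
    # X: N x M array of attributes (without the fake attribute x_0);
    # Y: length-N vector of target outputs. Returns w = [w_0, ..., w_M].
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add x_0 = 1 to every point
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        delta = Y - Xb @ w      # delta^k = y^k - w . x^k   (slide 12)
        grad = -Xb.T @ delta    # dE/dw_i = -sum_k delta^k x_i^k  (slides 13-14)
        w -= alpha * grad       # move opposite the gradient  (slide 15)
    return w

For example, gradient_descent([[1.0], [2.0], [3.0]], [2.0, 3.0, 4.0]) converges to w ≈ [1, 1], i.e. the line y = 1 + x that fits these three points exactly.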
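Slide 16's rule differs from the batch version only in updating the weights after every training point rather than after a full pass. A sketch, again with illustrative names, assuming the linear perceptron's identity activation so the error is computed on the raw weighted sum:

import numpy as np

def train_perceptron(X, Y, alpha=0.1, epochs=100):
    # Per-example version of slide 16: for each training point, compute its
    # error, then nudge every weight by alpha * delta * x_i.
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # fake attribute x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, y in zip(Xb, Y):
            delta = y - np.dot(w, x)  # 1. compute the error at this point
            w += alpha * delta * x    # 2. update: w_i <- w_i + alpha delta x_i
    return w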