1. Machine Learning, CS6375 Fall 2010: Neural Networks
Reading: Section 20.5, R&N; Section 4.4, Mitchell

2. Linear Regression
• We have training data X = {x_1^k}, k = 1, ..., N, with corresponding outputs Y = {y^k}, k = 1, ..., N.
• We want to find the parameters that predict the output Y from the data X in a linear fashion:
    y^k ≈ w_0 + w_1 x_1^k

3. Linear Regression: Notation
• Superscript: index of the data point in the training data set; k = k-th training data point.
• Subscript: coordinate of the data point; x_1^k = coordinate 1 of data point k.

4. Linear Regression
• It is convenient to define an additional "fake" attribute for the input data: x_0 = 1.
• We want to find the parameters that predict the output Y from the data X in a linear fashion:
    y^k ≈ w_0 x_0^k + w_1 x_1^k

5. More Convenient Notation
• Vector of attributes for each training data point: x^k = [x_0^k, ..., x_M^k].
• We seek a vector of parameters w = [w_0, ..., w_M] such that we have a linear relation between the prediction Y and the attributes X:
    y^k ≈ w · x^k = Σ_{i=0}^{M} w_i x_i^k

6-7. Neural Network: Linear Perceptron
• Architecture: input units, each connected to a single output unit by a connection with a weight.
• One input unit corresponds to the "fake" attribute x_0 = 1; it is called the bias.
• Learning problem: adjust the connection weights so that the network generates the correct prediction on the training data.

8. A Perceptron: The General Case
• The output unit applies an activation function g to the weighted sum of its inputs: y = g(w · x).

9. Commonly Used Activation Functions
• E.g., the linear (identity) function, the step (threshold) function, and the sigmoid.

10. The Perceptron Training Algorithm
• But how do we update the weights?
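The general perceptron of slides 6-10 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the function name `perceptron_output` and the particular menu of activation functions are my own choices, matching the deck's convention that x[0] is the fixed bias input 1.

```python
import math

def perceptron_output(w, x, activation="step"):
    """Output of a perceptron: an activation function applied to the
    weighted sum w . x, where x[0] is the fixed bias input (always 1)."""
    s = sum(wi * xi for wi, xi in zip(w, x))  # linear combination w . x
    if activation == "linear":    # linear perceptron: identity activation
        return s
    if activation == "step":      # threshold unit: 1 if s >= 0, else 0
        return 1.0 if s >= 0 else 0.0
    if activation == "sigmoid":   # smooth, differentiable squashing function
        return 1.0 / (1.0 + math.exp(-s))
    raise ValueError("unknown activation: " + activation)

# Example: two real inputs plus the bias attribute x_0 = 1.
w = [-1.5, 1.0, 1.0]   # w[0] is the bias weight
x = [1.0, 1.0, 1.0]    # x[0] = 1 is the "fake" bias attribute
print(perceptron_output(w, x))             # step activation: 1.0
print(perceptron_output(w, x, "linear"))   # weighted sum: 0.5
```

With the linear activation the unit reduces to the linear regression model of slides 2-5; the step and sigmoid activations are the nonlinear choices a deck like this typically covers next.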
11-12. Linear Regression: Gradient Descent
• We seek a vector of parameters w = [w_0, ..., w_M] that minimizes the error between the prediction and the data:
    E(w) = (1/2) Σ_{k=1}^{N} (δ^k)^2,  where δ^k = y^k − w · x^k
• δ^k is the error between the true output y^k and the prediction w · x^k at data point k. Graphically, it is the "vertical" distance between data point k and the prediction calculated by using the vector of linear parameters w.

13-14. Gradient Descent
• The minimum of E is reached when the derivative with respect to each of the parameters w_i is zero:
    ∂E/∂w_i = −Σ_{k=1}^{N} δ^k x_i^k = 0
• Note that the contribution of training data element number k to the overall gradient is −δ^k x_i^k.

15. Gradient Descent Update Rule
• Update rule: move in the direction opposite to the gradient direction:
    w_i ← w_i + α Σ_{k=1}^{N} δ^k x_i^k
  where α > 0 is the learning rate.

16. Perceptron Training
• Given input training data x^k with corresponding value y^k:
  1. Compute the error: δ^k = y^k − w · x^k
  2. Update the NN weights: w_i ← w_i + α δ^k x_i^k

17. ...
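The per-example training rule of slide 16 can be sketched as a stochastic gradient descent loop. The function name `train_linear` and the learning-rate and epoch values are illustrative assumptions, not from the slides; each input vector starts with the fake attribute x_0 = 1 as in slide 4.

```python
def train_linear(xs, ys, alpha=0.02, epochs=1000):
    """Fit linear-regression weights by per-example gradient descent.

    xs: list of input vectors, each beginning with the bias attribute 1.0
    ys: list of target outputs y^k
    alpha: learning rate (assumed value, not from the slides)
    """
    M = len(xs[0])
    w = [0.0] * M                 # start from the zero weight vector
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = sum(wi * xi for wi, xi in zip(w, x))
            delta = y - pred      # error delta^k = y^k - w . x^k
            for i in range(M):
                # Move opposite the gradient: w_i += alpha * delta^k * x_i^k
                w[i] += alpha * delta * x[i]
    return w

# Noise-free data generated by y = 2 + 3 * x_1, so w should approach [2, 3].
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [2.0, 5.0, 8.0, 11.0]
w = train_linear(xs, ys)
print(w)  # close to [2.0, 3.0]
```

Updating after every example, as here, is the stochastic variant of slide 16; summing the contributions δ^k x_i^k over all k before updating gives the batch rule of slide 15.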
Term: Fall '10
Instructor: Vincent Ng
