# Machine Learning CS6375, Fall 2010: Neural Networks (ann-4up)

Reading: Section 20.5 of Russell & Norvig; Section 4.4 of Mitchell.

## Linear Regression

- We have training data $X = \{x_1^k\}$, $k = 1, \ldots, N$, with corresponding outputs $Y = \{y^k\}$, $k = 1, \ldots, N$.
- We want to find the parameters that predict the output $Y$ from the data $X$ in a linear fashion:

  $$y^k \approx w_0 + w_1 x_1^k$$

Notation:

- Superscript: index of the data point in the training set; $k$ denotes the $k$-th training data point.
- Subscript: coordinate of the data point; $x_1^k$ is coordinate 1 of data point $k$.

## Linear Regression with a Bias Attribute

- It is convenient to define an additional "fake" attribute for the input data: $x_0 = 1$.
- With it, the parameters that predict the output $Y$ from the data $X$ in a linear fashion satisfy:

  $$y^k \approx w_0 x_0^k + w_1 x_1^k$$

## More Convenient Notation

- Vector of attributes for each training data point: $\mathbf{x}^k = [x_0^k, \ldots, x_M^k]$.
- We seek a vector of parameters $\mathbf{w} = [w_0, \ldots, w_M]$ such that we have a linear relation between the prediction $Y$ and the attributes $X$:

  $$y^k \approx \mathbf{w} \cdot \mathbf{x}^k = \sum_{i=0}^{M} w_i x_i^k$$

## Neural Network: Linear Perceptron

- Input units, each joined to the output unit by a connection with a weight.
- The input unit corresponding to the "fake" attribute $x_0 = 1$ is called the bias.
- Learning problem: adjust the connection weights so that the network generates the correct prediction on the training data.

## A Perceptron: The General Case

(Figure from the original slides; not included in this preview.)

## Commonly-Used Activation Functions

(Figure from the original slides; not included in this preview.)

## The Perceptron Training Algorithm

But how do we update the weights?
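The linear perceptron above, with its bias input and an optional activation function, can be sketched in a few lines of Python. This is an illustrative implementation, not code from the slides; the function names (`step`, `sigmoid`, `perceptron_output`) are my own, and the step and sigmoid functions are assumed examples of the "commonly-used activation functions" the slides refer to.

```python
import math

def step(a):
    # Threshold activation: outputs 1 if the weighted sum is non-negative.
    return 1.0 if a >= 0 else 0.0

def sigmoid(a):
    # Smooth, differentiable activation with values in (0, 1).
    return 1.0 / (1.0 + math.exp(-a))

def perceptron_output(w, x, activation=lambda a: a):
    # x is assumed to already include the "fake" attribute x_0 = 1,
    # so w[0] plays the role of the bias weight w_0.
    a = sum(wi * xi for wi, xi in zip(w, x))
    return activation(a)

# A linear perceptron (identity activation) computing y = w_0 + w_1 * x_1:
w = [0.5, 2.0]   # [bias weight w_0, weight w_1]
x = [1.0, 3.0]   # [x_0 = 1, x_1 = 3]
print(perceptron_output(w, x))        # 0.5 + 2.0 * 3.0 = 6.5
print(perceptron_output(w, x, step))  # 1.0
```

With the identity activation the perceptron is exactly the linear regression model $y \approx \mathbf{w} \cdot \mathbf{x}$; swapping in a nonlinear activation gives the general perceptron of the later slides.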
## Linear Regression: Gradient Descent

- We seek a vector of parameters $\mathbf{w} = [w_0, \ldots, w_M]$ that minimizes the error between the prediction $Y$ and the data $X$:

  $$E = \frac{1}{2} \sum_{k=1}^{N} (\delta^k)^2, \qquad \delta^k = y^k - \mathbf{w} \cdot \mathbf{x}^k$$

- $\delta^k$ is the error between the input $\mathbf{x}^k$ and the prediction $y^k$ at data point $k$. Graphically, it is the "vertical" distance between data point $k$ and the prediction calculated using the vector of linear parameters $\mathbf{w}$.

## Gradient Descent

- The minimum of $E$ is reached when the derivative with respect to each of the parameters $w_i$ is zero:

  $$\frac{\partial E}{\partial w_i} = -\sum_{k=1}^{N} \delta^k x_i^k = 0$$

- Note that the contribution of training data element $k$ to the overall gradient is $-\delta^k x_i^k$.

## Gradient Descent Update Rule

- Update rule: move in the direction opposite to the gradient direction,

  $$w_i \leftarrow w_i + \alpha \, \delta^k x_i^k$$

  where $\alpha$ is the learning rate.

## Perceptron Training

Given input training data $\mathbf{x}^k$ with corresponding value $y^k$:

1. Compute the error: $\delta^k = y^k - \mathbf{w} \cdot \mathbf{x}^k$
2. Update the NN weights: $w_i \leftarrow w_i + \alpha \, \delta^k x_i^k$
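The per-example error-and-update loop described above can be sketched as follows. This is a minimal illustration assuming a squared-error loss and an identity (linear) activation; the function name `train_perceptron`, the learning rate, the epoch count, and the toy data are all my own choices, not from the slides.

```python
def train_perceptron(X, Y, alpha=0.05, epochs=300):
    """Train a linear perceptron by per-example gradient descent.

    Each x in X is assumed to already include the "fake" attribute x_0 = 1,
    so w[0] is the bias weight.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # Prediction with the current weights: w . x
            pred = sum(wi * xi for wi, xi in zip(w, x))
            # Error delta^k = y^k - w . x^k
            delta = y - pred
            # Update rule: w_i <- w_i + alpha * delta^k * x_i^k
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return w

# Noiseless toy data generated by y = 1 + 2 * x_1:
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]
w = train_perceptron(X, Y)
print(w)  # approximately [1.0, 2.0]
```

Updating after every example, as here, is the per-example rule of the last slide; summing the contributions $\delta^k x_i^k$ over all $k$ before updating would instead give the batch gradient descent of the earlier slides.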