Chap5.2-Training

Neural Network Training
Sargur Srihari

Topics
- Neural network parameters
- Probabilistic problem formulation: determining the error function
  - Regression
  - Binary classification
  - Multi-class classification
- Parameter optimization
  - Local quadratic approximation
  - Use of gradient information
  - Gradient descent optimization

Neural network parameters

Linear models for regression and classification can be represented as

    y(x, w) = f\left( \sum_{j=1}^{M} w_j \phi_j(x) \right),

i.e., linear combinations of fixed basis functions \phi_j(x). In a neural network the basis functions \phi_j(x) themselves depend on parameters, and during training these parameters are adjusted along with the coefficients w_j.

Network training: sum-of-squared errors

A neural network performs a transformation from a vector x of input variables to a vector y of output variables. To determine w, a simple approach, by analogy with polynomial curve fitting, is to minimize a sum-of-squared-errors function. Given a set of input vectors {x_n}, n = 1,...,N, and target vectors {t_n}, minimize the error function

    E(w) = \frac{1}{2} \sum_{n=1}^{N} \| y(x_n, w) - t_n \|^2.

For a two-layer network the outputs are

    y_k(x, w) = \sum_{j=1}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=1}^{D} w_{ji}^{(1)} x_i \right),

with D input variables, M hidden units, and N training vectors. Beyond this least-squares view, consider a more general probabilistic interpretation.

Probabilistic view: the activation function f determines the error function E

1. Regression. f is the identity; E is the sum-of-squares error (equivalent to maximum likelihood):

    y(x, w) = w^T \phi(x) = \sum_{j=1}^{M} w_j \phi_j(x)
    E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2

2. (Multiple independent) binary classifications. f is the logistic sigmoid; E is the cross-entropy error function:

    y(x, w) = \sigma(w^T \phi(x)) = \frac{1}{1 + \exp(-w^T \phi(x))}
    E(w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

3. Multi-class classification. f is the softmax; E is the multi-class cross-entropy error function:

    y_k(x, w) = \frac{\exp(w_k^T \phi(x))}{\sum_j \exp(w_j^T \phi(x))}
    E(w) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{kn} \ln y_k(x_n, w)

1. Probabilistic view: regression

The output is a single target variable t that can take any real value. Assume t is Gaussian distributed with an x-dependent mean and precision (inverse variance) \beta:

    p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1}).

For N i.i.d. observations the likelihood function is

    p(\mathbf{t} \mid \mathbf{X}, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(x_n, w), \beta^{-1}).

Taking the negative logarithm gives the negative log-likelihood

    -\ln p(\mathbf{t} \mid \mathbf{X}, w, \beta) = \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln(2\pi),

which can be used to learn the parameters w and \beta.
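To make the two-layer network above concrete, here is a minimal NumPy sketch of the forward pass and the sum-of-squared-errors function. It is not from the slides: the tanh hidden activation, the identity output activation, and all array sizes are illustrative assumptions (the slides leave h unspecified).

```python
import numpy as np

def forward(x, W1, W2):
    # y_k(x, w) = sum_j W2[k, j] * h(sum_i W1[j, i] * x[i]),
    # with h = tanh (an assumed choice) and an identity output activation.
    z = np.tanh(W1 @ x)   # hidden-unit activations, shape (M,)
    return W2 @ z         # network outputs, shape (K,)

def sum_of_squares_error(X, T, W1, W2):
    # E(w) = 1/2 * sum_n || y(x_n, w) - t_n ||^2
    return 0.5 * sum(np.sum((forward(x, W1, W2) - t) ** 2)
                     for x, t in zip(X, T))

# Toy sizes: D = 3 inputs, M = 4 hidden units, K = 2 outputs, N = 5 examples.
rng = np.random.default_rng(0)
D, M, K, N = 3, 4, 2, 5
W1 = rng.normal(size=(M, D))   # first-layer weights w_ji^(1)
W2 = rng.normal(size=(K, M))   # second-layer weights w_kj^(2)
X, T = rng.normal(size=(N, D)), rng.normal(size=(N, K))
print(sum_of_squares_error(X, T, W1, W2))
```

Note that the slide formula, like this sketch, omits bias parameters; absorbing a bias into each layer's sum is the usual extension.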
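Each of the three probabilistic cases pairs an output activation with its matching error function, and all three can be written down directly. A sketch in NumPy, assuming Y holds network outputs and T the targets as arrays of the same shape; the eps clipping is a numerical-safety addition of mine, not something in the slides.

```python
import numpy as np

def sum_of_squares(Y, T):
    # Regression: E(w) = 1/2 * sum_n {y(x_n, w) - t_n}^2
    return 0.5 * np.sum((Y - T) ** 2)

def logistic_sigmoid(a):
    # y = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(Y, T, eps=1e-12):
    # E(w) = -sum_n {t_n ln y_n + (1 - t_n) ln(1 - y_n)}
    Y = np.clip(Y, eps, 1 - eps)   # keep the logs finite
    return -np.sum(T * np.log(Y) + (1 - T) * np.log(1 - Y))

def softmax(a):
    # y_k = exp(a_k) / sum_j exp(a_j); subtracting the max is a
    # standard numerical-stability trick, not part of the definition.
    e = np.exp(a - np.max(a, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def multiclass_cross_entropy(Y, T, eps=1e-12):
    # E(w) = -sum_n sum_k t_kn ln y_k(x_n, w), with T one-hot coded
    return -np.sum(T * np.log(np.clip(Y, eps, None)))
```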

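The regression derivation can be checked numerically. A small sketch of the negative log-likelihood, treating the precision beta as given; note that for fixed beta only the first term depends on w, so minimizing it over w is exactly minimizing the sum-of-squares error.

```python
import numpy as np

def neg_log_likelihood(Y, T, beta):
    # -ln p(t | X, w, beta)
    #   = beta/2 * sum_n {y_n - t_n}^2 - N/2 * ln(beta) + N/2 * ln(2*pi)
    N = len(T)
    return ((beta / 2) * np.sum((Y - T) ** 2)
            - (N / 2) * np.log(beta)
            + (N / 2) * np.log(2 * np.pi))
```

Setting the derivative with respect to \beta to zero gives the maximum-likelihood precision, 1/\beta = (1/N) \sum_n \{ y(x_n, w) - t_n \}^2, evaluated once w has been found.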

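The topics list closes with gradient descent optimization; the preview cuts off before those slides, but the basic update is w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E(w^{(\tau)}). A sketch applying it to the linear-in-basis-functions regression model above, whose gradient is \nabla E(w) = \sum_n \{ w^T \phi(x_n) - t_n \} \phi(x_n); the polynomial basis, learning rate, and step count are hand-picked assumptions for illustration.

```python
import numpy as np

def phi(x, M=4):
    # Illustrative polynomial basis: phi_j(x) = x^j for j = 0..M-1.
    return np.array([x ** j for j in range(M)])

def grad_E(w, X, T):
    # Gradient of E(w) = 1/2 * sum_n {w^T phi(x_n) - t_n}^2.
    return sum((w @ phi(x) - t) * phi(x) for x, t in zip(X, T))

# Fit a noisy sine curve with plain batch gradient descent.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=20)
T = np.sin(np.pi * X) + 0.1 * rng.normal(size=20)
w = np.zeros(4)
eta = 0.01                         # learning rate (hand-picked)
for step in range(2000):
    w = w - eta * grad_E(w, X, T)  # w <- w - eta * grad E(w)
print("learned weights:", w)
```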
