# Chap5.2-Training: Neural Network Training (Machine Learning, Sargur Srihari)

## Topics

- Neural network parameters
- Probabilistic problem formulation
- Determining the error function: regression, binary classification, multi-class classification
- Parameter optimization: local quadratic approximation, use of gradient information, gradient descent optimization

## Neural network parameters

Linear models for regression and classification can be represented as

$$y(\mathbf{x}, \mathbf{w}) = f\left(\sum_{j=1}^{M} w_j \phi_j(\mathbf{x})\right)$$

which are linear combinations of basis functions $\phi_j(\mathbf{x})$. In a neural network, the basis functions themselves depend on parameters; during training, these parameters are adjusted along with the coefficients $w_j$.

## Network training: sum-of-squares error

A neural network performs a transformation from a vector $\mathbf{x}$ of input variables to a vector $\mathbf{y}$ of output variables. With $D$ input variables and $M$ hidden units, a two-layer network computes

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left(\sum_{j=1}^{M} w_{kj}^{(2)}\, h\left(\sum_{i=1}^{D} w_{ji}^{(1)} x_i\right)\right)$$

To determine $\mathbf{w}$, a simple analogy with polynomial curve fitting is to minimize a sum-of-squares error function. Given $N$ training input vectors $\{\mathbf{x}_n\}$, $n = 1, \ldots, N$, and target vectors $\{\mathbf{t}_n\}$, minimize

$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \lVert \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \rVert^2$$

A more general probabilistic interpretation follows.

## Probabilistic view: the activation function f determines the error function E

1. **Regression.** The output activation $f$ is the identity,
   $$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \phi(\mathbf{x}) = \sum_{j=1}^{M} w_j \phi_j(\mathbf{x})$$
   and $E$ is the sum-of-squares error (equivalent to maximum likelihood):
   $$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \{y(\mathbf{x}_n, \mathbf{w}) - t_n\}^2$$

2. **(Multiple independent) binary classifications.** $f$ is the logistic sigmoid,
   $$y(\mathbf{x}, \mathbf{w}) = \sigma(\mathbf{w}^T \phi(\mathbf{x})) = \frac{1}{1 + \exp(-\mathbf{w}^T \phi(\mathbf{x}))}$$
   and $E$ is the cross-entropy error function
   $$E(\mathbf{w}) = -\sum_{n=1}^{N} \{t_n \ln y_n + (1 - t_n)\ln(1 - y_n)\}$$

3. **Multiclass classification.** The outputs $f$ are softmax,
   $$y_k(\mathbf{x}, \mathbf{w}) = \frac{\exp(\mathbf{w}_k^T \phi(\mathbf{x}))}{\sum_j \exp(\mathbf{w}_j^T \phi(\mathbf{x}))}$$
   and $E$ is the multiclass cross-entropy error function
   $$E(\mathbf{w}) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{kn} \ln y_k(\mathbf{x}_n, \mathbf{w})$$

## 1. Probabilistic view: regression

The output is a single target variable $t$ that can take any real value. Assume $t$ is Gaussian distributed with an $\mathbf{x}$-dependent mean:

$$p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$$

The likelihood function over the training set is then

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1})$$

Taking the negative logarithm gives the negative log-likelihood

$$-\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \frac{\beta}{2}\sum_{n=1}^{N} \{y(\mathbf{x}_n, \mathbf{w}) - t_n\}^2 - \frac{N}{2}\ln\beta + \frac{N}{2}\ln(2\pi)$$

which can be used to learn the parameters $\mathbf{w}$ and $\beta$.
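The activation/error pairings above can be sketched numerically. The following is a minimal NumPy illustration, not code from the slides: it assumes a two-layer network with tanh hidden units $h$, and all function and variable names here are my own.

```python
import numpy as np

def forward(X, W1, W2, f):
    """Two-layer network y_k = f( sum_j w_kj^(2) h( sum_i w_ji^(1) x_i ) ).

    X: (N, D) inputs; W1: (M, D) first-layer weights; W2: (K, M)
    second-layer weights; f: output activation (identity/sigmoid/softmax).
    """
    A = np.tanh(X @ W1.T)      # hidden-unit activations h(.), shape (N, M)
    return f(A @ W2.T)         # network outputs y_k, shape (N, K)

identity = lambda a: a                              # regression
sigmoid  = lambda a: 1.0 / (1.0 + np.exp(-a))       # binary classification

def softmax(a):                                     # multiclass classification
    e = np.exp(a - a.max(axis=1, keepdims=True))    # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def sum_of_squares(Y, T):
    """E(w) = 1/2 sum_n ||y(x_n, w) - t_n||^2"""
    return 0.5 * np.sum((Y - T) ** 2)

def binary_cross_entropy(Y, T):
    """E(w) = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }"""
    return -np.sum(T * np.log(Y) + (1 - T) * np.log(1 - Y))

def multiclass_cross_entropy(Y, T):
    """E(w) = -sum_n sum_k t_kn ln y_k(x_n, w), with 1-of-K coded targets T"""
    return -np.sum(T * np.log(Y))
```

Each error function is paired with the matching output activation, as in the table above; mixing them (e.g. softmax outputs with sum-of-squares) loses the maximum-likelihood interpretation.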
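The regression negative log-likelihood can also be checked numerically. The sketch below uses illustrative names of my own; the closed form for the maximum-likelihood precision, $1/\beta_{ML} = \frac{1}{N}\sum_n \{y(\mathbf{x}_n, \mathbf{w}) - t_n\}^2$, is a standard result assumed here rather than taken from the slide excerpt.

```python
import numpy as np

def gaussian_nll(y, t, beta):
    """-ln p(t|X,w,beta) = (beta/2) sum_n (y_n - t_n)^2 - (N/2) ln beta + (N/2) ln(2 pi)."""
    n, sq = len(t), np.sum((y - t) ** 2)
    return 0.5 * beta * sq - 0.5 * n * np.log(beta) + 0.5 * n * np.log(2 * np.pi)

def beta_ml(y, t):
    """Setting d(-ln p)/d(beta) = 0 gives 1/beta_ML = (1/N) sum_n (y_n - t_n)^2
    (standard ML result for Gaussian noise, assumed here)."""
    return len(t) / np.sum((y - t) ** 2)
```

Note that for fixed $\beta$, minimizing the negative log-likelihood over $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error, since the remaining two terms do not depend on $\mathbf{w}$.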