CSC 411 / CSC D11                                                    Gradient Descent

9 Gradient Descent

There are many situations in which we wish to minimize an objective function with respect to a parameter vector:

    w^* = \arg\min_{\mathbf{w}} E(\mathbf{w})    (1)

but no closed-form solution for the minimum exists. In machine learning, the objective is normally a data-fitting function, but similar problems arise throughout computer science, numerical analysis, physics, finance, and many other fields.

The solution we will use in this course is called gradient descent. It works for any differentiable energy function. However, it does not come with many guarantees: it is only guaranteed to find a local minimum in the limit of infinite computation time.

Gradient descent is iterative. First, we obtain an initial estimate w_1 of the unknown parameter vector. How we obtain this vector depends on the problem; one approach is to randomly sample values for the parameters. Then, from this initial estimate, we note that the direction of steepest descent from this point is the negative gradient -\nabla E of the objective function evaluated at w_1. The gradient is defined as the vector of derivatives with respect to each of the parameters:

    \nabla E \equiv \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_N} \right]^T
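To make the iteration concrete, here is a minimal Python sketch of the update rule described above. It is not the course's reference implementation: the function name gradient_descent, the fixed step size, the convergence test, and the quadratic example objective are all illustrative assumptions.

    # Minimal gradient descent sketch. The objective E(w), its gradient, the
    # step size, and the stopping rule below are illustrative assumptions.
    import numpy as np

    def gradient_descent(grad_E, w1, step_size=0.1, max_iters=1000, tol=1e-8):
        """Minimize a differentiable objective by following the negative gradient.

        grad_E    : function returning the gradient of E evaluated at w
        w1        : initial estimate of the parameter vector (e.g. random)
        step_size : fixed learning rate (assumed here for simplicity)
        """
        w = np.asarray(w1, dtype=float)
        for _ in range(max_iters):
            g = grad_E(w)                  # gradient of E at the current estimate
            w_new = w - step_size * g      # step in the direction of steepest descent
            if np.linalg.norm(w_new - w) < tol:  # stop once updates become negligible
                return w_new
            w = w_new
        return w

    # Example: minimize E(w) = ||w - c||^2, whose gradient is 2(w - c),
    # starting from a randomly sampled initial estimate w_1.
    c = np.array([3.0, -1.0])
    w_star = gradient_descent(lambda w: 2.0 * (w - c), w1=np.random.randn(2))

With a suitable step size, the iterates converge to the nearby minimum; how to choose the step size and when to stop are separate questions from the update rule itself.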