CSC 411 / CSC D11                                                        Gradient Descent

9   Gradient Descent
There are many situations in which we wish to minimize an objective function with respect to a
parameter vector:
w^* = \arg\min_{w} E(w)     (1)
but no closed-form solution for the minimum exists. In machine learning, this optimization is
normally a data-fitting objective function, but similar problems arise throughout computer science,
numerical analysis, physics, finance, and many other fields.
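As a concrete illustration (an assumed example, not a formula taken from this chapter), consider
fitting a model f(x; w) to training pairs (x_i, y_i) by least squares:

E(w) = \sum_{i=1}^{N} \left( y_i - f(x_i; w) \right)^2 .

Unless f is linear in the parameters, setting the derivatives of E to zero generally yields no
closed-form solution, which is exactly the situation an iterative method must address.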
The solution we will use in this course is called gradient descent. It works for any differentiable
energy function. However, it does not come with many guarantees: it is only guaranteed to
find a local minimum in the limit of infinite computation time.
Gradient descent is iterative. First, we obtain an initial estimate w_1 of the unknown parameter
vector. How we obtain this vector depends on the problem; one approach is to randomly sample
values for the parameters. Then, from this initial estimate, we note that the direction of steepest
descent from this point is to follow the negative gradient −∇E of the objective function evaluated
at w_1. The gradient is defined as a vector of derivatives with respect to each of the parameters:
\nabla E \equiv \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \dots, \frac{\partial E}{\partial w_N} \right]^T
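To make the iteration concrete, the following is a minimal Python sketch of gradient descent. The
quadratic energy E, its gradient grad_E, the fixed step size of 0.1, and the iteration count are all
illustrative assumptions chosen for this example, not values prescribed by the notes.

    import numpy as np

    def E(w):
        # Illustrative energy: a quadratic bowl with its minimum at (1, -2).
        return (w[0] - 1.0)**2 + (w[1] + 2.0)**2

    def grad_E(w):
        # Gradient of E: the vector of derivatives with respect to each parameter.
        return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

    w = np.random.randn(2)      # initial estimate w_1, sampled randomly
    step_size = 0.1             # assumed fixed step size
    for _ in range(1000):
        w = w - step_size * grad_E(w)   # step along the negative gradient

    print(w)                    # ends up near the minimizer [1, -2]

Each iteration replaces the current estimate with a point a short step along −∇E. For this simple
quadratic the iterates approach the exact minimum; for general differentiable energies they are
only guaranteed to approach a local minimum.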