CSC 411 / CSC D11                                                        Gradient Descent

9   Gradient Descent
There are many situations in which we wish to minimize an objective function with respect to a
parameter vector:
w^* = \arg\min_{w} E(w)     (1)
but no closed-form solution for the minimum exists. In machine learning, this optimization is
normally a data-fitting objective function, but similar problems arise throughout computer science,
numerical analysis, physics, finance, and many other fields.
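As a concrete illustration (an assumed example, not a formula taken from this chapter), consider
fitting a model f(x; w) to training pairs (x_i, y_i) by least squares:

E(w) = \sum_{i=1}^{N} \left( y_i - f(x_i; w) \right)^2 .

Unless f is linear in the parameters, setting the derivatives of E to zero generally yields no
closed-form solution, which is exactly the situation an iterative method must address.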
The solution we will use in this course is called gradient descent. It works for any differentiable
energy function. However, it does not come with many guarantees: it is only guaranteed to
find a local minimum in the limit of infinite computation time.
Gradient descent is iterative. First, we obtain an initial estimate w_1 of the unknown parameter
vector. How we obtain this vector depends on the problem; one approach is to randomly sample
values for the parameters. Then, from this initial estimate, we note that the direction of steepest
descent from this point is to follow the negative gradient −∇E of the objective function evaluated
at w_1. The gradient is defined as a vector of derivatives with respect to each of the parameters:
\nabla E \equiv \left[ \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \dots, \frac{\partial E}{\partial w_N} \right]^T
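To make the iteration concrete, the following is a minimal Python sketch of gradient descent. The
quadratic energy E, its gradient grad_E, the fixed step size of 0.1, and the iteration count are all
illustrative assumptions chosen for this example, not values prescribed by the notes.

    import numpy as np

    def E(w):
        # Illustrative energy: a quadratic bowl with its minimum at (1, -2).
        return (w[0] - 1.0)**2 + (w[1] + 2.0)**2

    def grad_E(w):
        # Gradient of E: the vector of derivatives with respect to each parameter.
        return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

    w = np.random.randn(2)      # initial estimate w_1, sampled randomly
    step_size = 0.1             # assumed fixed step size
    for _ in range(1000):
        w = w - step_size * grad_E(w)   # step along the negative gradient

    print(w)                    # ends up near the minimizer [1, -2]

Each iteration replaces the current estimate with a point a short step along −∇E. For this simple
quadratic the iterates approach the exact minimum; for general differentiable energies they are
only guaranteed to approach a local minimum.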