Scientific Computing: An Introductory Survey
Chapter 6 – Optimization

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign

Copyright © 2002. Reproduction permitted for noncommercial, educational use only.

Outline

1 Optimization Problems
2 One-Dimensional Optimization
3 Multi-Dimensional Optimization
Optimization Problems

Given function f : R^n → R, and set S ⊆ R^n, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S

x* is called minimizer or minimum of f

It suffices to consider only minimization, since maximum of f is minimum of −f

General continuous optimization problem:

    min f(x)  subject to  g(x) = 0 and h(x) ≤ 0

where f : R^n → R, g : R^n → R^m, h : R^n → R^p

Objective function f is usually differentiable, and may be linear or nonlinear

Constraint set S is defined by system of equations and inequalities, which may be linear or nonlinear

Points x ∈ S are called feasible points

If S = R^n, problem is unconstrained

Linear programming: f, g, and h are all linear

Nonlinear programming: at least one of f, g, and h is nonlinear
Examples: Optimization Problems

Minimize weight of structure subject to constraint on its strength, or maximize its strength subject to constraint on its weight

Minimize cost of diet subject to nutritional constraints

Minimize surface area of cylinder subject to constraint on its volume:

    min_{x1,x2} f(x1, x2) = 2π x1 (x1 + x2)  subject to  g(x1, x2) = π x1² x2 − V = 0

where x1 and x2 are radius and height of cylinder, and V is required volume

Local vs Global Optimization

x* ∈ S is global minimum if f(x*) ≤ f(x) for all x ∈ S

x* ∈ S is local minimum if f(x*) ≤ f(x) for all feasible x in some neighborhood of x*
Global Optimization

Finding, or even verifying, global minimum is difficult in general

Most optimization methods are designed to find local minimum, which may or may not be global minimum

If global minimum is desired, one can try several widely separated starting points and see if all produce same result

For some problems, such as linear programming, global optimization is more tractable

Existence of Minimum

If f is continuous on closed and bounded set S ⊆ R^n, then f has global minimum on S

If S is not closed or is unbounded, then f may have no local or global minimum on S

Continuous function f on unbounded set S ⊆ R^n is coercive if

    lim_{‖x‖→∞} f(x) = +∞

i.e., f(x) must be large whenever ‖x‖ is large

If f is coercive on closed, unbounded set S ⊆ R^n, then f has global minimum on S
Level Sets

Level set for function f : S ⊆ R^n → R is set of all points in S for which f has some given constant value

For given γ ∈ R, sublevel set is

    Lγ = {x ∈ S : f(x) ≤ γ}

If continuous function f on S ⊆ R^n has nonempty sublevel set that is closed and bounded, then f has global minimum on S

If S is unbounded, then f is coercive on S if, and only if, all of its sublevel sets are bounded

Uniqueness of Minimum

Set S ⊆ R^n is convex if it contains line segment between any two of its points

Function f : S ⊆ R^n → R is convex on convex set S if its graph along any line segment in S lies on or below chord connecting function values at endpoints of segment

Any local minimum of convex function f on convex set S ⊆ R^n is global minimum of f on S

Any local minimum of strictly convex function f on convex set S ⊆ R^n is unique global minimum of f on S
First-Order Optimality Condition

For function of one variable, one can find extremum by differentiating function and setting derivative to zero

Generalization to function of n variables is to find critical point, i.e., solution of nonlinear system

    ∇f(x) = 0

where ∇f(x) is gradient vector of f, whose ith component is ∂f(x)/∂xi

For continuously differentiable f : S ⊆ R^n → R, any interior point x* of S at which f has local minimum must be critical point of f

But not all critical points are minima: they can also be maxima or saddle points

Second-Order Optimality Condition

For twice continuously differentiable f : S ⊆ R^n → R, we can distinguish among critical points by considering Hessian matrix Hf(x) defined by

    {Hf(x)}ij = ∂²f(x) / (∂xi ∂xj)

which is symmetric

At critical point x*, if Hf(x*) is

    positive definite, then x* is minimum of f
    negative definite, then x* is maximum of f
    indefinite, then x* is saddle point of f
    singular, then various pathological situations are possible

Constrained Optimality

If problem is constrained, only feasible directions are relevant

For equality-constrained problem

    min f(x)  subject to  g(x) = 0

where f : R^n → R and g : R^n → R^m, with m ≤ n, necessary condition for feasible point x* to be solution is that negative gradient of f lie in space spanned by constraint normals,

    −∇f(x*) = Jg^T(x*) λ

where Jg is Jacobian matrix of g, and λ is vector of Lagrange multipliers

This condition says we cannot reduce objective function without violating constraints

Constrained Optimality, continued

Lagrangian function L : R^{n+m} → R is defined by

    L(x, λ) = f(x) + λ^T g(x)

Its gradient is given by

    ∇L(x, λ) = [ ∇f(x) + Jg^T(x) λ ]
               [        g(x)       ]

Its Hessian is given by

    HL(x, λ) = [ B(x, λ)   Jg^T(x) ]
               [  Jg(x)       O    ]

where

    B(x, λ) = Hf(x) + Σ_{i=1}^m λi Hgi(x)
Constrained Optimality, continued

Together, necessary condition and feasibility imply critical point of Lagrangian function,

    ∇L(x, λ) = [ ∇f(x) + Jg^T(x) λ ] = 0
               [        g(x)       ]

Hessian of Lagrangian is symmetric, but not positive definite, so critical point of L is saddle point rather than minimum or maximum

Critical point (x*, λ*) of L is constrained minimum of f if B(x*, λ*) is positive definite on null space of Jg(x*)

If columns of Z form basis for null space, then test projected Hessian Z^T B Z for positive definiteness

Constrained Optimality, continued

If inequalities are present, then KKT optimality conditions also require nonnegativity of Lagrange multipliers corresponding to inequalities, and complementarity condition
Sensitivity and Conditioning

Function minimization and equation solving are closely related problems, but their sensitivities differ

In one dimension, absolute condition number of root x* of equation f(x) = 0 is 1/|f′(x*)|, so if |f(x̂)| ≤ ε, then |x̂ − x*| may be as large as ε/|f′(x*)|

For minimizing f, Taylor series expansion

    f(x̂) = f(x* + h) = f(x*) + f′(x*) h + ½ f″(x*) h² + O(h³)

shows that, since f′(x*) = 0, if |f(x̂) − f(x*)| ≤ ε, then |x̂ − x*| may be as large as √(2ε/f″(x*))

Thus, based on function values alone, minima can be computed to only about half precision

Unimodality

For minimizing function of one variable, we need "bracket" for solution analogous to sign change for nonlinear equation

Real-valued function f is unimodal on interval [a, b] if there is unique x* ∈ [a, b] such that f(x*) is minimum of f on [a, b], and f is strictly decreasing for x ≤ x*, strictly increasing for x* ≤ x

Unimodality enables discarding portions of interval based on sample function values, analogous to interval bisection
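A quick numerical check of the sensitivity result above, as a hypothetical Python snippet (not part of the original slides); it uses the test function from the examples that follow, whose true minimizer is x* = 1/√2:

    import math

    # Near a minimum, function values change only at O(h^2), so perturbations
    # of the minimizer below about sqrt(eps) are invisible in f: half precision.
    f = lambda x: 0.5 - x * math.exp(-x * x)   # test function used in later slides
    xstar = 1.0 / math.sqrt(2.0)               # its true minimizer, about 0.707

    for h in [1e-4, 1e-6, 1e-8]:
        # change in f is roughly (f''(x*)/2) * h^2; at h = 1e-8 it drowns in rounding
        print(f"h = {h:.0e}:  f(x*+h) - f(x*) = {f(xstar + h) - f(xstar):.3e}")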
Golden Section Search

Suppose f is unimodal on [a, b], and let x1 and x2 be two points within [a, b], with x1 < x2

Evaluating and comparing f(x1) and f(x2), we can discard either (x2, b] or [a, x1), with minimum known to lie in remaining subinterval

To repeat process, we need compute only one new function evaluation

To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval

Golden Section Search, continued

To accomplish this, we choose relative positions of two points as τ and 1 − τ, where τ² = 1 − τ, so

    τ = (√5 − 1)/2 ≈ 0.618  and  1 − τ ≈ 0.382

Whichever subinterval is retained, its length will be τ relative to previous interval, and interior point retained will be at position either τ or 1 − τ relative to new interval

To continue iteration, we need to compute only one new function value, at complementary point

This choice of sample points is called golden section search

Golden section search is safe, but convergence rate is only linear, with constant C ≈ 0.618
Golden Section Search, continued

τ = (√5 − 1)/2
x1 = a + (1 − τ)(b − a); f1 = f(x1)
x2 = a + τ(b − a); f2 = f(x2)
while ((b − a) > tol) do
    if (f1 > f2) then
        a = x1
        x1 = x2
        f1 = f2
        x2 = a + τ(b − a)
        f2 = f(x2)
    else
        b = x2
        x2 = x1
        f2 = f1
        x1 = a + (1 − τ)(b − a)
        f1 = f(x1)
    end
end
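A direct Python transcription of this pseudocode, as a sketch (the bracket [0, 2] and the tolerance are assumptions, not from the slides):

    import math

    def golden_section_search(f, a, b, tol=1e-5):
        # Minimize unimodal f on [a, b] by golden section search
        tau = (math.sqrt(5.0) - 1.0) / 2.0            # about 0.618
        x1 = a + (1.0 - tau) * (b - a); f1 = f(x1)
        x2 = a + tau * (b - a);         f2 = f(x2)
        while b - a > tol:
            if f1 > f2:                               # minimum lies in [x1, b]
                a, x1, f1 = x1, x2, f2
                x2 = a + tau * (b - a); f2 = f(x2)    # one new evaluation
            else:                                     # minimum lies in [a, x2]
                b, x2, f2 = x2, x1, f1
                x1 = a + (1.0 - tau) * (b - a); f1 = f(x1)
        return (a + b) / 2.0

    # example from the next slide; prints roughly 0.707
    print(golden_section_search(lambda x: 0.5 - x * math.exp(-x * x), 0.0, 2.0))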
Example: Golden Section Search

Use golden section search to minimize

    f(x) = 0.5 − x exp(−x²)

     x1      f1      x2      f2
    0.764   0.074   1.236   0.232
    0.472   0.122   0.764   0.074
    0.764   0.074   0.944   0.113
    0.652   0.074   0.764   0.074
    0.584   0.085   0.652   0.074
    0.652   0.074   0.695   0.071
    0.695   0.071   0.721   0.071
    0.679   0.072   0.695   0.071
    0.695   0.071   0.705   0.071
    0.705   0.071   0.711   0.071

Successive Parabolic Interpolation

Fit quadratic polynomial to three function values

Take minimum of quadratic to be new approximation to minimum of function

New point replaces oldest of three previous points and process is repeated until convergence

Convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324
Example: Successive Parabolic Interpolation

Use successive parabolic interpolation to minimize

    f(x) = 0.5 − x exp(−x²)

     xk      f(xk)
    0.000   0.500
    0.600   0.081
    1.200   0.216
    0.754   0.073
    0.721   0.071
    0.692   0.071
    0.707   0.071
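A Python sketch of this iteration (not from the slides): it uses the standard vertex formula for the parabola through the three most recent points, which the slide does not spell out, and the number of steps is an assumption:

    import math

    def parabolic_interpolation(f, x0, x1, x2, steps=4):
        # Successive parabolic interpolation: jump to the vertex of the parabola
        # through the three most recent points, replacing the oldest point
        pts = [x0, x1, x2]
        for _ in range(steps):
            a, b, c = pts[-3], pts[-2], pts[-1]
            fa, fb, fc = f(a), f(b), f(c)
            num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
            den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
            if den == 0.0:                      # parabola degenerate; stop
                break
            pts.append(b - 0.5 * num / den)     # vertex of interpolating parabola
        return pts[-1]

    f = lambda x: 0.5 - x * math.exp(-x * x)
    # starting points from the table above; reproduces 0.754, 0.721, 0.692, 0.707
    print(parabolic_interpolation(f, 0.0, 0.6, 1.2))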
Newton's Method

Another local quadratic approximation is truncated Taylor series

    f(x + h) ≈ f(x) + f′(x) h + (f″(x)/2) h²

By differentiation, minimum of this quadratic function of h is given by h = −f′(x)/f″(x)

Suggests iteration scheme

    xk+1 = xk − f′(xk)/f″(xk)

which is Newton's method for solving nonlinear equation f′(x) = 0

Newton's method for finding minimum normally has quadratic convergence rate, but must be started close enough to solution to converge

Example: Newton's Method

Use Newton's method to minimize f(x) = 0.5 − x exp(−x²)

First and second derivatives of f are given by

    f′(x) = (2x² − 1) exp(−x²)  and  f″(x) = 2x(3 − 2x²) exp(−x²)

Newton iteration for zero of f′ is given by

    xk+1 = xk − (2xk² − 1)/(2xk(3 − 2xk²))

Using starting guess x0 = 1, we obtain

     xk      f(xk)
    1.000   0.132
    0.500   0.111
    0.700   0.071
    0.707   0.071
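A minimal Python sketch of this example (step count is an assumption); from x0 = 1 it reproduces the iterates 0.500, 0.700, 0.707 in the table:

    import math

    def newton_min_1d(fp, fpp, x, steps=4):
        # Newton's method for a 1-D minimum: x <- x - f'(x)/f''(x)
        for _ in range(steps):
            x = x - fp(x) / fpp(x)
        return x

    # derivatives of f(x) = 0.5 - x exp(-x^2), as given above
    fp  = lambda x: (2 * x * x - 1) * math.exp(-x * x)
    fpp = lambda x: 2 * x * (3 - 2 * x * x) * math.exp(-x * x)
    print(newton_min_1d(fp, fpp, 1.0))   # converges to about 0.707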
Safeguarded Methods

As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency

Most library routines for one-dimensional optimization are based on this hybrid approach

Popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required

Direct Search Methods

Direct search methods for multidimensional optimization make no use of function values other than comparing them

For minimizing function f of n variables, Nelder-Mead method begins with n + 1 starting points, forming simplex in R^n

Then move to new point along straight line from current point having highest function value through centroid of other points

New point replaces worst point, and process is repeated

Direct search methods are useful for nonsmooth functions or for small n, but expensive for larger n
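As a hedged illustration (not from the slides), a Nelder-Mead implementation is available in scipy; this snippet assumes scipy is installed and applies it to the quadratic used in later examples:

    from scipy.optimize import minimize

    # Nelder-Mead needs only function values, no derivatives
    result = minimize(lambda x: 0.5 * x[0]**2 + 2.5 * x[1]**2,
                      x0=[5.0, 1.0], method="Nelder-Mead")
    print(result.x)   # close to the true minimizer [0, 0]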
Steepest Descent Method

Let f : R^n → R be real-valued function of n real variables

At any point x where gradient vector is nonzero, negative gradient, −∇f(x), points downhill toward lower values of f

In fact, −∇f(x) is locally direction of steepest descent: f decreases more rapidly along direction of negative gradient than along any other

Steepest descent method: starting from initial guess x0, successive approximate solutions given by

    xk+1 = xk − αk ∇f(xk)

where αk is line search parameter that determines how far to go in given direction

Steepest Descent, continued

Given descent direction, such as negative gradient, determining appropriate value for αk at each iteration is one-dimensional minimization problem

    min_{αk} f(xk − αk ∇f(xk))

that can be solved by methods already discussed

Steepest descent method is very reliable: it can always make progress provided gradient is nonzero

But method is myopic in its view of function's behavior, and resulting iterates can zigzag back and forth, making very slow progress toward solution

In general, convergence rate of steepest descent is only linear, with constant factor that can be arbitrarily close to 1
Example: Steepest Descent

Use steepest descent method to minimize

    f(x) = 0.5 x1² + 2.5 x2²

Gradient is given by

    ∇f(x) = [  x1 ]
            [ 5x2 ]

Taking x0 = [5; 1], we have ∇f(x0) = [5; 5]

Performing line search along negative gradient direction,

    min_{α0} f(x0 − α0 ∇f(x0))

exact minimum along line is given by α0 = 1/3, so next approximation is x1 = [3.333; −0.667]

Example, continued

        xk              f(xk)       ∇f(xk)
    5.000    1.000     15.000     5.000    5.000
    3.333   −0.667      6.667     3.333   −3.333
    2.222    0.444      2.963     2.222    2.222
    1.481   −0.296      1.317     1.481   −1.481
    0.988    0.198      0.585     0.988    0.988
    0.658   −0.132      0.260     0.658   −0.658
    0.439    0.088      0.116     0.439    0.439
    0.293   −0.059      0.051     0.293   −0.293
    0.195    0.039      0.023     0.195    0.195
    0.130   −0.026      0.010     0.130   −0.130
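A Python sketch of this run (not from the slides): for a quadratic f(x) = ½ x^T H x the exact line search step has the closed form α = (gᵀg)/(gᵀHg), which gives α0 = 1/3 as above; the iteration count is an assumption:

    import numpy as np

    H = np.diag([1.0, 5.0])                 # Hessian of f(x) = 0.5 x1^2 + 2.5 x2^2
    x = np.array([5.0, 1.0])
    for k in range(10):
        g = H @ x                           # gradient [x1, 5*x2]
        alpha = (g @ g) / (g @ (H @ g))     # exact minimizer along -g
        x = x - alpha * g
        print(k, x, 0.5 * x @ (H @ x))      # reproduces the zigzag in the table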
Newton's Method

Broader view can be obtained by local quadratic approximation, which is equivalent to Newton's method

In multidimensional optimization, we seek zero of gradient, so Newton iteration has form

    xk+1 = xk − Hf(xk)^{-1} ∇f(xk)

where Hf(x) is Hessian matrix of second partial derivatives of f,

    {Hf(x)}ij = ∂²f(x) / (∂xi ∂xj)
Newton's Method, continued

Do not explicitly invert Hessian matrix, but instead solve linear system

    Hf(xk) sk = −∇f(xk)

for Newton step sk, then take as next iterate

    xk+1 = xk + sk

Convergence rate of Newton's method for minimization is normally quadratic

As usual, Newton's method is unreliable unless started close enough to solution to converge

Example: Newton's Method

Use Newton's method to minimize f(x) = 0.5 x1² + 2.5 x2²

Gradient and Hessian are given by

    ∇f(x) = [  x1 ]   and   Hf(x) = [ 1  0 ]
            [ 5x2 ]                 [ 0  5 ]

Taking x0 = [5; 1], we have ∇f(x0) = [5; 5]

Linear system for Newton step is

    [ 1  0 ] s0 = [ −5 ]
    [ 0  5 ]      [ −5 ]

so

    x1 = x0 + s0 = [ 5 ] + [ −5 ] = [ 0 ]
                   [ 1 ]   [ −1 ]   [ 0 ]

which is exact solution for this problem, as expected for quadratic function

Newton's Method, continued

If objective function f has continuous second partial derivatives, then Hessian matrix Hf is symmetric, and near minimum it is positive definite

Thus, linear system for step to next iterate can be solved in only about half of work required for LU factorization

In principle, line search parameter is unnecessary with Newton's method, since quadratic model determines length, as well as direction, of step to next approximate solution

When started far from solution, however, it may still be advisable to perform line search along direction of Newton step sk to make method more robust (damped Newton)

Once iterates are near solution, then αk = 1 should suffice for subsequent iterations

Newton's Method, continued

Far from minimum, Hf(xk) may not be positive definite, so Newton step sk may not be descent direction for function, i.e., we may not have

    ∇f(xk)^T sk < 0

In this case, alternative descent direction can be computed, such as negative gradient or direction of negative curvature, and then perform line search
MultiDimensional Optimization Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization Optimization Problems
OneDimensional Optimization
MultiDimensional Optimization Trust Region Methods Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization Trust Region Methods, continued Alternative to line search is trust region method, in which
approximate solution is constrained to lie within region
where quadratic model is sufﬁciently accurate
If current trust radius is binding, minimizing quadratic
model function subject to this constraint may modify
direction as well as length of Newton step
Accuracy of quadratic model is assessed by comparing
actual decrease in objective function with that predicted by
quadratic model, and trust radius is increased or
decreased accordingly Michael T. Heath
Quasi-Newton Methods

Newton's method costs O(n³) arithmetic and O(n²) scalar function evaluations per iteration for dense problem

Many variants of Newton's method improve reliability and reduce overhead

Quasi-Newton methods have form

    xk+1 = xk − αk Bk^{-1} ∇f(xk)

where αk is line search parameter and Bk is approximation to Hessian matrix

Many quasi-Newton methods are more robust than Newton's method, are superlinearly convergent, and have lower overhead per iteration, which often more than offsets their slower convergence rate

Secant Updating Methods

Could use Broyden's method to seek zero of gradient, but this would not preserve symmetry of Hessian matrix

Several secant updating formulas have been developed for minimization that not only preserve symmetry in approximate Hessian matrix, but also preserve positive definiteness

Symmetry reduces amount of work required by about half, while positive definiteness guarantees that quasi-Newton step will be descent direction

BFGS Method
One of most effective secant updating methods for minimization is BFGS:

x0 = initial guess
B0 = initial Hessian approximation
for k = 0, 1, 2, ...
    solve Bk sk = −∇f(xk) for sk
    xk+1 = xk + sk
    yk = ∇f(xk+1) − ∇f(xk)
    Bk+1 = Bk + (yk yk^T)/(yk^T sk) − (Bk sk sk^T Bk)/(sk^T Bk sk)
end

BFGS Method, continued

In practice, factorization of Bk is updated rather than Bk itself, so linear system for sk can be solved at cost of O(n²) rather than O(n³) work

Unlike Newton's method for minimization, no second derivatives are required

Can start with B0 = I, so initial step is along negative gradient, and then second derivative information is gradually built up in approximate Hessian matrix over successive iterations

BFGS normally has superlinear convergence rate, even though approximate Hessian does not necessarily converge to true Hessian

Line search can be used to enhance effectiveness
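A Python sketch following the algorithm above (not from the slides; the gradient-norm stopping guard and step count are assumptions). With B0 = I it reproduces the first steps of the example that follows:

    import numpy as np

    def bfgs(grad, x, steps=6, B=None):
        # BFGS secant updating method; B0 = I makes the first step steepest descent
        B = np.eye(x.size) if B is None else B
        g = grad(x)
        for _ in range(steps):
            if np.linalg.norm(g) < 1e-10:       # stop once gradient is negligible
                break
            s = np.linalg.solve(B, -g)          # solve B_k s_k = -grad f(x_k)
            x = x + s
            g_new = grad(x)
            y = g_new - g
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
            g = g_new
        return x

    grad = lambda x: np.array([x[0], 5.0 * x[1]])
    print(bfgs(grad, np.array([5.0, 1.0])))   # approaches [0, 0] as in the table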
Example: BFGS Method

Use BFGS to minimize

    f(x) = 0.5 x1² + 2.5 x2²

Gradient is given by

    ∇f(x) = [  x1 ]
            [ 5x2 ]

Taking x0 = [5; 1]^T and B0 = I, initial step is negative gradient, so

    x1 = x0 + s0 = [ 5 ] + [ −5 ] = [  0 ]
                   [ 1 ]   [ −5 ]   [ −4 ]

Updating approximate Hessian using BFGS formula, we obtain

    B1 = [ 0.667   0.333 ]
         [ 0.333   4.667 ]

Then new step is computed and process is repeated

Example, continued

        xk               f(xk)       ∇f(xk)
     5.000    1.000     15.000     5.000     5.000
     0.000   −4.000     40.000     0.000   −20.000
    −2.222    0.444      2.963    −2.222     2.222
     0.816    0.082      0.350     0.816     0.408
    −0.009   −0.015      0.001    −0.009    −0.077
    −0.001    0.001      0.000    −0.001     0.005

Increase in function value can be avoided by using line search, which generally enhances convergence

For quadratic objective function, BFGS with exact line search finds exact solution in at most n iterations, where n is dimension of problem
Conjugate Gradient Method

Another method that does not require explicit second derivatives, and does not even store approximation to Hessian matrix, is conjugate gradient (CG) method

CG generates sequence of conjugate search directions, implicitly accumulating information about Hessian matrix

For quadratic objective function, CG is theoretically exact after at most n iterations, where n is dimension of problem

CG is effective for general unconstrained minimization as well

Conjugate Gradient Method, continued

x0 = initial guess
g0 = ∇f(x0)
s0 = −g0
for k = 0, 1, 2, ...
    choose αk to minimize f(xk + αk sk)
    xk+1 = xk + αk sk
    gk+1 = ∇f(xk+1)
    βk+1 = (gk+1^T gk+1)/(gk^T gk)
    sk+1 = −gk+1 + βk+1 sk
end

Alternative formula for βk+1 is

    βk+1 = ((gk+1 − gk)^T gk+1)/(gk^T gk)
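A Python sketch of this algorithm (not from the slides): a crude grid search stands in for "choose αk to minimize f(xk + αk sk)", and the grid range and step count are assumptions:

    import numpy as np

    def conjugate_gradient(f, grad, x, steps=4):
        # Nonlinear CG with Fletcher-Reeves beta and a crude 1-D line search
        g = grad(x)
        s = -g
        for _ in range(steps):
            alphas = np.linspace(0.0, 1.0, 1001)    # crude line search grid
            alpha = alphas[np.argmin([f(x + a * s) for a in alphas])]
            x = x + alpha * s
            g_new = grad(x)
            beta = (g_new @ g_new) / (g @ g)        # beta_{k+1} as above
            s = -g_new + beta * s
            g = g_new
        return x

    f = lambda x: 0.5 * x[0]**2 + 2.5 * x[1]**2
    grad = lambda x: np.array([x[0], 5.0 * x[1]])
    # finds alpha0 near 1/3, beta1 near 0.444, then the origin, as in the example below
    print(conjugate_gradient(f, grad, np.array([5.0, 1.0])))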
Example: Conjugate Gradient Method

Use CG method to minimize

    f(x) = 0.5 x1² + 2.5 x2²

Gradient is given by

    ∇f(x) = [  x1 ]
            [ 5x2 ]

Taking x0 = [5; 1], initial search direction is negative gradient,

    s0 = −g0 = −∇f(x0) = [ −5 ]
                         [ −5 ]

Exact minimum along line is given by α0 = 1/3, so next approximation is x1 = [3.333; −0.667]^T, and we compute new gradient,

    g1 = ∇f(x1) = [  3.333 ]
                  [ −3.333 ]

So far there is no difference from steepest descent method

Example, continued

At this point, however, rather than search along new negative gradient, we compute instead

    β1 = (g1^T g1)/(g0^T g0) = 0.444

which gives as next search direction

    s1 = −g1 + β1 s0 = [ −3.333 ] + 0.444 [ −5 ] = [ −5.556 ]
                       [  3.333 ]         [ −5 ]   [  1.111 ]

Minimum along this direction is given by α1 = 0.6, which gives exact solution at origin, as expected for quadratic function

Truncated Newton Methods

Another way to reduce work in Newton-like methods is to solve linear system for Newton step by iterative method

Small number of iterations may suffice to produce step as useful as true Newton step, especially far from overall solution, where true Newton step may be unreliable anyway

Good choice for linear iterative solver is CG method, which gives step intermediate between steepest descent and Newton-like step

Since only matrix-vector products are required, explicit formation of Hessian matrix can be avoided by using finite difference of gradient along given vector

Nonlinear Least Squares

Given data (ti, yi), find vector x of parameters that gives "best fit" in least squares sense to model function f(t, x), where f is nonlinear function of x

Define components of residual function

    ri(x) = yi − f(ti, x),   i = 1, ..., m

so we want to minimize φ(x) = ½ r^T(x) r(x)

Gradient vector is ∇φ(x) = J^T(x) r(x), and Hessian matrix is

    Hφ(x) = J^T(x) J(x) + Σ_{i=1}^m ri(x) Hi(x)

where J(x) is Jacobian of r(x), and Hi(x) is Hessian of ri(x)

Nonlinear Least Squares, continued

Linear system for Newton step is

    [ J^T(xk) J(xk) + Σ_{i=1}^m ri(xk) Hi(xk) ] sk = −J^T(xk) r(xk)

Hessian matrices Hi are usually inconvenient and expensive to compute

Moreover, in Hφ each Hi is multiplied by residual component ri, which is small at solution if fit of model function to data is good

Gauss-Newton Method

This motivates Gauss-Newton method for nonlinear least squares, in which second-order term is dropped and linear system

    J^T(xk) J(xk) sk = −J^T(xk) r(xk)

is solved for approximate Newton step sk at each iteration

This is system of normal equations for linear least squares problem

    J(xk) sk ≅ −r(xk)

which can be solved better by QR factorization

Next approximate solution is then given by

    xk+1 = xk + sk

and process is repeated until convergence
Example: Gauss-Newton Method

Use Gauss-Newton method to fit nonlinear model function

    f(t, x) = x1 exp(x2 t)

to data

    t    0.0   1.0   2.0   3.0
    y    2.0   0.7   0.3   0.1

For this model function, entries of Jacobian matrix of residual function r are given by

    {J(x)}i,1 = ∂ri(x)/∂x1 = −exp(x2 ti)
    {J(x)}i,2 = ∂ri(x)/∂x2 = −x1 ti exp(x2 ti)

Example, continued

If we take x0 = [1, 0]^T, then Gauss-Newton step s0 is given by linear least squares problem

    [ −1   0 ]        [ −1.0 ]
    [ −1  −1 ] s0 ≅   [  0.3 ]
    [ −1  −2 ]        [  0.7 ]
    [ −1  −3 ]        [  0.9 ]

whose solution is s0 = [0.69; −0.61]

Then next approximate solution is given by x1 = x0 + s0, and process is repeated until convergence

Example, continued

        xk             ‖r(xk)‖₂²
    1.000    0.000      2.390
    1.690   −0.610      0.212
    1.975   −0.930      0.007
    1.994   −1.004      0.002
    1.995   −1.009      0.002
    1.995   −1.010      0.002

Gauss-Newton Method, continued

Gauss-Newton method replaces nonlinear least squares problem by sequence of linear least squares problems whose solutions converge to solution of original nonlinear problem

If residual at solution is large, then second-order term omitted from Hessian is not negligible, and Gauss-Newton method may converge slowly or fail to converge

In such "large-residual" cases, it may be best to use general nonlinear minimization method that takes into account true full Hessian matrix
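A Python sketch of Gauss-Newton on the example above (not from the slides; the iteration count is an assumption, and np.linalg.lstsq stands in for the QR solve):

    import numpy as np

    # model f(t, x) = x1 * exp(x2 * t), data from the example above
    t = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([2.0, 0.7, 0.3, 0.1])

    x = np.array([1.0, 0.0])                       # starting guess x0
    for k in range(6):
        r = y - x[0] * np.exp(x[1] * t)            # residual vector
        J = np.column_stack([-np.exp(x[1] * t),               # d r_i / d x1
                             -x[0] * t * np.exp(x[1] * t)])   # d r_i / d x2
        s, *_ = np.linalg.lstsq(J, -r, rcond=None) # solve J s ~= -r
        x = x + s
        print(k, x, r @ r)    # first step is s0 = [0.69, -0.61], as above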
Levenberg-Marquardt Method

Levenberg-Marquardt method is another useful alternative when Gauss-Newton approximation is inadequate or yields rank-deficient linear least squares subproblem

In this method, linear system at each iteration is of form

    (J^T(xk) J(xk) + µk I) sk = −J^T(xk) r(xk)

where µk is scalar parameter chosen by some strategy

Corresponding linear least squares problem is

    [ J(xk)  ]        [ −r(xk) ]
    [ √µk I  ] sk ≅   [    0   ]

With suitable strategy for choosing µk, this method can be very robust in practice, and it forms basis for several effective software packages

Equality-Constrained Optimization

For equality-constrained minimization problem

    min f(x)  subject to  g(x) = 0

where f : R^n → R and g : R^n → R^m, with m ≤ n, we seek critical point of Lagrangian L(x, λ) = f(x) + λ^T g(x)

Applying Newton's method to nonlinear system

    ∇L(x, λ) = [ ∇f(x) + Jg^T(x) λ ] = 0
               [        g(x)       ]

we obtain linear system

    [ B(x, λ)   Jg^T(x) ] [ s ]     [ ∇f(x) + Jg^T(x) λ ]
    [  Jg(x)       O    ] [ δ ] = − [        g(x)       ]

for Newton step (s, δ) in (x, λ) at each iteration

Sequential Quadratic Programming

Foregoing block 2 × 2 linear system is equivalent to quadratic programming problem, so this approach is known as sequential quadratic programming

Types of solution methods include

    Direct solution methods, in which entire block 2 × 2 system is solved directly

    Range space methods, based on block elimination in block 2 × 2 linear system

    Null space methods, based on orthogonal factorization of matrix of constraint normals, Jg^T(x)

Merit Function

Once Newton step (s, δ) is determined, we need merit function to measure progress toward overall solution, for use in line search or trust region

Popular choices include penalty function

    φρ(x) = f(x) + ½ ρ g(x)^T g(x)

and augmented Lagrangian function

    Lρ(x, λ) = f(x) + λ^T g(x) + ½ ρ g(x)^T g(x)

where parameter ρ > 0 determines relative weighting of optimality vs feasibility

Given starting guess x0, good starting guess for λ0 can be obtained from least squares problem

    Jg^T(x0) λ0 ≅ −∇f(x0)

Penalty Methods

Merit function can also be used to convert equality-constrained problem into sequence of unconstrained problems

If x*ρ is solution to

    min_x φρ(x) = f(x) + ½ ρ g(x)^T g(x)

then under appropriate conditions

    lim_{ρ→∞} x*ρ = x*

This enables use of unconstrained optimization methods, but problem becomes ill-conditioned for large ρ, so we solve sequence of problems with gradually increasing values of ρ, with minimum for each problem used as starting point for next problem

Inequality-Constrained Optimization

Methods just outlined for equality constraints can be extended to handle inequality constraints by using active set strategy

Inequality constraints are provisionally divided into those that are satisfied already (and can therefore be temporarily disregarded) and those that are violated (and are therefore temporarily treated as equality constraints)

This division of constraints is revised as iterations proceed until eventually correct constraints are identified that are binding at solution

Barrier Methods

For inequality-constrained problems, another alternative is barrier function, such as

    φµ(x) = f(x) − µ Σ_{i=1}^p 1/hi(x)

or

    φµ(x) = f(x) − µ Σ_{i=1}^p log(−hi(x))

which increasingly penalize feasible points as they approach boundary of feasible region

Again, solutions of unconstrained problem approach x* as µ → 0, but problems are increasingly ill-conditioned, so solve sequence of problems with decreasing values of µ

Barrier functions are basis for interior point methods for linear programming

Example: Constrained Optimization

Consider quadratic programming problem

    min f(x) = 0.5 x1² + 2.5 x2²  subject to  g(x) = x1 − x2 − 1 = 0

Lagrangian function is given by

    L(x, λ) = f(x) + λ g(x) = 0.5 x1² + 2.5 x2² + λ(x1 − x2 − 1)

Since

    ∇f(x) = [  x1 ]   and   Jg(x) = [ 1  −1 ]
            [ 5x2 ]

we have

    ∇x L(x, λ) = ∇f(x) + Jg^T(x) λ = [ x1 + λ  ]
                                     [ 5x2 − λ ]

Example, continued

So system to be solved for critical point of Lagrangian is

    x1 + λ = 0
    5x2 − λ = 0
    x1 − x2 = 1

which in this case is linear system

    [ 1   0   1 ] [ x1 ]   [ 0 ]
    [ 0   5  −1 ] [ x2 ] = [ 0 ]
    [ 1  −1   0 ] [ λ  ]   [ 1 ]
Solving this system, we obtain solution

    x1 = 0.833,   x2 = −0.167,   λ = −0.833
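A quick numerical check of this solution (a hypothetical numpy snippet, not part of the slides):

    import numpy as np

    # the 3x3 KKT system from the example
    K = np.array([[1.0,  0.0,  1.0],    # x1 + lambda   = 0
                  [0.0,  5.0, -1.0],    # 5 x2 - lambda = 0
                  [1.0, -1.0,  0.0]])   # x1 - x2       = 1
    rhs = np.array([0.0, 0.0, 1.0])
    x1, x2, lam = np.linalg.solve(K, rhs)
    print(x1, x2, lam)   # 0.833, -0.167, -0.833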
Linear Programming

One of most important and common constrained optimization problems is linear programming

One standard form for such problems is

    min f(x) = c^T x  subject to  Ax = b and x ≥ 0

where m < n, A ∈ R^{m×n}, b ∈ R^m, and c, x ∈ R^n

Feasible region is convex polyhedron in R^n, and minimum must occur at one of its vertices

Simplex method moves systematically from vertex to vertex until minimum point is found

Linear Programming, continued

Simplex method is reliable and normally efficient, able to solve problems with thousands of variables, but can require time exponential in size of problem in worst case

Interior point methods for linear programming developed in recent years have polynomial worst-case solution time

These methods move through interior of feasible region, not restricting themselves to investigating only its vertices

Although interior point methods have significant practical impact, simplex method is still predominant method in standard packages for linear programming, and its effectiveness in practice is excellent

Example: Linear Programming

To illustrate linear programming, consider

    min f(x) = c^T x = −8x1 − 11x2

subject to linear inequality constraints

    5x1 + 4x2 ≤ 40,   −x1 + 3x2 ≤ 12,   x1 ≥ 0,   x2 ≥ 0

Minimum value must occur at vertex of feasible region, in this case at x1 = 3.79, x2 = 5.26, where objective function has value −88.2
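As a hedged check of this example (not from the slides), scipy's linprog solves it directly; note its default bounds already impose x ≥ 0:

    from scipy.optimize import linprog

    # minimize -8 x1 - 11 x2 subject to the two inequality constraints above
    res = linprog(c=[-8, -11], A_ub=[[5, 4], [-1, 3]], b_ub=[40, 12])
    print(res.x, res.fun)   # about [3.79, 5.26] and -88.2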