chap06_8up - Scientific Computing, Chapter 6: Optimization



Scientific Computing: An Introductory Survey
Chapter 6 – Optimization

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign

Copyright © 2002. Reproduction permitted for noncommercial, educational use only.

Outline
1. Optimization Problems
2. One-Dimensional Optimization
3. Multi-Dimensional Optimization

Optimization Problems

Given a function f: R^n → R and a set S ⊆ R^n, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S. Such an x* is called a minimizer, or minimum, of f. It suffices to consider only minimization, since a maximum of f is a minimum of −f.

The constraint set S is defined by a system of equations and inequalities, which may be linear or nonlinear. Points x ∈ S are called feasible points. If S = R^n, the problem is unconstrained.

The general continuous optimization problem is

    min f(x)  subject to  g(x) = 0  and  h(x) ≤ 0

where f: R^n → R, g: R^n → R^m, and h: R^n → R^p. The objective function f is usually differentiable and may be linear or nonlinear.

Linear programming: f, g, and h are all linear.
Nonlinear programming: at least one of f, g, and h is nonlinear.

Examples: Optimization Problems

- Minimize the weight of a structure subject to a constraint on its strength, or maximize its strength subject to a constraint on its weight.
- Minimize the cost of a diet subject to nutritional constraints.
- Minimize the surface area of a cylinder subject to a constraint on its volume:

      min f(x1, x2) = 2π x1 (x1 + x2)   subject to   g(x1, x2) = π x1² x2 − V = 0

  where x1 and x2 are the radius and height of the cylinder, and V is the required volume.
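To see how such a constrained problem can be handed to a numerical library, here is a minimal sketch of the cylinder example in Python; the use of SciPy's SLSQP solver, the target volume V = 1, and the starting point are illustrative assumptions, not part of the original notes.

    # Cylinder example: minimize surface area subject to fixed volume (SciPy assumed)
    import numpy as np
    from scipy.optimize import minimize

    V = 1.0  # required volume (assumed value, for illustration only)

    def surface(x):
        r, h = x
        return 2.0 * np.pi * r * (r + h)        # f(x1, x2) = 2*pi*x1*(x1 + x2)

    def volume_constraint(x):
        r, h = x
        return np.pi * r**2 * h - V             # g(x1, x2) = pi*x1^2*x2 - V = 0

    result = minimize(surface, x0=[1.0, 1.0], method="SLSQP",
                      constraints=[{"type": "eq", "fun": volume_constraint}])
    print(result.x)                             # at the optimum the height equals twice the radius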
Local vs Global Optimization

x* ∈ S is a global minimum if f(x*) ≤ f(x) for all x ∈ S.
x* ∈ S is a local minimum if f(x*) ≤ f(x) for all feasible x in some neighborhood of x*.

Global Optimization

Finding, or even verifying, a global minimum is difficult in general. Most optimization methods are designed to find a local minimum, which may or may not be a global minimum. If a global minimum is desired, one can try several widely separated starting points and see whether all produce the same result. For some problems, such as linear programming, global optimization is more tractable.

Existence of Minimum

If f is continuous on a closed and bounded set S ⊆ R^n, then f has a global minimum on S. If S is not closed or is unbounded, then f may have no local or global minimum on S.

A continuous function f on an unbounded set S ⊆ R^n is coercive if

    lim f(x) = +∞  as  ‖x‖ → ∞

i.e., f(x) must be large whenever ‖x‖ is large. If f is coercive on a closed, unbounded set S ⊆ R^n, then f has a global minimum on S.

Level Sets

A level set of f: S ⊆ R^n → R is the set of all points in S for which f has some given constant value. For given γ ∈ R, the sublevel set is

    L_γ = {x ∈ S : f(x) ≤ γ}

If a continuous function f on S ⊆ R^n has a nonempty sublevel set that is closed and bounded, then f has a global minimum on S. If S is unbounded, then f is coercive on S if, and only if, all of its sublevel sets are bounded.

Uniqueness of Minimum

A set S ⊆ R^n is convex if it contains the line segment between any two of its points. A function f: S ⊆ R^n → R is convex on a convex set S if its graph along any line segment in S lies on or below the chord connecting the function values at the endpoints of the segment.

Any local minimum of a convex function f on a convex set S ⊆ R^n is a global minimum of f on S. Any local minimum of a strictly convex function f on a convex set S ⊆ R^n is the unique global minimum of f on S.

First-Order Optimality Condition

For a function of one variable, one can find an extremum by differentiating the function and setting the derivative to zero. The generalization to a function of n variables is to find a critical point, i.e., a solution of the nonlinear system

    ∇f(x) = 0

where ∇f(x) is the gradient vector of f, whose ith component is ∂f(x)/∂x_i.

For continuously differentiable f: S ⊆ R^n → R, any interior point x* of S at which f has a local minimum must be a critical point of f. But not all critical points are minima: they can also be maxima or saddle points.

Second-Order Optimality Condition

For twice continuously differentiable f: S ⊆ R^n → R, we can distinguish among critical points by considering the Hessian matrix H_f(x) defined by

    {H_f(x)}_ij = ∂²f(x) / ∂x_i ∂x_j

which is symmetric. At a critical point x*, if H_f(x*) is
- positive definite, then x* is a minimum of f
- negative definite, then x* is a maximum of f
- indefinite, then x* is a saddle point of f
- singular, then various pathological situations are possible
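As a small illustration of the second-order test, the following NumPy sketch classifies the critical point of the quadratic f(x) = 0.5 x1² + 2.5 x2² that is used in later examples; checking definiteness via the eigenvalues of the symmetric Hessian is an implementation choice here, not something prescribed by the notes.

    # Classify the critical point x* = (0, 0) of f(x) = 0.5*x1^2 + 2.5*x2^2
    import numpy as np

    H = np.array([[1.0, 0.0],
                  [0.0, 5.0]])          # Hessian of f (constant for a quadratic)

    eig = np.linalg.eigvalsh(H)         # symmetric matrix, so eigenvalues are real
    if np.all(eig > 0):
        kind = "minimum (positive definite)"
    elif np.all(eig < 0):
        kind = "maximum (negative definite)"
    elif np.any(eig > 0) and np.any(eig < 0):
        kind = "saddle point (indefinite)"
    else:
        kind = "singular Hessian: inconclusive"
    print(eig, kind)                    # [1. 5.] minimum (positive definite)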
Constrained Optimality

If the problem is constrained, only feasible directions are relevant. For the equality-constrained problem

    min f(x)  subject to  g(x) = 0

where f: R^n → R and g: R^n → R^m, with m ≤ n, a necessary condition for a feasible point x* to be a solution is that the negative gradient of f lie in the space spanned by the constraint normals,

    −∇f(x*) = J_g^T(x*) λ

where J_g is the Jacobian matrix of g and λ is a vector of Lagrange multipliers. This condition says that we cannot reduce the objective function without violating the constraints.

Constrained Optimality, continued

The Lagrangian function L: R^(n+m) → R is defined by

    L(x, λ) = f(x) + λ^T g(x)

Its gradient is given by

    ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ]
               [ g(x)               ]

and its Hessian by

    H_L(x, λ) = [ B(x, λ)   J_g^T(x) ]
                [ J_g(x)    O        ]

where

    B(x, λ) = H_f(x) + Σ_{i=1}^{m} λ_i H_{g_i}(x)

Constrained Optimality, continued

Together, the necessary condition and feasibility imply a critical point of the Lagrangian function,

    ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ] = 0
               [ g(x)               ]

If inequalities are present, then the KKT optimality conditions also require nonnegativity of the Lagrange multipliers corresponding to the inequalities, and a complementarity condition.

The Hessian of the Lagrangian is symmetric, but not positive definite, so a critical point of L is a saddle point rather than a minimum or maximum. A critical point (x*, λ*) of L is a constrained minimum of f if B(x*, λ*) is positive definite on the null space of J_g(x*). If the columns of Z form a basis for that null space, then test the projected Hessian Z^T B Z for positive definiteness.
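The projected-Hessian test can be carried out numerically as sketched below, using the equality-constrained quadratic example that appears later in these notes (f(x) = 0.5 x1² + 2.5 x2², g(x) = x1 − x2 − 1); building the null-space basis Z from an SVD of the constraint Jacobian is an implementation choice, not part of the original notes.

    # Projected-Hessian test for f(x) = 0.5*x1^2 + 2.5*x2^2 with g(x) = x1 - x2 - 1
    import numpy as np

    B  = np.array([[1.0, 0.0],
                   [0.0, 5.0]])      # B(x, lambda) = H_f here, since g is linear (H_g = 0)
    Jg = np.array([[1.0, -1.0]])     # constraint Jacobian

    # Right singular vectors beyond the rank of Jg span its null space
    _, sv, Vt = np.linalg.svd(Jg)
    rank = int(np.sum(sv > 1e-12))
    Z = Vt[rank:].T                  # here Z is proportional to [1, 1]^T

    projected = Z.T @ B @ Z
    print(projected, np.all(np.linalg.eigvalsh(projected) > 0))   # [[3.]] True, so a constrained minimum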
Sensitivity and Conditioning

Function minimization and equation solving are closely related problems, but their sensitivities differ. In one dimension, the absolute condition number of a root x* of the equation f(x) = 0 is 1/|f'(x*)|, so if |f(x̂)| ≤ ε, then |x̂ − x*| may be as large as ε/|f'(x*)|.

For minimizing f, the Taylor series expansion

    f(x̂) = f(x* + h) = f(x*) + f'(x*) h + (1/2) f''(x*) h² + O(h³)

shows that, since f'(x*) = 0, if |f(x̂) − f(x*)| ≤ ε, then |x̂ − x*| may be as large as √(2ε / |f''(x*)|). Thus, based on function values alone, minima can be computed to only about half precision.

Unimodality

For minimizing a function of one variable, we need a "bracket" for the solution analogous to a sign change for a nonlinear equation. A real-valued function f is unimodal on an interval [a, b] if there is a unique x* ∈ [a, b] such that f(x*) is the minimum of f on [a, b], and f is strictly decreasing for x ≤ x* and strictly increasing for x* ≤ x. Unimodality enables discarding portions of the interval based on sample function values, analogous to interval bisection.

Golden Section Search

Suppose f is unimodal on [a, b], and let x1 and x2 be two points within [a, b], with x1 < x2. Evaluating and comparing f(x1) and f(x2), we can discard either (x2, b] or [a, x1), with the minimum known to lie in the remaining subinterval. To repeat the process, we need to compute only one new function evaluation. To reduce the length of the interval by a fixed fraction at each iteration, each new pair of points must have the same relationship with respect to the new interval that the previous pair had with respect to the previous interval.

Golden Section Search, continued

To accomplish this, we choose the relative positions of the two points as τ and 1 − τ, where τ² = 1 − τ, so

    τ = (√5 − 1)/2 ≈ 0.618   and   1 − τ ≈ 0.382

Whichever subinterval is retained, its length will be τ relative to the previous interval, and the interior point retained will be at position either τ or 1 − τ relative to the new interval. To continue the iteration, we need to compute only one new function value, at the complementary point. This choice of sample points is called golden section search. Golden section search is safe, but its convergence rate is only linear, with constant C ≈ 0.618. In outline, the algorithm is:

    τ = (√5 − 1)/2
    x1 = a + (1 − τ)(b − a);  f1 = f(x1)
    x2 = a + τ(b − a);        f2 = f(x2)
    while (b − a) > tol do
        if f1 > f2 then
            a = x1
            x1 = x2;  f1 = f2
            x2 = a + τ(b − a);  f2 = f(x2)
        else
            b = x2
            x2 = x1;  f2 = f1
            x1 = a + (1 − τ)(b − a);  f1 = f(x1)
        end
    end

Example: Golden Section Search

Use golden section search to minimize f(x) = 0.5 − x exp(−x²).
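A minimal Python transcription of the pseudocode above, applied to this example; the bracketing interval [0, 2] is an assumption inferred from the first row of the iteration table that follows.

    # Golden section search for f(x) = 0.5 - x*exp(-x^2) on [0, 2] (interval assumed)
    import math

    def golden_section(f, a, b, tol=1e-5):
        tau = (math.sqrt(5.0) - 1.0) / 2.0
        x1 = a + (1 - tau) * (b - a); f1 = f(x1)
        x2 = a + tau * (b - a);       f2 = f(x2)
        while (b - a) > tol:
            if f1 > f2:
                a, x1, f1 = x1, x2, f2              # minimum cannot lie in [a, x1)
                x2 = a + tau * (b - a); f2 = f(x2)
            else:
                b, x2, f2 = x2, x1, f1              # minimum cannot lie in (x2, b]
                x1 = a + (1 - tau) * (b - a); f1 = f(x1)
        return 0.5 * (a + b)

    f = lambda x: 0.5 - x * math.exp(-x * x)
    print(golden_section(f, 0.0, 2.0))              # ~0.7071, consistent with the table below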
Example, continued

    x1       f1       x2       f2
    0.764    0.074    1.236    0.232
    0.472    0.122    0.764    0.074
    0.764    0.074    0.944    0.113
    0.652    0.074    0.764    0.074
    0.584    0.085    0.652    0.074
    0.652    0.074    0.695    0.071
    0.695    0.071    0.721    0.071
    0.679    0.072    0.695    0.071
    0.695    0.071    0.705    0.071
    0.705    0.071    0.711    0.071

Successive Parabolic Interpolation

Fit a quadratic polynomial to three function values, and take the minimum of the quadratic to be the new approximation to the minimum of the function. The new point replaces the oldest of the three previous points, and the process is repeated until convergence. The convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324.

Example: Successive Parabolic Interpolation

Use successive parabolic interpolation to minimize f(x) = 0.5 − x exp(−x²).

    xk       f(xk)
    0.000    0.500
    0.600    0.081
    1.200    0.216
    0.754    0.073
    0.721    0.071
    0.692    0.071
    0.707    0.071

Newton's Method

Another local quadratic approximation is the truncated Taylor series

    f(x + h) ≈ f(x) + f'(x) h + (f''(x)/2) h²

By differentiation, the minimum of this quadratic function of h is given by h = −f'(x)/f''(x). This suggests the iteration scheme

    x_{k+1} = x_k − f'(x_k)/f''(x_k)

which is Newton's method for solving the nonlinear equation f'(x) = 0. Newton's method for finding a minimum normally has a quadratic convergence rate, but it must be started close enough to the solution to converge.

Example: Newton's Method

Use Newton's method to minimize f(x) = 0.5 − x exp(−x²). The first and second derivatives of f are given by

    f'(x) = (2x² − 1) exp(−x²)   and   f''(x) = 2x(3 − 2x²) exp(−x²)

so the Newton iteration for a zero of f' is given by

    x_{k+1} = x_k − (2x_k² − 1) / (2x_k(3 − 2x_k²))

Using the starting guess x0 = 1, we obtain

    xk       f(xk)
    1.000    0.132
    0.500    0.111
    0.700    0.071
    0.707    0.071
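For comparison, here is a minimal sketch of the one-dimensional Newton iteration just described, using the derivatives given above; it reproduces the iterates in the table.

    # Newton's method for minimizing f(x) = 0.5 - x*exp(-x^2), starting from x0 = 1
    import math

    x = 1.0
    for k in range(3):
        fp  = (2 * x * x - 1) * math.exp(-x * x)            # f'(x)
        fpp = 2 * x * (3 - 2 * x * x) * math.exp(-x * x)    # f''(x)
        x = x - fp / fpp                                    # Newton step for f'(x) = 0
        print(x)                                            # 0.5, 0.7, then ~0.7071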
Safeguarded Methods

As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency. Most library routines for one-dimensional optimization are based on this hybrid approach. A popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required.

Direct Search Methods

Direct search methods for multidimensional optimization make no use of function values other than comparing them. For minimizing a function f of n variables, the Nelder-Mead method begins with n + 1 starting points, forming a simplex in R^n. It then moves to a new point along the straight line from the current point having the highest function value through the centroid of the other points. The new point replaces the worst point, and the process is repeated. Direct search methods are useful for nonsmooth functions or for small n, but they are expensive for larger n.
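The Nelder-Mead method is available in standard libraries; the sketch below assumes SciPy is installed and borrows the quadratic objective from the steepest descent example that follows, purely for illustration.

    # Derivative-free Nelder-Mead on f(x) = 0.5*x1^2 + 2.5*x2^2 (SciPy assumed)
    from scipy.optimize import minimize

    f = lambda x: 0.5 * x[0]**2 + 2.5 * x[1]**2
    result = minimize(f, x0=[5.0, 1.0], method="Nelder-Mead")
    print(result.x)    # close to the minimizer (0, 0)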
Steepest Descent Method

Let f: R^n → R be a real-valued function of n real variables. At any point x where the gradient vector is nonzero, the negative gradient, −∇f(x), points downhill toward lower values of f. In fact, −∇f(x) is locally the direction of steepest descent: f decreases more rapidly along the direction of the negative gradient than along any other.

Steepest descent method: starting from an initial guess x0, successive approximate solutions are given by

    x_{k+1} = x_k − α_k ∇f(x_k)

where α_k is a line search parameter that determines how far to go in the given direction.

Steepest Descent, continued

Given a descent direction, such as the negative gradient, determining an appropriate value for α_k at each iteration is a one-dimensional minimization problem

    min_{α_k} f(x_k − α_k ∇f(x_k))

that can be solved by the methods already discussed. The steepest descent method is very reliable: it can always make progress provided the gradient is nonzero. But the method is myopic in its view of the function's behavior, and the resulting iterates can zigzag back and forth, making very slow progress toward the solution. In general, the convergence rate of steepest descent is only linear, with a constant factor that can be arbitrarily close to 1.

Example: Steepest Descent

Use the steepest descent method to minimize f(x) = 0.5 x1² + 2.5 x2². The gradient is given by ∇f(x) = [x1, 5x2]^T. Taking x0 = [5, 1]^T, we have ∇f(x0) = [5, 5]^T. Performing a line search along the negative gradient direction,

    min_{α0} f(x0 − α0 ∇f(x0))

the exact minimum along the line is given by α0 = 1/3, so the next approximation is x1 = [3.333, −0.667]^T.

Example, continued

    xk                  f(xk)     ∇f(xk)
    5.000    1.000      15.000    5.000    5.000
    3.333   −0.667       6.667    3.333   −3.333
    2.222    0.444       2.963    2.222    2.222
    1.481   −0.296       1.317    1.481   −1.481
    0.988    0.198       0.585    0.988    0.988
    0.658   −0.132       0.260    0.658   −0.658
    0.439    0.088       0.116    0.439    0.439
    0.293   −0.059       0.051    0.293   −0.293
    0.195    0.039       0.023    0.195    0.195
    0.130   −0.026       0.010    0.130   −0.130
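A minimal sketch of steepest descent with exact line search for this quadratic; the closed-form step length α = (g^T g)/(g^T H g) is a standard formula for quadratic objectives and is an addition here, since the notes only quote the resulting value α0 = 1/3.

    # Steepest descent with exact line search on f(x) = 0.5*x1^2 + 2.5*x2^2
    import numpy as np

    H = np.array([[1.0, 0.0], [0.0, 5.0]])
    f = lambda x: 0.5 * x[0]**2 + 2.5 * x[1]**2
    grad = lambda x: H @ x

    x = np.array([5.0, 1.0])
    for k in range(9):
        g = grad(x)
        alpha = (g @ g) / (g @ H @ g)    # exact minimizer of f(x - alpha*g) for a quadratic
        x = x - alpha * g
        print(x, f(x))                   # reproduces rows 2-10 of the table above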
Newton's Method

A broader view can be obtained by a local quadratic approximation, which is equivalent to Newton's method. In multidimensional optimization, we seek a zero of the gradient, so the Newton iteration has the form

    x_{k+1} = x_k − H_f^{-1}(x_k) ∇f(x_k)

where H_f(x) is the Hessian matrix of second partial derivatives of f,

    {H_f(x)}_ij = ∂²f(x) / ∂x_i ∂x_j

Newton's Method, continued

Do not explicitly invert the Hessian matrix, but instead solve the linear system

    H_f(x_k) s_k = −∇f(x_k)

for the Newton step s_k, then take as the next iterate

    x_{k+1} = x_k + s_k

The convergence rate of Newton's method for minimization is normally quadratic. As usual, Newton's method is unreliable unless started close enough to the solution to converge.

Example: Newton's Method

Use Newton's method to minimize f(x) = 0.5 x1² + 2.5 x2². The gradient and Hessian are given by

    ∇f(x) = [x1, 5x2]^T   and   H_f(x) = [1  0]
                                          [0  5]

Taking x0 = [5, 1]^T, we have ∇f(x0) = [5, 5]^T. The linear system for the Newton step is

    [1  0] s0 = [−5]
    [0  5]      [−5]

so s0 = [−5, −1]^T, and

    x1 = x0 + s0 = [5, 1]^T + [−5, −1]^T = [0, 0]^T

which is the exact solution for this problem, as expected for a quadratic function.

Newton's Method, continued

If the objective function f has continuous second partial derivatives, then the Hessian matrix H_f is symmetric, and near a minimum it is positive definite. Thus, the linear system for the step to the next iterate can be solved in only about half of the work required for an LU factorization.

Far from a minimum, H_f(x_k) may not be positive definite, so the Newton step s_k may not be a descent direction for the function, i.e., we may not have

    ∇f(x_k)^T s_k < 0

In this case, an alternative descent direction can be computed, such as the negative gradient or a direction of negative curvature, and a line search performed along it.

Newton's Method, continued

In principle, a line search parameter is unnecessary with Newton's method, since the quadratic model determines the length, as well as the direction, of the step to the next approximate solution. When started far from the solution, however, it may still be advisable to perform a line search along the direction of the Newton step s_k to make the method more robust (damped Newton). Once the iterates are near the solution, α_k = 1 should suffice for subsequent iterations.
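A sketch of the single Newton step for the worked example above, solving the linear system rather than inverting the Hessian; for a general objective the gradient and Hessian would be re-evaluated at each iterate, which is unnecessary here because one step already reaches the exact solution.

    # One Newton step for f(x) = 0.5*x1^2 + 2.5*x2^2 from x0 = (5, 1)
    import numpy as np

    H = np.array([[1.0, 0.0], [0.0, 5.0]])    # Hessian (constant for this quadratic)
    grad = lambda x: np.array([x[0], 5.0 * x[1]])

    x0 = np.array([5.0, 1.0])
    s0 = np.linalg.solve(H, -grad(x0))        # solve H s = -grad f, do not form H^{-1}
    x1 = x0 + s0
    print(s0, x1)                             # s0 = [-5, -1], x1 = [0, 0], the exact minimizer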
Trust Region Methods

An alternative to a line search is a trust region method, in which the approximate solution is constrained to lie within a region where the quadratic model is sufficiently accurate.

Trust Region Methods, continued

If the current trust radius is binding, minimizing the quadratic model function subject to this constraint may modify the direction as well as the length of the Newton step. The accuracy of the quadratic model is assessed by comparing the actual decrease in the objective function with that predicted by the quadratic model, and the trust radius is increased or decreased accordingly.

Quasi-Newton Methods

Newton's method costs O(n³) arithmetic operations and O(n²) scalar function evaluations per iteration for a dense problem. Many variants of Newton's method improve reliability and reduce overhead. Quasi-Newton methods have the form

    x_{k+1} = x_k − α_k B_k^{-1} ∇f(x_k)

where α_k is a line search parameter and B_k is an approximation to the Hessian matrix. Many quasi-Newton methods are more robust than Newton's method, are superlinearly convergent, and have lower overhead per iteration, which often more than offsets their slower convergence rate.

Secant Updating Methods

One could use Broyden's method to seek a zero of the gradient, but this would not preserve the symmetry of the Hessian matrix. Several secant updating formulas have been developed for minimization that not only preserve symmetry in the approximate Hessian matrix, but also preserve positive definiteness. Symmetry reduces the amount of work required by about half, while positive definiteness guarantees that the quasi-Newton step will be a descent direction.

BFGS Method

One of the most effective secant updating methods for minimization is BFGS. Unlike Newton's method for minimization, no second derivatives are required.

    x0 = initial guess
    B0 = initial Hessian approximation
    for k = 0, 1, 2, ...
        Solve B_k s_k = −∇f(x_k) for s_k
        x_{k+1} = x_k + s_k
        y_k = ∇f(x_{k+1}) − ∇f(x_k)
        B_{k+1} = B_k + (y_k y_k^T)/(y_k^T s_k) − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)
    end

BFGS Method, continued

In practice, the factorization of B_k is updated rather than B_k itself, so the linear system for s_k can be solved at a cost of O(n²) rather than O(n³) work. One can start with B0 = I, so the initial step is along the negative gradient, and second-derivative information is then gradually built up in the approximate Hessian matrix over successive iterations. BFGS normally has a superlinear convergence rate, even though the approximate Hessian does not necessarily converge to the true Hessian. A line search can be used to enhance effectiveness.
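A direct Python transcription of the BFGS pseudocode above (B0 = I, no line search), applied to the quadratic example used throughout this chapter; it matches the iterates tabulated in the example that follows.

    # BFGS without line search on f(x) = 0.5*x1^2 + 2.5*x2^2, starting from B0 = I
    import numpy as np

    grad = lambda x: np.array([x[0], 5.0 * x[1]])

    x = np.array([5.0, 1.0])
    B = np.eye(2)                        # initial Hessian approximation
    g = grad(x)
    for k in range(5):
        s = np.linalg.solve(B, -g)       # solve B_k s_k = -grad f(x_k)
        x = x + s
        g_new = grad(x)
        y = g_new - g
        B = (B + np.outer(y, y) / (y @ s)
               - np.outer(B @ s, B @ s) / (s @ B @ s))    # BFGS update
        g = g_new
        print(x)                         # (0,-4), (-2.222,0.444), (0.816,0.082), ...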
Example: BFGS Method

Use BFGS to minimize f(x) = 0.5 x1² + 2.5 x2². The gradient is given by ∇f(x) = [x1, 5x2]^T. Taking x0 = [5, 1]^T and B0 = I, the initial step is along the negative gradient, so

    x1 = x0 + s0 = [5, 1]^T + [−5, −5]^T = [0, −4]^T

Updating the approximate Hessian using the BFGS formula, we obtain

    B1 = [0.667  0.333]
         [0.333  4.667]

Then a new step is computed and the process is repeated.

Example, continued

    xk                  f(xk)     ∇f(xk)
     5.000    1.000     15.000     5.000     5.000
     0.000   −4.000     40.000     0.000   −20.000
    −2.222    0.444      2.963    −2.222     2.222
     0.816    0.082      0.350     0.816     0.408
    −0.009   −0.015      0.001    −0.009    −0.077
    −0.001    0.001      0.000    −0.001     0.005

The increase in function value at the first step can be avoided by using a line search, which generally enhances convergence. For a quadratic objective function, BFGS with exact line search finds the exact solution in at most n iterations, where n is the dimension of the problem.

Conjugate Gradient Method

Another method that does not require explicit second derivatives, and does not even store an approximation to the Hessian matrix, is the conjugate gradient (CG) method. CG generates a sequence of conjugate search directions, implicitly accumulating information about the Hessian matrix. For a quadratic objective function, CG is theoretically exact after at most n iterations, where n is the dimension of the problem. CG is effective for general unconstrained minimization as well.

Conjugate Gradient Method, continued

    x0 = initial guess
    g0 = ∇f(x0)
    s0 = −g0
    for k = 0, 1, 2, ...
        Choose α_k to minimize f(x_k + α_k s_k)
        x_{k+1} = x_k + α_k s_k
        g_{k+1} = ∇f(x_{k+1})
        β_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
        s_{k+1} = −g_{k+1} + β_{k+1} s_k
    end

An alternative formula for β_{k+1} is

    β_{k+1} = ((g_{k+1} − g_k)^T g_{k+1}) / (g_k^T g_k)
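A minimal sketch of the CG pseudocode above for the same quadratic; the exact line-search formula α = −(g^T s)/(s^T H s) is valid only because the objective is quadratic with Hessian H, and it is an assumption added here for brevity.

    # Conjugate gradient method (as in the pseudocode above) on f(x) = 0.5*x1^2 + 2.5*x2^2
    import numpy as np

    H = np.array([[1.0, 0.0], [0.0, 5.0]])
    grad = lambda x: H @ x

    x = np.array([5.0, 1.0])
    g = grad(x)
    s = -g
    for k in range(2):                     # n = 2, so two steps suffice for a quadratic
        alpha = -(g @ s) / (s @ H @ s)     # exact minimizer of f(x + alpha*s)
        x = x + alpha * s
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)
        s = -g_new + beta * s
        g = g_new
        print(x)                           # (3.333, -0.667), then (0, 0)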
Example: Conjugate Gradient Method

Use the CG method to minimize f(x) = 0.5 x1² + 2.5 x2². The gradient is given by ∇f(x) = [x1, 5x2]^T. Taking x0 = [5, 1]^T, the initial search direction is the negative gradient,

    s0 = −g0 = −∇f(x0) = [−5, −5]^T

So far there is no difference from the steepest descent method. The exact minimum along the line is given by α0 = 1/3, so the next approximation is x1 = [3.333, −0.667]^T, and we compute the new gradient,

    g1 = ∇f(x1) = [3.333, −3.333]^T

Example, continued

At this point, however, rather than search along the new negative gradient, we compute instead

    β1 = (g1^T g1)/(g0^T g0) = 0.444

which gives as the next search direction

    s1 = −g1 + β1 s0 = [−3.333, 3.333]^T + 0.444 [−5, −5]^T = [−5.556, 1.111]^T

The minimum along this direction is given by α1 = 0.6, which gives the exact solution at the origin, as expected for a quadratic function.

Truncated Newton Methods

Another way to reduce the work in Newton-like methods is to solve the linear system for the Newton step by an iterative method. A small number of iterations may suffice to produce a step as useful as the true Newton step, especially far from the overall solution, where the true Newton step may be unreliable anyway. A good choice for the linear iterative solver is the CG method, which gives a step intermediate between steepest descent and the Newton-like step. Since only matrix-vector products are required, explicit formation of the Hessian matrix can be avoided by using a finite difference of the gradient along a given vector.

Nonlinear Least Squares

Given data (t_i, y_i), find the vector x of parameters that gives the "best fit" in the least squares sense to the model function f(t, x), where f is a nonlinear function of x. Define the components of the residual function

    r_i(x) = y_i − f(t_i, x),   i = 1, ..., m

so we want to minimize φ(x) = ½ r^T(x) r(x). The gradient vector is

    ∇φ(x) = J^T(x) r(x)

and the Hessian matrix is

    H_φ(x) = J^T(x) J(x) + Σ_{i=1}^{m} r_i(x) H_i(x)

where J(x) is the Jacobian of r(x), and H_i(x) is the Hessian of r_i(x).

Nonlinear Least Squares, continued

The linear system for the Newton step is

    [ J^T(x_k) J(x_k) + Σ_{i=1}^{m} r_i(x_k) H_i(x_k) ] s_k = −J^T(x_k) r(x_k)

The Hessian matrices H_i are usually inconvenient and expensive to compute. Moreover, in H_φ each H_i is multiplied by the residual component r_i, which is small at the solution if the fit of the model function to the data is good.

Gauss-Newton Method

This motivates the Gauss-Newton method for nonlinear least squares, in which the second-order term is dropped and the linear system

    J^T(x_k) J(x_k) s_k = −J^T(x_k) r(x_k)

is solved for an approximate Newton step s_k at each iteration. This is the system of normal equations for the linear least squares problem

    J(x_k) s_k ≅ −r(x_k)

which can be solved better by QR factorization. The next approximate solution is then given by x_{k+1} = x_k + s_k, and the process is repeated until convergence.
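A minimal sketch of one Gauss-Newton iteration; the `residual` and `jacobian` callables are placeholders to be supplied for a specific model, and the least squares subproblem is solved with a library routine rather than through the normal equations, in the spirit of the remark above.

    # One Gauss-Newton iteration: solve J(x_k) s_k ~= -r(x_k) in the least squares sense
    import numpy as np

    def gauss_newton_step(x, residual, jacobian):
        r = residual(x)                               # r_i(x) = y_i - f(t_i, x)
        J = jacobian(x)                               # Jacobian of r at x
        s, *_ = np.linalg.lstsq(J, -r, rcond=None)    # avoids forming J^T J explicitly
        return x + s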
Example: Gauss-Newton Method

Use the Gauss-Newton method to fit the nonlinear model function

    f(t, x) = x1 exp(x2 t)

to the data

    t    0.0   1.0   2.0   3.0
    y    2.0   0.7   0.3   0.1

For this model function, the entries of the Jacobian matrix of the residual function r are given by

    {J(x)}_{i,1} = ∂r_i(x)/∂x1 = −exp(x2 t_i)
    {J(x)}_{i,2} = ∂r_i(x)/∂x2 = −x1 t_i exp(x2 t_i)

Example, continued

If we take x0 = [1, 0]^T, then the Gauss-Newton step s0 is given by the linear least squares problem

    [−1   0]        [−1 ]
    [−1  −1] s0 ≅   [0.3]
    [−1  −2]        [0.7]
    [−1  −3]        [0.9]

whose solution is s0 = [0.69, −0.61]^T. The next approximate solution is then given by x1 = x0 + s0, and the process is repeated until convergence.

Example, continued

    xk                  ‖r(xk)‖₂²
    1.000    0.000      2.390
    1.690   −0.610      0.212
    1.975   −0.930      0.007
    1.994   −1.004      0.002
    1.995   −1.009      0.002
    1.995   −1.010      0.002

Gauss-Newton Method, continued

The Gauss-Newton method replaces a nonlinear least squares problem by a sequence of linear least squares problems whose solutions converge to the solution of the original nonlinear problem. If the residual at the solution is large, however, then the second-order term omitted from the Hessian is not negligible, and the Gauss-Newton method may converge slowly or fail to converge. In such "large-residual" cases, it may be best to use a general nonlinear minimization method that takes into account the true full Hessian matrix.

Levenberg-Marquardt Method

The Levenberg-Marquardt method is another useful alternative when the Gauss-Newton approximation is inadequate or yields a rank-deficient linear least squares subproblem. In this method, the linear system at each iteration is of the form

    (J^T(x_k) J(x_k) + µ_k I) s_k = −J^T(x_k) r(x_k)

where µ_k is a scalar parameter chosen by some strategy. The corresponding linear least squares problem is

    [ J(x_k)   ]        [−r(x_k)]
    [ √µ_k I   ] s_k ≅  [   0   ]

With a suitable strategy for choosing µ_k, this method can be very robust in practice, and it forms the basis for several effective software packages.
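A sketch of a single Levenberg-Marquardt step for this example via the augmented least squares formulation above; the fixed value of µ is an illustrative assumption (practical codes adapt µ from step to step), and µ = 0 reduces to the Gauss-Newton step.

    # One Levenberg-Marquardt step for the model f(t, x) = x1*exp(x2*t)
    import numpy as np

    t = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([2.0, 0.7, 0.3, 0.1])

    def residual(x):
        return y - x[0] * np.exp(x[1] * t)             # r_i(x) = y_i - f(t_i, x)

    def jacobian(x):
        e = np.exp(x[1] * t)
        return np.column_stack([-e, -x[0] * t * e])    # entries as given above

    def lm_step(x, mu):
        J, r = jacobian(x), residual(x)
        A = np.vstack([J, np.sqrt(mu) * np.eye(2)])    # [J; sqrt(mu)*I]
        b = np.concatenate([-r, np.zeros(2)])          # [-r; 0]
        s, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x + s

    print(lm_step(np.array([1.0, 0.0]), mu=0.0))       # with mu = 0: the Gauss-Newton step, ~(1.69, -0.61)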
Equality-Constrained Optimization

For the equality-constrained minimization problem

    min f(x)  subject to  g(x) = 0

where f: R^n → R and g: R^n → R^m, with m ≤ n, we seek a critical point of the Lagrangian L(x, λ) = f(x) + λ^T g(x). Applying Newton's method to the nonlinear system

    ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ] = 0
               [ g(x)               ]

we obtain the linear system

    [ B(x, λ)   J_g^T(x) ] [s]   = − [ ∇f(x) + J_g^T(x) λ ]
    [ J_g(x)    O        ] [δ]       [ g(x)               ]

for the Newton step (s, δ) in (x, λ) at each iteration.

Sequential Quadratic Programming

The foregoing block 2×2 linear system is equivalent to a quadratic programming problem, so this approach is known as sequential quadratic programming. Types of solution methods include:

- Direct solution methods, in which the entire block 2×2 system is solved directly
- Range space methods, based on block elimination in the block 2×2 linear system
- Null space methods, based on an orthogonal factorization of the matrix of constraint normals, J_g^T(x)

Given a starting guess x0, a good starting guess for λ0 can be obtained from the least squares problem

    J_g^T(x0) λ0 ≅ −∇f(x0)

Merit Function

Once the Newton step (s, δ) is determined, we need a merit function to measure progress toward the overall solution, for use in a line search or trust region. Popular choices include the penalty function

    φ_ρ(x) = f(x) + ½ ρ g(x)^T g(x)

and the augmented Lagrangian function

    L_ρ(x, λ) = f(x) + λ^T g(x) + ½ ρ g(x)^T g(x)

where the parameter ρ > 0 determines the relative weighting of optimality vs feasibility.

Inequality-Constrained Optimization

The methods just outlined for equality constraints can be extended to handle inequality constraints by using an active set strategy. Inequality constraints are provisionally divided into those that are satisfied already (and can therefore be temporarily disregarded) and those that are violated (and are therefore temporarily treated as equality constraints). This division of constraints is revised as the iterations proceed, until eventually the correct constraints are identified that are binding at the solution.

Penalty Methods

A merit function can also be used to convert an equality-constrained problem into a sequence of unconstrained problems. If x*_ρ is the solution to

    min_x φ_ρ(x) = f(x) + ½ ρ g(x)^T g(x)

then under appropriate conditions

    lim_{ρ→∞} x*_ρ = x*

This enables the use of unconstrained optimization methods, but the problem becomes ill-conditioned for large ρ, so we solve a sequence of problems with gradually increasing values of ρ, with the minimum for each problem used as the starting point for the next problem.
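A minimal sketch of the penalty method above, applied to the equality-constrained quadratic example that is solved exactly later in these notes (f(x) = 0.5 x1² + 2.5 x2² subject to x1 − x2 = 1); the closed-form minimization of φ_ρ used here relies on the objective being quadratic and the constraint linear, and the sequence of ρ values is arbitrary.

    # Quadratic penalty method for min 0.5*x1^2 + 2.5*x2^2 subject to x1 - x2 = 1.
    # For a quadratic objective and linear constraint A x = b, the minimizer of
    # phi_rho solves (H + rho*A^T A) x = rho*A^T b exactly.
    import numpy as np

    H = np.array([[1.0, 0.0], [0.0, 5.0]])
    A = np.array([[1.0, -1.0]])
    b = np.array([1.0])

    for rho in [1.0, 10.0, 100.0, 1000.0, 1e6]:
        x = np.linalg.solve(H + rho * (A.T @ A), rho * (A.T @ b))
        print(rho, x)        # approaches the constrained minimizer (0.833, -0.167) as rho grows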
Barrier Methods

For inequality-constrained problems, another alternative is a barrier function, such as

    φ_µ(x) = f(x) − µ Σ_{i=1}^{p} 1/h_i(x)

or

    φ_µ(x) = f(x) − µ Σ_{i=1}^{p} log(−h_i(x))

which increasingly penalize feasible points as they approach the boundary of the feasible region. Again, the solutions of the unconstrained problem approach x* as µ → 0, but the problems are increasingly ill-conditioned, so we solve a sequence of problems with decreasing values of µ. Barrier functions are the basis for interior point methods for linear programming.

Example: Constrained Optimization

Consider the quadratic programming problem

    min f(x) = 0.5 x1² + 2.5 x2²   subject to   g(x) = x1 − x2 − 1 = 0

The Lagrangian function is given by

    L(x, λ) = f(x) + λ g(x) = 0.5 x1² + 2.5 x2² + λ(x1 − x2 − 1)

Since

    ∇f(x) = [x1, 5x2]^T   and   J_g(x) = [1  −1]

we have

    ∇_x L(x, λ) = ∇f(x) + J_g^T(x) λ = [x1, 5x2]^T + λ [1, −1]^T

Example, continued

So the system to be solved for a critical point of the Lagrangian is

    x1 + λ = 0
    5x2 − λ = 0
    x1 − x2 = 1

which in this case is the linear system

    [1   0   1] [x1]   [0]
    [0   5  −1] [x2] = [0]
    [1  −1   0] [λ ]   [1]

Solving this system, we obtain the solution

    x1 = 0.833,   x2 = −0.167,   λ = −0.833
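The 3×3 system above is small enough to verify directly; a minimal NumPy check:

    # Critical point of the Lagrangian for the quadratic programming example
    import numpy as np

    K = np.array([[1.0,  0.0,  1.0],
                  [0.0,  5.0, -1.0],
                  [1.0, -1.0,  0.0]])
    rhs = np.array([0.0, 0.0, 1.0])
    x1, x2, lam = np.linalg.solve(K, rhs)
    print(x1, x2, lam)     # 0.833..., -0.166..., -0.833...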
Linear Programming

One of the most important and common constrained optimization problems is linear programming. One standard form for such problems is

    min f(x) = c^T x   subject to   Ax = b  and  x ≥ 0

where m < n, A ∈ R^(m×n), b ∈ R^m, and c, x ∈ R^n. The feasible region is a convex polyhedron in R^n, and the minimum must occur at one of its vertices. The simplex method moves systematically from vertex to vertex until the minimum point is found.

Linear Programming, continued

The simplex method is reliable and normally efficient, able to solve problems with thousands of variables, but it can require time exponential in the size of the problem in the worst case. Interior point methods for linear programming developed in recent years have polynomial worst-case solution time. These methods move through the interior of the feasible region, not restricting themselves to investigating only its vertices. Although interior point methods have had significant practical impact, the simplex method is still the predominant method in standard packages for linear programming, and its effectiveness in practice is excellent.

Example: Linear Programming

To illustrate linear programming, consider

    min f(x) = c^T x = −8x1 − 11x2

subject to the linear inequality constraints

    5x1 + 4x2 ≤ 40,   −x1 + 3x2 ≤ 12,   x1 ≥ 0,   x2 ≥ 0

Example, continued

The minimum value must occur at a vertex of the feasible region, in this case at x1 = 3.79, x2 = 5.26, where the objective function has the value −88.2.
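A minimal sketch of this example using a library LP solver, assuming SciPy is available; the explicit bounds mirror the nonnegativity constraints x1 ≥ 0, x2 ≥ 0.

    # Linear programming example with SciPy's linprog (SciPy assumed)
    from scipy.optimize import linprog

    c = [-8.0, -11.0]                     # objective coefficients
    A_ub = [[5.0, 4.0], [-1.0, 3.0]]      # 5x1 + 4x2 <= 40, -x1 + 3x2 <= 12
    b_ub = [40.0, 12.0]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, res.fun)                 # approximately (3.79, 5.26) with objective value -88.2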

