UC Berkeley
Department of Electrical Engineering and Computer Sciences
Department of Statistics

EE 227A / STAT 260: Nonlinear and Convex Optimization
Problem Set 2, Fall 2004
Issued: Thursday, September 16, 2004
Due: Tuesday, September 28, 2004

Problem 2.1
Consider the problem of minimizing the function of two variables f(x, y) = 3x² + y⁴.
(a) Apply one iteration of the steepest descent method with (1, −2) as the starting point and with the stepsize chosen by the Armijo rule with s = 1, σ = 0.1, and β = 0.5.
(b) Repeat (a) using s = 1, σ = 0.1, β = 0.1 instead. How does the cost of the new iterate compare to that obtained in (a)? Comment on the tradeoffs involved in the choice of β.
(c) Apply one iteration of Newton's method with the same starting point and stepsize rule as in (a). How does the cost of the new iterate compare to that obtained in (a)? How about the amount of work involved in finding the new iterate?

Problem 2.2
Consider the gradient method x_{k+1} = x_k + α_k d_k, where α_k is chosen by the Armijo rule and

    d_k = (0, ..., 0, −∂f(x_k)/∂x_i, 0, ..., 0)′,

where i is the index for which |∂f(x_k)/∂x_j| is maximized over j = 1, ..., n. Show that every limit point of {x_k} is stationary. Hint: Verify that the gradient relatedness condition holds.

Problem 2.3
Consider a positive definite quadratic problem with Hessian matrix Q. Suppose we use scaling with the diagonal matrix whose ith diagonal element is (q_ii)⁻¹, where q_ii is the ith diagonal element of Q. Show that if Q is 2 × 2, this diagonal scaling improves the condition number of the problem and the convergence rate of steepest descent.

Problem 2.4 (Steepest Descent with Errors)
In practice, it may not be possible for various reasons (e.g., numerical issues, finite-difference approximations) to compute the gradient exactly. Consider the steepest descent method

    x_{k+1} = x_k − s(∇f(x_k) + e_k),

where s is a constant stepsize, e_k is an error satisfying ‖e_k‖ ≤ δ for all k, and f is the positive definite quadratic function

    f(x) = (1/2)(x − x*)′Q(x − x*).

Let q = max{|1 − sm|, |1 − sM|}, where m is the smallest and M the largest eigenvalue of Q, and assume that q < 1. Show that for all k, we have

    ‖x_k − x*‖ ≤ sδ/(1 − q) + qᵏ ‖x_0 − x*‖.

Problem 2.5
Consider a linear invertible transformation of variables x = Sy. Write Newton's method in the space of the variables y and show that it generates the sequence y_k = S⁻¹x_k, where {x_k} is the sequence generated by Newton's method in the space of the variables x. Thus, Newton's method remains invariant under linear transformations.

Problem 2.6
Apply Newton's method to minimization of the function f(x) = ‖x‖³ and show that it converges linearly to x* = 0. Explain this fact in light of the result on Newton's method from class (i.e., which conditions are violated?). Hint: You may find the following identity useful (presuming that the inverses exist):

    (A + CBC′)⁻¹ = A⁻¹ − A⁻¹C(B⁻¹ + C′A⁻¹C)⁻¹C′A⁻¹

for square matrices A, B, and a third matrix C of appropriate dimension.

Problem 2.7
Let Q be a strictly positive definite symmetric matrix.
(a) Show that ‖x‖_Q = (x′Qx)^(1/2) defines a norm on ℝⁿ. (See Appendix A of Bertsekas for the definition of a norm.)
(b) State and prove a generalization of the projection theorem that involves the norm ‖z − x‖_Q.
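As a numerical sanity check for Problem 2.1(a) (not part of the assignment), here is a minimal Python/NumPy sketch of one steepest-descent iteration with Armijo backtracking on f(x, y) = 3x² + y⁴ from (1, −2); the function names `f`, `grad_f`, and `armijo_step` are my own, not from the problem set:

```python
import numpy as np

def f(v):
    x, y = v
    return 3 * x**2 + y**4

def grad_f(v):
    x, y = v
    return np.array([6 * x, 4 * y**3])

def armijo_step(v, s=1.0, sigma=0.1, beta=0.5):
    """One steepest-descent iteration with the Armijo rule: accept the
    stepsize alpha = s * beta**m for the smallest m >= 0 such that
    f(v) - f(v - alpha * g) >= sigma * alpha * ||g||^2."""
    g = grad_f(v)
    alpha = s
    while f(v) - f(v - alpha * g) < sigma * alpha * g.dot(g):
        alpha *= beta
    return v - alpha * g, alpha

v0 = np.array([1.0, -2.0])
v1, alpha = armijo_step(v0)   # accepts alpha = 0.0625, v1 = (0.625, 0.0)
```

With the stated parameters the rule backtracks four times before accepting, which is a useful cross-check when working part (a) by hand.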
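For Problem 2.3, the claimed improvement can be checked numerically: after the change of variables the effective Hessian is D^(1/2) Q D^(1/2) with D_ii = 1/q_ii. A small sketch, using an example 2 × 2 matrix of my own choosing:

```python
import numpy as np

def cond(A):
    """Condition number of a symmetric positive definite matrix:
    ratio of largest to smallest eigenvalue."""
    w = np.linalg.eigvalsh(A)
    return w[-1] / w[0]

Q = np.array([[4.0, 1.5],
              [1.5, 1.0]])                   # a 2x2 positive definite Hessian
D_half = np.diag(1.0 / np.sqrt(np.diag(Q)))  # D^{1/2} with D_ii = 1/q_ii
Q_scaled = D_half @ Q @ D_half               # unit diagonal after scaling

print(cond(Q), cond(Q_scaled))               # condition number drops
```

Note that the scaled matrix always has unit diagonal, which is what drives the 2 × 2 result the problem asks you to prove.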
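The bound in Problem 2.4 can also be observed empirically. The sketch below (my own setup: a diagonal Q with eigenvalues m = 1 and M = 4, and errors drawn on the sphere ‖e_k‖ = δ) checks that ‖x_k − x*‖ stays below sδ/(1 − q) + qᵏ‖x_0 − x*‖ at every iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.diag([1.0, 4.0])                  # eigenvalues: m = 1, M = 4
x_star = np.array([2.0, -1.0])
s, delta = 0.1, 0.05
q = max(abs(1 - s * 1.0), abs(1 - s * 4.0))   # q = 0.9 < 1

x = np.array([10.0, 10.0])
r0 = np.linalg.norm(x - x_star)
ok = True
for k in range(1, 101):
    e = rng.uniform(-1.0, 1.0, size=2)
    e *= delta / np.linalg.norm(e)       # worst-case error, ||e_k|| = delta
    x = x - s * (Q @ (x - x_star) + e)   # gradient of f is Q(x - x*)
    bound = s * delta / (1 - q) + q**k * r0
    ok = ok and (np.linalg.norm(x - x_star) <= bound)
```

The iterates contract toward x* but stall inside a ball of radius roughly sδ/(1 − q), which is exactly the picture the bound describes.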
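For Problem 2.6, the linear rate is easy to see numerically. For f(x) = ‖x‖³ one has ∇f(x) = 3‖x‖x and ∇²f(x) = 3‖x‖I + 3xx′/‖x‖, and each exact Newton step turns out to halve the iterate. A sketch (the helper `newton_step` is mine, not part of the assignment):

```python
import numpy as np

def newton_step(x):
    """One exact Newton step for f(x) = ||x||^3, valid for x != 0."""
    n = np.linalg.norm(x)
    g = 3.0 * n * x                                          # gradient
    H = 3.0 * n * np.eye(len(x)) + 3.0 * np.outer(x, x) / n  # Hessian
    return x - np.linalg.solve(H, g)

x = np.array([1.0, -2.0])
ratios = []
for _ in range(8):
    x_new = newton_step(x)
    ratios.append(np.linalg.norm(x_new) / np.linalg.norm(x))
    x = x_new
# every ratio is 1/2: linear, not quadratic, convergence to x* = 0
```

The observed ratio of exactly 1/2 per step is the linear convergence the problem asks you to establish, and the singular Hessian at x* = 0 points at the violated condition.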
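For Problem 2.7(a), a quick randomized check of the norm axioms for ‖x‖_Q = (x′Qx)^(1/2) (this only probes the claim on random samples; it is no substitute for the proof the problem requires, and the matrix Q below is an arbitrary choice of mine):

```python
import numpy as np

def q_norm(x, Q):
    """Candidate norm ||x||_Q = sqrt(x' Q x), Q symmetric positive definite."""
    return float(np.sqrt(x @ Q @ x))

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                # symmetric, positive definite
rng = np.random.default_rng(1)
violations = 0
for _ in range(200):
    x, y = rng.normal(size=2), rng.normal(size=2)
    t = float(rng.normal())
    if not np.isclose(q_norm(t * x, Q), abs(t) * q_norm(x, Q)):   # homogeneity
        violations += 1
    if q_norm(x + y, Q) > q_norm(x, Q) + q_norm(y, Q) + 1e-9:     # triangle
        violations += 1
```

Note that the square root is essential: without it, x′Qx scales quadratically in t and absolute homogeneity fails.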
Instructor: Martin Wainwright