# 1gradient - EE236C (Spring 2008-09) 1. Gradient method...

- classical gradient method
- convergence analysis
- Nesterov's accelerated gradient method
- optimality of Nesterov's method

## Classical gradient method

To minimize a convex differentiable function $f$: choose $x^{(0)}$ and repeat

$$x^{(k)} = x^{(k-1)} - t_k \nabla f(x^{(k-1)}), \qquad k = 1, 2, \ldots$$

Step size rules:

- fixed: $t_k$ constant
- backtracking line search
- exact line search: minimize $f(x - t\nabla f(x))$ over $t$
- diminishing: $t_k \to 0$, $\sum_{k=1}^{\infty} t_k = \infty$

We will study the fixed step size and backtracking line search.
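
A minimal Python sketch of the iteration above with a fixed step size $t_k = t$. It assumes a hypothetical test function $f(x) = \tfrac{1}{2}(x_1^2 + 10x_2^2)$ that is not from the notes; the names `gradient_method` and `grad_quad` are illustrative only.

```python
from typing import Callable, List

Vector = List[float]


def gradient_method(
    grad: Callable[[Vector], Vector], x0: Vector, t: float, iters: int
) -> Vector:
    """Fixed-step gradient method: x(k) = x(k-1) - t * grad f(x(k-1))."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x


def grad_quad(x: Vector) -> Vector:
    """Gradient of the hypothetical test function f(x) = 0.5*(x1^2 + 10*x2^2)."""
    return [x[0], 10.0 * x[1]]


if __name__ == "__main__":
    # Minimizer is x* = (0, 0); t = 0.1 equals 1/L for this f (L = 10).
    print(gradient_method(grad_quad, [1.0, 1.0], t=0.1, iters=200))
```

With $t = 0.1$ the first coordinate is multiplied by $0.9$ at every iteration and the second is driven to zero in a single step, so the iterates approach $x^\star = 0$.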

## Backtracking line search

Initialize $t$ at some positive value $\hat t$ (for example, $\hat t = 1$); take $t := \beta t$ until

$$f(x - t\nabla f(x)) < f(x) - \alpha t \|\nabla f(x)\|_2^2$$

[Figure: $f(x - t\nabla f(x))$ and the line $f(x) - \alpha t \|\nabla f(x)\|_2^2$ plotted against $t$]

Here $0 < \beta < 1$; we will take $\alpha = 1/2$ (mostly to simplify proofs).

Variation: use $\hat t = t_{k-1}$ to initialize the backtracking at iteration $k$.

## Assumptions

1. $f$ has a finite optimal value $f^\star$ and a minimizer $x^\star$.
2. $f$ is convex, with $\operatorname{dom} f = \mathbf{R}^n$.
3. $\nabla f(x)$ is Lipschitz continuous with constant $L > 0$:

   $$\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2 \quad \forall x, y$$

   For twice differentiable functions, this means $\nabla^2 f(x) \preceq L I$ for all $x$.
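
The backtracking rule can be sketched in Python as follows. This is a minimal sketch, assuming $\alpha = 1/2$ as in the notes, defaults $\beta = 1/2$ and $\hat t = 1$ chosen here for illustration, and two safeguards that are not part of the notes: a zero-gradient check and a cap `max_backtracks` on the number of reductions. The `warm_start` flag implements the variation $\hat t = t_{k-1}$; all function names and the test function are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Func = Callable[[Vector], float]
Grad = Callable[[Vector], Vector]


def sq_norm(v: Vector) -> float:
    return sum(vi * vi for vi in v)


def backtracking_step(
    f: Func, grad: Grad, x: Vector, t_hat: float = 1.0,
    alpha: float = 0.5, beta: float = 0.5, max_backtracks: int = 60,
) -> Tuple[Vector, float]:
    """One gradient step: shrink t := beta*t until
    f(x - t*g) < f(x) - alpha*t*||g||_2^2, where g = grad f(x)."""
    g = grad(x)
    gg = sq_norm(g)
    if gg == 0.0:  # safeguard (not in the notes): x is already a minimizer
        return list(x), t_hat
    fx = f(x)
    t = t_hat
    for _ in range(max_backtracks):
        x_new = [xi - t * gi for xi, gi in zip(x, g)]
        if f(x_new) < fx - alpha * t * gg:
            return x_new, t
        t *= beta
    return list(x), t  # safeguard (not in the notes): no acceptable t found


def gradient_backtracking(
    f: Func, grad: Grad, x0: Vector, iters: int = 500, t_hat: float = 1.0,
    beta: float = 0.5, warm_start: bool = False, tol: float = 1e-10,
) -> Vector:
    x = list(x0)
    t_init = t_hat
    for _ in range(iters):
        if math.sqrt(sq_norm(grad(x))) <= tol:
            break
        x, t = backtracking_step(f, grad, x, t_init, 0.5, beta)
        if warm_start:
            t_init = t  # variation: start iteration k from t_{k-1}
    return x


def f_quad(x: Vector) -> float:
    """Hypothetical test function f(x) = 0.5*(x1^2 + 10*x2^2)."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_quad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


if __name__ == "__main__":
    print(gradient_backtracking(f_quad, grad_quad, [1.0, 1.0]))
    print(gradient_backtracking(f_quad, grad_quad, [1.0, 1.0], warm_start=True))
```

With the warm start, the step size can only decrease from one iteration to the next, which saves function evaluations once a suitable $t$ has been found.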

## Upper and lower bound

[Figure: graph of $f(y)$ with the point $(x, f(x))$ marked]

Affine lower bound from convexity:

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall x, y$$

Quadratic upper bound from the Lipschitz property:

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|_2^2 \quad \forall x, y$$

Proof of the upper bound (define $v = y - x$):

$$
\begin{aligned}
f(y) &= f(x) + \nabla f(x)^T v + \int_0^1 \left(\nabla f(x + tv) - \nabla f(x)\right)^T v \, dt \\
&\le f(x) + \nabla f(x)^T v + \int_0^1 \|\nabla f(x + tv) - \nabla f(x)\|_2 \, \|v\|_2 \, dt \\
&\le f(x) + \nabla f(x)^T v + \int_0^1 L t \|v\|_2^2 \, dt \\
&= f(x) + \nabla f(x)^T v + \frac{L}{2} \|v\|_2^2
\end{aligned}
$$

The first inequality is the Cauchy-Schwarz inequality; the second uses the Lipschitz condition with $\|(x + tv) - x\|_2 = t\|v\|_2$.
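
The two bounds can be checked numerically. This is a minimal sketch on a hypothetical test function that is not from the notes, $f(x) = \sum_i \log\cosh x_i$, chosen because it satisfies the three assumptions: it is convex with minimizer $x^\star = 0$, and its Hessian $\operatorname{diag}(\operatorname{sech}^2 x_i) \preceq I$, so $L = 1$. The helper `bounds_hold` is illustrative only.

```python
import math
import random
from typing import List

Vector = List[float]


def f(x: Vector) -> float:
    """Hypothetical test function f(x) = sum_i log(cosh(x_i))."""
    return sum(math.log(math.cosh(xi)) for xi in x)


def grad(x: Vector) -> Vector:
    """grad f(x) = tanh(x) componentwise; it is 1-Lipschitz, so L = 1."""
    return [math.tanh(xi) for xi in x]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def bounds_hold(x: Vector, y: Vector, L: float, tol: float = 1e-12) -> bool:
    """With v = y - x, check the affine lower bound and quadratic upper bound:
    f(x) + grad f(x)^T v <= f(y) <= f(x) + grad f(x)^T v + (L/2)||v||_2^2."""
    v = [yi - xi for xi, yi in zip(x, y)]
    affine = f(x) + dot(grad(x), v)
    return affine - tol <= f(y) <= affine + 0.5 * L * dot(v, v) + tol


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [
        ([rng.uniform(-5, 5) for _ in range(3)], [rng.uniform(-5, 5) for _ in range(3)])
        for _ in range(1000)
    ]
    print(all(bounds_hold(x, y, L=1.0) for x, y in pairs))
    # With L smaller than the true constant, the upper bound can fail near x = 0.
    print(bounds_hold([0.0, 0.0], [1.0, 0.0], L=0.1))
```

The small tolerance `tol` only absorbs floating-point rounding; the inequalities themselves hold exactly for this $f$ with $L = 1$.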

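## Analysis for constant step size

From the quadratic upper bound with $y = x - t\nabla f(x)$:

$$f(x - t\nabla f(x)) \le f(x) - t\left(1 - \frac{Lt}{2}\right) \|\nabla f(x)\|_2^2$$

Therefore, if $x^+ = x - t\nabla f(x)$ and $0 < t \le 1/L$,

$$
\begin{aligned}
f(x^+) &\le f(x) - \frac{t}{2} \|\nabla f(x)\|_2^2 \\
&\le f^\star + \nabla f(x)^T (x - x^\star) - \frac{t}{2} \|\nabla f(x)\|_2^2 \\
&= f^\star + \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x - x^\star - t\nabla f(x)\|_2^2 \right) \\
&= f^\star + \frac{1}{2t} \left( \|x - x^\star\|_2^2 - \|x^+ - x^\star\|_2^2 \right)
\end{aligned}
$$

The first inequality uses $1 - Lt/2 \ge 1/2$ for $t \le 1/L$, the second is the convexity lower bound with $y = x^\star$, and the equality that follows is obtained by expanding $\|x - x^\star - t\nabla f(x)\|_2^2$.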
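
The per-step inequalities can be checked numerically. A minimal sketch, again on the hypothetical test function $f(x) = \sum_i \log\cosh x_i$ (not from the notes), for which $L = 1$, $x^\star = 0$ and $f^\star = 0$; `step_bounds` is an illustrative helper, and step sizes are drawn from $[0.05, 1/L]$.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

L = 1.0        # Lipschitz constant of grad f for the test function below
F_STAR = 0.0   # optimal value; the minimizer is x* = 0


def f(x: Vector) -> float:
    """Hypothetical test function f(x) = sum_i log(cosh(x_i))."""
    return sum(math.log(math.cosh(xi)) for xi in x)


def grad(x: Vector) -> Vector:
    return [math.tanh(xi) for xi in x]


def sq_norm(v: Vector) -> float:
    return sum(vi * vi for vi in v)


def step_bounds(x: Vector, t: float) -> Tuple[float, float, float]:
    """For x+ = x - t*grad f(x), return the three quantities
    f(x+),  f(x) - (t/2)||grad f(x)||^2,  f* + (||x - x*||^2 - ||x+ - x*||^2)/(2t)."""
    g = grad(x)
    x_plus = [xi - t * gi for xi, gi in zip(x, g)]
    descent = f(x) - 0.5 * t * sq_norm(g)
    # x* = 0 here, so ||x - x*|| = ||x|| and ||x+ - x*|| = ||x+||.
    distance = F_STAR + (sq_norm(x) - sq_norm(x_plus)) / (2.0 * t)
    return f(x_plus), descent, distance


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        x = [rng.uniform(-3, 3) for _ in range(3)]
        t = rng.uniform(0.05, 1.0 / L)
        fp, descent, distance = step_bounds(x, t)
        print(f"t={t:.3f}  f(x+)={fp:.4f} <= {descent:.4f} <= {distance:.4f}")
```

The chain $f(x^+) \le f(x) - \tfrac{t}{2}\|\nabla f(x)\|_2^2 \le f^\star + \tfrac{1}{2t}(\|x - x^\star\|_2^2 - \|x^+ - x^\star\|_2^2)$ should hold at every sampled point, up to rounding.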