EE236C (Spring 2008-09)

1. Gradient method

• classical gradient method
• convergence analysis
• Nesterov's accelerated gradient method
• optimality of Nesterov's method

Classical gradient method

to minimize a convex differentiable function f: choose x^(0) and repeat

    x^(k) = x^(k−1) − t_k ∇f(x^(k−1)),   k = 1, 2, . . .

step size rules
• fixed: t_k constant
• backtracking line search
• exact line search: minimize f(x − t∇f(x)) over t
• diminishing: t_k → 0, ∑_{k=1}^∞ t_k = ∞

we will study the fixed and backtracking line search rules

Backtracking line search

initialize t at some positive value t̂ (for example, t̂ = 1); take t := βt until

    f(x − t∇f(x)) < f(x) − α t ‖∇f(x)‖₂²

[figure: f(x − t∇f(x)) and the line f(x) − α t ‖∇f(x)‖₂² plotted as functions of t]

• 0 < β < 1; we will take α = 1/2 (mostly to simplify proofs)
• variation: use t̂ = t_{k−1} to initialize the backtracking at iteration k

Assumptions

1. f has finite optimal value f⋆ and minimizer x⋆
2. f is convex, dom f = Rⁿ
3. ∇f(x) is Lipschitz continuous with constant L > 0:

    ‖∇f(x) − ∇f(y)‖₂ ≤ L ‖x − y‖₂   ∀ x, y

for twice differentiable functions, this means ∇²f(x) ⪯ LI for all x

Upper and lower bound

[figure: graph of f(y) with its affine lower bound and quadratic upper bound at the point (x, f(x))]

• affine lower bound from convexity:

    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   ∀ x, y

• quadratic upper bound from the Lipschitz property:

    f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (L/2) ‖y − x‖₂²   ∀ x, y

proof of upper bound (define v = y − x):

    f(y) = f(x) + ∇f(x)ᵀv + ∫₀¹ (∇f(x + tv) − ∇f(x))ᵀ v dt
         ≤ f(x) + ∇f(x)ᵀv + ∫₀¹ ‖∇f(x + tv) − ∇f(x)‖₂ ‖v‖₂ dt
         ≤ f(x) + ∇f(x)ᵀv + ∫₀¹ L t ‖v‖₂² dt
         = f(x) + ∇f(x)ᵀv + (L/2) ‖v‖₂²

Analysis for constant step size

from the quadratic upper bound with y = x − t∇f(x):

    f(x − t∇f(x)) ≤ f(x) − t (1 − Lt/2) ‖∇f(x)‖₂²

therefore, if x⁺ = x − t∇f(x) and 0 < t ≤ 1/L,

    f(x⁺) ≤ f(x) − (t/2) ‖∇f(x)‖₂²
          ≤ f⋆ + ∇f(x)ᵀ(x − x⋆) − (t/2) ‖∇f(x)‖₂²

(the second step uses the affine lower bound from convexity, evaluated at y = x⋆)
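The following is a minimal sketch, not part of the notes, of the classical gradient method with the backtracking line search described above (α = 1/2, 0 < β < 1, stopping when ‖∇f(x)‖₂ is small). The function name gradient_method, the parameters alpha, beta, t_hat, tol, max_iter, and the small quadratic test problem are all illustrative choices.

import numpy as np

def gradient_method(f, grad, x0, alpha=0.5, beta=0.8, t_hat=1.0,
                    max_iter=1000, tol=1e-8):
    """Gradient method with backtracking line search (illustrative sketch).

    Backtracking: start from t = t_hat and shrink t := beta*t until
        f(x - t*grad(x)) < f(x) - alpha*t*||grad(x)||_2^2.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = g @ g                       # ||grad f(x)||_2^2
        if np.sqrt(gnorm2) <= tol:           # stop when the gradient is small
            break
        fx = f(x)
        t = t_hat
        while f(x - t * g) >= fx - alpha * t * gnorm2:
            t *= beta                        # backtracking step t := beta*t
        x = x - t * g                        # update x+ = x - t*grad f(x)
    return x

# illustrative use: minimize the quadratic f(x) = (1/2) x^T A x - b^T x
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_min = gradient_method(f, grad, np.zeros(2))   # approaches the minimizer of f

Re-initializing t_hat with the previously accepted step size would give the "variation" mentioned on the backtracking slide.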