EE236C (Spring 2008-09)

4. Gradient methods for nonsmooth problems

• motivation
• example: 1-norm regularization
• gradient mapping
• gradient method
• Nesterov's gradient method
• examples

Motivation

complexity results from previous lectures: bounds on the number of iterations needed to reach accuracy $f(x) - f^\star \leq \epsilon$:

• subgradient method: $O(1/\epsilon^2)$
• gradient method: $O(1/\epsilon)$
• Nesterov's optimal gradient method: $O(1/\sqrt{\epsilon})$

can the faster gradient methods be extended to nonsmooth problems?

• no, if we consider the problem class and the (oracle) algorithm model for which the subgradient method is known to be optimal
• yes, if we can take advantage of additional structure in the problem

Interpretation of gradient update

recall the gradient update for convex differentiable $f$:

$$x^+ = x - t \nabla f(x)$$

interpretation:

$$x^+ = \operatorname*{argmin}_z \left( f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 \right)$$

$x^+$ minimizes a quadratic approximation of $f$, consisting of

• the first-order linearization $f(x) + \nabla f(x)^T (z - x)$ of $f(z)$ at $x$
• a proximity term $\|z - x\|_2^2$ with weight $1/(2t)$

Extension to nondifferentiable problems

split $f$ into a smooth and a nonsmooth component:

$$\text{minimize} \quad f(x) = g(x) + h(x)$$

with $g$ convex and differentiable, $h$ convex and nondifferentiable

generalized gradient update:

$$x^+ = \operatorname*{argmin}_z \left( g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z) \right)$$

• we make a quadratic approximation to $g$ only
• the complexity of computing $x^+$ depends on the structure of $h$

repeating the update provides a 'gradient method' for minimizing $f$ (a sketch follows the next slide)

Example: 1-norm regularization

$$\text{minimize} \quad f(x) = g(x) + \|x\|_1$$

generalized gradient update:

$$\begin{aligned} x^+ &= \operatorname*{argmin}_z \left( g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + \|z\|_1 \right) \\ &= \operatorname*{argmin}_z \left( \frac{1}{2t} \|z - x + t \nabla g(x)\|_2^2 + \|z\|_1 \right) \\ &= S_t(x - t \nabla g(x)) \end{aligned}$$

where

$$S_t(y) \triangleq \operatorname*{argmin}_z \left( \frac{1}{2t} \|z - y\|_2^2 + \|z\|_1 \right)$$

computing $S_t$: solve a simple separable problem in $z \in \mathbf{R}^n$

$$\text{minimize} \quad \sum_{k=1}^n \left( \frac{1}{2t} (z_k - y_k)^2 + |z_k| \right)$$

solution (elementwise soft-thresholding):

$$S_t(y)_k = \begin{cases} y_k - t & y_k \geq t \\ 0 & -t \leq y_k \leq t \\ y_k + t & y_k \leq -t \end{cases}$$
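the closed form above is the elementwise soft-thresholding (shrinkage) operator; a minimal NumPy sketch (the function name and test values are illustrative, not from the notes):

```python
import numpy as np

def soft_threshold(y, t):
    """Elementwise soft-thresholding S_t(y).

    Closed-form minimizer of (1/(2t)) * ||z - y||_2^2 + ||z||_1:
    each coordinate of y is shrunk toward zero by t, and set
    exactly to zero when |y_k| <= t.
    """
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

# one value in each of the three cases of the closed-form solution
print(soft_threshold(np.array([2.0, 0.3, -1.5]), 0.5))  # [ 1.5  0.  -1. ]
```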
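repeating the generalized gradient update gives the 'gradient method' of the previous slide; the sketch below (reusing soft_threshold from above) assumes a concrete smooth term $g(x) = \frac{1}{2}\|Ax - b\|_2^2$ and a constant step size $t = 1/L$ with $L = \|A\|_2^2$ a Lipschitz constant of $\nabla g$ — the choice of $g$, the data, and the iteration count are illustrative assumptions, not part of the notes:

```python
import numpy as np

def prox_gradient_l1(A, b, t, n_iter=500):
    """Iterate x+ = S_t(x - t * grad g(x)) for g(x) = (1/2)||Ax - b||_2^2.

    One concrete instance of the generalized gradient update; the
    starting point and iteration count are illustrative choices.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the smooth part g
        x = soft_threshold(x - t * grad, t)  # generalized gradient update
    return x

# usage on random data; the 1-norm term drives many coordinates to zero
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
t = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L, L = ||A||_2^2
x = prox_gradient_l1(A, b, t)
print(f"{np.count_nonzero(x)} of {x.size} entries nonzero")
```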
