# EE236C (Spring 2008-09): 7. Gradient methods with generalized distances

**Contents**

- Bregman distances
- variant of Nesterov's method
- examples

## Gradient method and extension

basic gradient method for minimizing $f$ (lecture 1):

$$x^+ = \operatorname*{argmin}_{z} \left( f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \| z - x \|_2^2 \right)$$

extension for minimizing $f + g$ over $C$ (lectures 4-5):

$$x^+ = \operatorname*{argmin}_{z \in C} \left( f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \| z - x \|_2^2 + g(z) \right) \triangleq S_t(x - t \nabla f(x))$$

- $g$ a simple nondifferentiable function; $C$ a simple convex set
- interesting if the projection/thresholding operation $S_t$ is inexpensive

## Generalization

replace $(1/2) \| z - x \|_2^2$ with a 'generalized distance function' $d(z, x)$

- basic gradient update:

  $$\operatorname*{argmin}_{z} \left( f(x) + \nabla f(x)^T (z - x) + \frac{1}{t} d(z, x) \right)$$

- extension with projection/thresholding:

  $$\operatorname*{argmin}_{z \in C} \left( f(x) + \nabla f(x)^T (z - x) + \frac{1}{t} d(z, x) + g(z) \right)$$

potential benefits:

- select $d(z, x)$ to fit the curvature of $f$, or the geometry of $C$
- simplify the thresholding/projection

## Bregman distance functions

Bregman distance associated with a strictly convex, differentiable $h$:

$$d(x, y) = h(x) - h(y) - \nabla h(y)^T (x - y)$$

$h$ is called the kernel function of $d$

properties:

- convex in $x$ for fixed $y$
- $d(x, y) \geq 0$ for all $x, y$; $d(x, y) = 0$ if and only if $x = y$
- not a true distance (not symmetric)
- $d(x, y) \geq (\mu/2) \| x - y \|_2^2$ if $h$ is strongly convex with constant $\mu$

the first two properties follow from the (strict) convexity of $h$

## Examples

quadratic function: $h(x) = \| x \|_2^2 / 2$ gives

$$d(x, y) = \frac{1}{2} \| x - y \|_2^2$$

negative entropy: $h(x) = \sum_{i=1}^n x_i \log x_i$ with $\operatorname{dom} h = \mathbf{R}^n_{++}$ gives

$$d(x, y) = \sum_{i=1}^n \left( x_i \log(x_i / y_i) - x_i + y_i \right),$$

the relative entropy or Kullback-Leibler divergence

logarithmic barrier: $h(x) = -\sum_{i=1}^n \log x_i$ with $\operatorname{dom} h = \mathbf{R}^n_{++}$ gives

$$d(x, y) = \sum_{i=1}^n \left( x_i / y_i - \log(x_i / y_i) \right) - n$$

inverse barrier: $h(x) = \sum_{i=1}^n 1/x_i$ with ...
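The Bregman distance definition and the kernel examples above translate directly into code. The sketch below (not part of the original notes; the names `bregman`, `h_quad`, `h_ent` are illustrative) computes $d(x,y) = h(x) - h(y) - \nabla h(y)^T(x-y)$ for the quadratic and negative-entropy kernels, recovering the squared Euclidean distance and the KL divergence respectively.

```python
import numpy as np

def bregman(h, grad_h, x, y):
    """Bregman distance d(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

# quadratic kernel h(x) = ||x||^2 / 2  ->  d(x, y) = (1/2) ||x - y||^2
h_quad = lambda x: 0.5 * (x @ x)
g_quad = lambda x: x

# negative entropy h(x) = sum_i x_i log x_i on R^n_++
#  ->  d(x, y) = sum_i (x_i log(x_i / y_i) - x_i + y_i), the KL divergence
h_ent = lambda x: np.sum(x * np.log(x))
g_ent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.25, 0.25, 0.5])

d_quad = bregman(h_quad, g_quad, x, y)  # equals 0.5 * ||x - y||^2
d_kl = bregman(h_ent, g_ent, x, y)      # equals sum x_i log(x_i/y_i) here,
                                        # since x and y both sum to 1
```

Both values are nonnegative and vanish at `x == y`, matching the properties listed above, even though neither distance is symmetric in its arguments.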
View Full Document
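One concrete payoff of the generalized gradient update is worth sketching: with the negative-entropy kernel and $C$ the probability simplex, the update $\operatorname*{argmin}_{z \in C} \left( \nabla f(x)^T(z-x) + \frac{1}{t} d(z,x) \right)$ has a closed-form multiplicative solution, $x^+_i \propto x_i \exp(-t \nabla f(x)_i)$, so no explicit projection is needed. This example is not from the notes; the problem data (`A`, `b`) and the step size are hypothetical choices for illustration.

```python
import numpy as np

def entropic_update(x, grad, t):
    """Generalized gradient step with the negative-entropy kernel over the
    simplex: x+_i proportional to x_i * exp(-t * grad_i)."""
    w = x * np.exp(-t * grad)
    return w / w.sum()

# illustrative problem: minimize f(x) = ||A x - b||^2 over the simplex
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
grad_f = lambda x: 2.0 * A.T @ (A @ x - b)

x = np.ones(3) / 3  # start at the simplex center
for _ in range(200):
    x = entropic_update(x, grad_f(x), t=0.01)
# every iterate stays strictly inside the simplex by construction,
# and f decreases from the starting point for a small enough step t
```

The multiplicative form illustrates the "simplify the projection" benefit above: the entropy kernel keeps iterates feasible automatically, where the Euclidean kernel would require a simplex projection at every step.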