Stochastic Subgradient Methods

Stephen Boyd and Almir Mutapcic
Notes for EE364b, Stanford University, Winter 2006-07
April 13, 2008

1 Noisy unbiased subgradient

Suppose $f : \mathbf{R}^n \to \mathbf{R}$ is a convex function. We say that a random vector $\tilde g \in \mathbf{R}^n$ is a noisy (unbiased) subgradient of $f$ at $x \in \operatorname{dom} f$ if $g = \mathbf{E}\,\tilde g \in \partial f(x)$, i.e., we have
$$ f(z) \geq f(x) + (\mathbf{E}\,\tilde g)^T (z - x) \quad \mbox{for all } z. $$
Thus, $\tilde g$ is a noisy unbiased subgradient of $f$ at $x$ if it can be written as $\tilde g = g + v$, where $g \in \partial f(x)$ and $v$ has zero mean.

If $x$ is also a random variable, then we say that $\tilde g$ is a noisy subgradient of $f$ at $x$ (which is random) if
$$ \forall z \quad f(z) \geq f(x) + \mathbf{E}(\tilde g \mid x)^T (z - x) $$
holds almost surely. We can write this compactly as $\mathbf{E}(\tilde g \mid x) \in \partial f(x)$. (Almost surely is to be understood here.)

The noise can represent (presumably small) error in computing a true subgradient, error that arises in Monte Carlo evaluation of a function defined as an expected value, or measurement error.

Some references for stochastic subgradient methods are [Sho98, §2.4] and [Pol87, Chap. 5]. Some books on stochastic programming in general are [BL97, Pre95, Mar05].

2 Stochastic subgradient method

The stochastic subgradient method is essentially the subgradient method, but using noisy subgradients and a more limited set of step size rules. In this context, the slow convergence of subgradient methods helps us, since the many steps help average out the statistical errors in the subgradient evaluations.

We'll consider the simplest case, unconstrained minimization of a convex function $f : \mathbf{R}^n \to \mathbf{R}$. The stochastic subgradient method uses the standard update
$$ x^{(k+1)} = x^{(k)} - \alpha_k \tilde g^{(k)}, $$
where $x^{(k)}$ is the $k$th iterate, $\alpha_k > 0$ is the $k$th step size, and $\tilde g^{(k)}$ is a noisy subgradient of $f$ at $x^{(k)}$,
$$ \mathbf{E}\left(\tilde g^{(k)} \mid x^{(k)}\right) = g^{(k)} \in \partial f(x^{(k)}). $$
Even more so than with the ordinary subgradient method, we can have $f(x^{(k)})$ increase during the algorithm, so we keep track of the best point found so far, and the associated function value
$$ f^{(k)}_{\rm best} = \min\{ f(x^{(1)}), \ldots, f(x^{(k)}) \}. $$
The sequences $x^{(k)}$, $\tilde g^{(k)}$, and $f^{(k)}_{\rm best}$ are, of course, stochastic processes.

3 Convergence

We'll prove a very basic convergence result for the stochastic subgradient method, using step sizes that are square-summable but not summable,
$$ \alpha_k \geq 0, \qquad \sum_{k=1}^\infty \alpha_k^2 = \|\alpha\|_2^2 < \infty, \qquad \sum_{k=1}^\infty \alpha_k = \infty. $$
We assume there is an $x^\star$ that minimizes $f$, and a $G$ for which $\mathbf{E}\,\|\tilde g^{(k)}\|_2^2 \leq G^2$ holds for all $k$. We also assume that $R$ satisfies $\mathbf{E}\,\|x^{(1)} - x^\star\|_2^2 \leq R^2$. ...
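The definition in Section 1 is easy to instantiate numerically. Below is a minimal sketch, not taken from these notes: the choice $f(x) = \|x\|_1$, the Gaussian noise model, the noise level, and the helper name noisy_subgrad_l1 are all illustrative assumptions. A true subgradient of the $\ell_1$ norm is $\operatorname{sign}(x)$, and adding zero-mean noise $v$ gives $\tilde g = g + v$ with $\mathbf{E}\,\tilde g = g \in \partial f(x)$.

```python
import numpy as np

def noisy_subgrad_l1(x, noise_std=0.1, rng=None):
    """Return a noisy unbiased subgradient of f(x) = ||x||_1 at x.

    A true subgradient is g = sign(x); adding zero-mean noise v keeps
    E[g_tilde] = g inside the subdifferential of f at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.sign(x)                                      # a subgradient of the l1 norm at x
    v = rng.normal(scale=noise_std, size=np.shape(x))   # zero-mean noise
    return g + v
```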
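The update in Section 2 can be sketched in a few lines. Again this is an illustrative sketch rather than code from the notes: the step size rule $\alpha_k = 1/k$ (one square-summable but not summable choice), the test problem $f(x) = \|x\|_1$, the noise model, and the function names are assumptions. The loop tracks $f^{(k)}_{\rm best}$ because $f(x^{(k)})$ need not decrease.

```python
import numpy as np

def stoch_subgrad_method(f, noisy_subgrad, x1, num_iters=5000):
    """Run x^(k+1) = x^(k) - alpha_k * g_tilde^(k) with alpha_k = 1/k,
    keeping track of the best point found so far."""
    x = np.array(x1, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        g_tilde = noisy_subgrad(x)        # noisy unbiased subgradient at x^(k)
        alpha = 1.0 / k                    # square-summable but not summable
        x = x - alpha * g_tilde            # stochastic subgradient update
        if f(x) < f_best:                  # f(x^(k)) may increase, so record the best
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

# Usage on f(x) = ||x||_1, whose minimizer is x* = 0 with optimal value 0.
rng = np.random.default_rng(0)
x_best, f_best = stoch_subgrad_method(
    f=lambda x: np.abs(x).sum(),
    noisy_subgrad=lambda x: np.sign(x) + 0.1 * rng.normal(size=x.shape),
    x1=rng.normal(size=10),
)
print(f_best)  # should be close to 0 after enough iterations
```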