Stochastic Subgradient Methods

Stephen Boyd and Almir Mutapcic
Notes for EE364b, Stanford University, Winter 2006-07
April 13, 2008

1  Noisy unbiased subgradient

Suppose f : Rⁿ → R is a convex function. We say that a random vector g ∈ Rⁿ is a noisy (unbiased) subgradient of f at x ∈ dom f if E g ∈ ∂f(x), i.e., we have

    f(z) ≥ f(x) + (E g)ᵀ(z − x)  for all z.

Thus, g is a noisy unbiased subgradient of f at x if it can be written as g = g̃ + v, where g̃ ∈ ∂f(x) and v has zero mean.

If x is also a random variable, then we say that g is a noisy subgradient of f at x (which is random) if

    ∀z   f(z) ≥ f(x) + E(g | x)ᵀ(z − x)

holds almost surely. We can write this compactly as E(g | x) ∈ ∂f(x). ("Almost surely" is to be understood here.)

The noise can represent (presumably small) error in computing a true subgradient, error that arises in Monte Carlo evaluation of a function defined as an expected value, or measurement error. Some references for stochastic subgradient methods are [Sho98, §2.4] and [Pol87, Chap. 5]. Some books on stochastic programming in general are [BL97, Pre95, Mar05].

2  Stochastic subgradient method

The stochastic subgradient method is essentially the subgradient method, but using noisy subgradients and a more limited set of step size rules. In this context, the slow convergence of subgradient methods helps us, since the many steps help average out the statistical errors in the subgradient evaluations.

We'll consider the simplest case, unconstrained minimization of a convex function f : Rⁿ → R. The stochastic subgradient method uses the standard update

    x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ − αₖ g⁽ᵏ⁾,

where x⁽ᵏ⁾ is the kth iterate, αₖ > 0 is the kth step size, and g⁽ᵏ⁾ is a noisy subgradient of f at x⁽ᵏ⁾, i.e.,

    E(g⁽ᵏ⁾ | x⁽ᵏ⁾) = g̃⁽ᵏ⁾ ∈ ∂f(x⁽ᵏ⁾).
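As a concrete illustration (not from the notes themselves), here is a minimal Python sketch of the update above. The problem instance is an assumption chosen for simplicity: we minimize f(x) = ‖x‖₁, whose subgradient is sign(x), and we form a noisy subgradient by adding zero-mean Gaussian noise, so the conditional expectation of the oracle's output is a true subgradient. The function names and step size rule αₖ = 1/k are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_subgrad(x, noise_std=0.1):
    # For f(x) = ||x||_1, sign(x) is a subgradient; adding zero-mean
    # Gaussian noise v gives g = g_tilde + v with E(g | x) in df(x).
    return np.sign(x) + noise_std * rng.standard_normal(x.shape)

def stochastic_subgradient(x0, n_steps=5000):
    x = x0.copy()
    f_best, x_best = np.inf, x.copy()
    for k in range(1, n_steps + 1):
        g = noisy_subgrad(x)
        alpha = 1.0 / k        # square-summable but not summable step sizes
        x = x - alpha * g      # the standard update x^(k+1) = x^(k) - alpha_k g^(k)
        f_k = np.abs(x).sum()
        if f_k < f_best:       # track the best point found so far (Section 2)
            f_best, x_best = f_k, x.copy()
    return x_best, f_best

x_best, f_best = stochastic_subgradient(np.array([1.0, -2.0, 3.0]))
```

Because individual steps can increase f, the code records the running best value f_best, exactly as the notes prescribe.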
Even more so than with the ordinary subgradient method, f(x⁽ᵏ⁾) can increase during the algorithm, so we keep track of the best point found so far and the associated function value

    f_best⁽ᵏ⁾ = min{ f(x⁽¹⁾), …, f(x⁽ᵏ⁾) }.

The sequences x⁽ᵏ⁾, g⁽ᵏ⁾, and f_best⁽ᵏ⁾ are, of course, stochastic processes.

3  Convergence

We'll prove a very basic convergence result for the stochastic subgradient method, using step sizes that are square-summable but not summable:

    αₖ ≥ 0,    ∑_{k=1}^∞ αₖ² = ‖α‖₂² < ∞,    ∑_{k=1}^∞ αₖ = ∞.

We assume there is an x* that minimizes f, and a G for which E ‖g⁽ᵏ⁾‖₂² ≤ G² for all k. We also assume that R satisfies E ‖x⁽¹⁾ − x*‖₂² ≤ R².
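A quick numerical sanity check (not a proof) makes the step size conditions concrete for the common choice αₖ = 1/k: the partial sums of αₖ² stay bounded (they converge to π²/6), while the partial sums of αₖ grow like log K without bound.

```python
import math

# Partial sums for alpha_k = 1/k, which is square-summable but not summable.
K = 10**6
sum_alpha = sum(1.0 / k for k in range(1, K + 1))
sum_alpha_sq = sum(1.0 / k**2 for k in range(1, K + 1))

print(f"sum alpha_k   = {sum_alpha:.4f}")     # ~14.39 here; grows with K
print(f"sum alpha_k^2 = {sum_alpha_sq:.4f}")  # ~1.6449; bounded for all K
print(f"pi^2 / 6      = {math.pi**2 / 6:.4f}")
```

Any step size sequence satisfying the two conditions above works in the convergence analysis; 1/k is simply the standard example.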