IE 598: Big Data Optimization                                                  Fall 2016
Lecture 16: Smoothing Techniques I – October 18
Lecturer: Niao He                                                  Scribers: Harsh Gupta

Overview. We discussed the Subgradient Descent and Mirror Descent algorithms for non-smooth convex optimization in the past week, and observed that Subgradient Descent is a special case of the Mirror Descent algorithm. However, both algorithms have general formulations and do not exploit the structure of the problem at hand. In practice, we always know something about the structure of the optimization problem we intend to solve, and one can utilize this structure to design algorithms that are more efficient than Subgradient Descent and Mirror Descent.

16.1 Introduction

We intend to solve the following optimization problem:

    \min_{x \in X} f(x)                                                    (16.1)

where $f$ is a convex but non-smooth (i.e., non-differentiable) function and $X$ is a convex compact set. One intuitive way to approach the above problem is to approximate the non-smooth function $f(x)$ by a smooth, convex function $f_\mu(x)$, so that we can use the standard techniques learned so far in the course to solve the problem. Hence, we want to reduce the problem in (16.1) to the following:

    \min_{x \in X} f_\mu(x)                                                (16.2)

where $f_\mu(x)$ is a smooth, convex approximation of $f(x)$ with an $L_\mu$-Lipschitz continuous gradient. Now we can use the techniques learned earlier in this course, such as gradient descent, accelerated gradient descent, the Frank-Wolfe algorithm, and coordinate descent, to solve the above problem. Clearly, the objective now is to come up with a reasonably good approximation $f_\mu$ of $f$ so that solving (16.2) is as close to solving (16.1) as possible.

A motivating example: Consider the simplest non-smooth and convex function, $f(x) = |x|$. The following function, known as the Huber function,

    f_\mu(x) = \begin{cases} \dfrac{x^2}{2\mu}, & |x| \le \mu \\[4pt] |x| - \dfrac{\mu}{2}, & |x| > \mu \end{cases}    (16.3)

is a smooth approximation of the absolute value function. We plot the two functions (for $\mu = 1$) in Figure 1. We make the following observations:

1. $f_\mu(x)$ is clearly continuous and differentiable everywhere. This can be seen straightforwardly from its formulation in (16.3).

2. We observe that $f_\mu(x) \le f(x)$. Also, $f_\mu(x) \ge |x| - \frac{\mu}{2}$, therefore:

    f(x) - \frac{\mu}{2} \le f_\mu(x) \le f(x)

Hence, if $\mu \to 0$, then $f_\mu(x) \to f(x)$. Therefore, $\mu$ characterizes the approximation accuracy.

3. We also observe that $|f''_\mu(x)| \le \frac{1}{\mu}$. This implies that the gradient of $f_\mu$ is $\frac{1}{\mu}$-Lipschitz continuous, i.e., $f_\mu$ is $\frac{1}{\mu}$-smooth.

The Huber function approximation has been widely used in machine learning to approximate non-smooth loss functions, e.g., the absolute loss (robust regression) and the hinge loss (SVM).
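As a concrete illustration of the Huber example, here is a minimal NumPy sketch (not part of the original notes; the helper names huber and huber_grad are our own) that evaluates $f_\mu$ from (16.3) and numerically checks the sandwich bound of observation 2 and the $\frac{1}{\mu}$-Lipschitz derivative of observation 3:

import numpy as np

def huber(x, mu=1.0):
    # Huber smoothing f_mu of |x|, as in (16.3):
    #   f_mu(x) = x^2 / (2*mu)  if |x| <= mu,  and |x| - mu/2 otherwise.
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= mu, x**2 / (2 * mu), np.abs(x) - mu / 2)

def huber_grad(x, mu=1.0):
    # Derivative of f_mu; it is (1/mu)-Lipschitz, so f_mu is (1/mu)-smooth.
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= mu, x / mu, np.sign(x))

xs = np.linspace(-3.0, 3.0, 1001)
f = np.abs(xs)
for mu in (1.0, 0.1, 0.01):
    fmu = huber(xs, mu)
    # Sandwich bound from observation 2: f(x) - mu/2 <= f_mu(x) <= f(x).
    assert np.all(fmu <= f + 1e-12) and np.all(fmu >= f - mu / 2 - 1e-12)
    # Discrete check of observation 3: the derivative changes by at most |x - y| / mu.
    g = huber_grad(xs, mu)
    assert np.all(np.abs(np.diff(g)) <= np.abs(np.diff(xs)) / mu + 1e-12)
    print(f"mu = {mu:4.2f}: max approximation gap = {np.max(f - fmu):.4f} (bound mu/2 = {mu / 2:.4f})")

In a smoothing scheme one would then run a method for smooth optimization (e.g., gradient descent or accelerated gradient descent) on $f_\mu$, choosing $\mu$ to trade off the approximation error, which is at most $\frac{\mu}{2}$, against the smoothness constant $\frac{1}{\mu}$.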
