This preview shows pages 1–3. Sign up to view the full content.
This preview has intentionally blurred sections. Sign up to view the full version.
View Full Document
Unformatted text preview: 5 Optimization Optimization plays an increasingly important role in machine learning. For instance, many machine learning algorithms minimize a regularized risk functional: min f J ( f ) := ( f ) + R emp ( f ) (5.1) with the empirical risk R emp ( f ) := 1 m m i =1 l ( f ( x i ) y i ) . (5.2) Here x i are the training instances and y i are the corresponding labels. l the loss function measures the discrepancy between y and the predictions f ( x i ). Finding the optimal f involves solving an optimization problem. This chapter provides a selfcontained overview of some basic concepts and tools from optimization, especially geared towards solving machine learning problems. In terms of concepts, we will cover topics related to convexity, duality, and Lagrange multipliers. In terms of tools, we will cover a variety of optimization algorithms including gradient descent, stochastic gradient descent, Newtons method, and QuasiNewton methods. We will also look at some specialized algorithms tailored towards solving Linear Programming and Quadratic Programming problems which often arise in machine learning problems. 5.1 Preliminaries Minimizing an arbitrary function is, in general, very dicult, but if the ob jective function to be minimized is convex then things become considerably simpler. As we will see shortly, the key advantage of dealing with convex functions is that a local optima is also the global optima. Therefore, well developed tools exist to find the global minima of a convex function. Conse quently, many machine learning algorithms are now formulated in terms of convex optimization problems. We briey review the concept of convex sets and functions in this section. 129 130 5 Optimization 5.1.1 Convex Sets Definition 5.1 Convex Set) A subset C of R n is said to be convex if (1 ) x + y C whenever x C y C and < < 1 . Intuitively, what this means is that the line joining any two points x and y from the set C lies inside C (see Figure 5.1 ). It is easy to see (Exercise 5.1 ) that intersections of convex sets are also convex. Fig. 5.1. The convex set (left) contains the line joining any two points that belong to the set. A nonconvex set (right) does not satisfy this property. A vector sum i i x i is called a convex combination if i 0 and i i = 1. Convex combinations are helpful in defining a convex hull: Definition 5.2 Convex Hull) The convex hull, conv( X ) , of a finite sub set X = { x 1 ... x n } of R n consists of all convex combinations of x 1 ... x n . 5.1.2 Convex Functions Let f be a real valued function defined on a set X R n . The set { ( x ) : x X R f ( x ) } (5.3) is called the epigraph of f . The function f is defined to be a convex function if its epigraph is a convex set in R n +1 . An equivalent, and more commonly used, definition (Exercise 5.5 ) is as follows (see Figure 5.2 for geometric intuition): Definition 5.3 Convex Function) A function f defined on a set...
View
Full
Document
This note was uploaded on 02/23/2012 for the course STAT 598 taught by Professor Staff during the Spring '08 term at Purdue UniversityWest Lafayette.
 Spring '08
 Staff

Click to edit the document details