# CSE 6740 Lecture 19: How Do I Optimize the Parameters? (Unconstrained Optimization)

Alexander Gray ([email protected])
Georgia Institute of Technology

## Today

1. Unconstrained Optimization: General
2. Unconstrained Optimization: Sum-of-Squares
## Unconstrained Optimization: General

Optimizing an objective function without constraints, for general functions.

## Optimization

Learning or training the parameters of a model generally boils down to optimization. There is a general theory for this which covers many of the cases we need in machine learning. The general form of optimization problem we'll consider is: find

$$x^* = \arg\min_{x \in \mathbb{R}^D} f(x) \quad (1)$$

subject to

$$c_i(x) = 0, \quad i = 1, \ldots, M \quad (2)$$

$$c_i(x) \geq 0, \quad i = M + 1, \ldots, N. \quad (3)$$

Maximization and minimization are interchangeable by negating $f(x)$, so we'll assume we're always minimizing.
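The max/min equivalence is easy to check numerically. A toy sketch (the function and the search grid are chosen purely for illustration):

```python
# Maximizing f(x) is equivalent to minimizing -f(x).
# Illustrative 1-D example: f(x) = -(x - 3)^2 has its maximum at x = 3.
def f(x):
    return -(x - 3.0) ** 2

def neg_f(x):
    return -f(x)

# Crude grid search over a candidate range (illustration only).
xs = [i / 100.0 for i in range(0, 601)]   # 0.00 .. 6.00
x_max = max(xs, key=f)          # argmax of f
x_min = min(xs, key=neg_f)      # argmin of -f
assert x_max == x_min           # the two problems share the same solution
print(x_max)  # 3.0
```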
## Unconstrained Optimization Cases

- Root-finding
- Univariate functions
- Multivariate smooth functions
  - Second-derivative methods
  - First-derivative methods
  - Non-derivative methods
- Multivariate non-smooth functions
- Sums of squares
- Latent-variable functions

## Smooth: Canonical Algorithm

Assume for now that $f(x)$ is twice differentiable. Let $x_k$ be the current estimate of $x^*$. A descent method imposes the *descent condition*: on the $k$-th iteration, $f(x_k) < f(x_{k-1})$. We'll use the shorthand $f_k \equiv f(x_k)$.

All of the methods we'll consider have this general form of iteration:

1. **Test for convergence.** If the conditions are satisfied, stop and return $x_k$.
2. **Compute a search direction.** A vector $p_k$.
3. **Compute a step length.** A scalar $\alpha_k$ such that $f(x_k + \alpha_k p_k) < f(x_k)$.
4. **Update the estimate of the minimum.** Set $x_{k+1} = x_k + \alpha_k p_k$.
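The four-step iteration can be sketched in code. This minimal version (not from the slides) instantiates the template with steepest descent, p_k = -g_k, as the search direction and simple step halving for the step length:

```python
import numpy as np

def canonical_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    """Generic descent iteration: a sketch using steepest descent
    (p_k = -g_k) and step halving to enforce the descent condition."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        # 1. Test for convergence: stop when the gradient is near zero.
        if np.linalg.norm(g) < tol:
            break
        # 2. Compute a search direction (here: steepest descent).
        p = -g
        # 3. Compute a step length with f(x + alpha*p) < f(x).
        alpha = 1.0
        while f(x + alpha * p) >= f(x):
            alpha *= 0.5
            if alpha < 1e-12:          # no decrease found; give up
                return x
        # 4. Update the estimate of the minimum.
        x = x + alpha * p
    return x

# Example: minimize f(x) = ||x - 1||^2, whose minimizer is x* = (1, 1).
x_star = canonical_descent(lambda x: np.sum((x - 1.0) ** 2),
                           lambda x: 2.0 * (x - 1.0),
                           x0=[0.0, 0.0])
print(x_star)  # [1. 1.]
```

Any method fitting the template differs only in how steps 2 and 3 are computed (e.g. Newton directions, or more careful line searches).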
## Smooth: Canonical Algorithm

Let's denote the gradient as

$$g(x) \equiv \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_D} \right)^T, \quad (4)$$

and the Hessian as

$$H(x) \equiv \nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_D} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_D \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_D^2} \end{pmatrix}. \quad (5)$$

The Hessian matrix of $f(x)$ can also be described as the Jacobian matrix of $g(x)$. Higher-order derivatives are rarely needed in standard optimization methods.
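The "Hessian is the Jacobian of the gradient" statement can be verified by finite differences. A sketch on a quadratic, with the matrix and test point chosen arbitrarily for illustration:

```python
import numpy as np

# For f(x) = 0.5 * x^T A x + b^T x (A symmetric), the gradient is
# g(x) = A x + b and the Hessian is H(x) = A.  Check numerically that
# differentiating g column-by-column (its Jacobian) recovers A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

def g(x):
    return A @ x + b

x = np.array([0.3, -0.7])
eps = 1e-6
H_fd = np.empty((2, 2))
for j in range(2):
    e = np.zeros(2)
    e[j] = eps
    # Central difference: column j of the Jacobian of g at x.
    H_fd[:, j] = (g(x + e) - g(x - e)) / (2 * eps)

print(np.allclose(H_fd, A, atol=1e-4))  # True
```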

## Smooth: Canonical Algorithm

To satisfy the descent condition, $p_k$ and $\alpha_k$ must have certain properties. One way to ensure it is to require that $p_k$ be a *descent direction* at $x_k$, i.e. a vector satisfying

$$g_k^T p_k < 0, \quad (6)$$

roughly because, by Taylor's Theorem,

$$f(x_k + p) \approx f_k + g_k^T p. \quad (7)$$

The step length $\alpha_k$ must have the property that

$$f(x_k + \alpha_k p_k) < f(x_k). \quad (8)$$
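Conditions (6)-(8) can be demonstrated concretely. In this sketch (the function is chosen arbitrarily), steepest descent gives g_k^T p_k < 0, and halving alpha until f decreases produces a step length satisfying (8):

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x_k = np.array([1.0, 1.0])
g_k = grad(x_k)
p_k = -g_k                      # steepest descent is always a descent direction
assert g_k @ p_k < 0            # condition (6) holds

# Backtracking: shrink alpha until the descent condition (8) holds.
# (7) guarantees this succeeds for small enough alpha.
alpha = 1.0
while f(x_k + alpha * p_k) >= f(x_k):
    alpha *= 0.5

print(alpha, f(x_k + alpha * p_k) < f(x_k))  # 0.0625 True
```

Note that a full step (alpha = 1) overshoots here even though the direction is downhill, which is why a step-length rule is needed at all.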
## Convergence Rate

It can be shown, under some conditions, that the canonical algorithm converges, i.e. the sequence $\{x_k\} \to x^*$, but the efficiency of the method depends on the number of iterations required.
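As a rough illustration of why iteration counts matter (a toy experiment, not from the slides), steepest descent with a fixed step can be run on two quadratics of different curvature ratio; the harder problem needs far more iterations to reach the same tolerance:

```python
import numpy as np

# Count steepest-descent iterations on f(x) = 0.5*(x1^2 + c*x2^2).
# Gradient: (x1, c*x2).  Larger c makes the problem harder and the
# iteration count grows, which is what convergence-rate analysis measures.
def iterations(c, tol=1e-6):
    x = np.array([1.0, 1.0])
    alpha = 1.0 / c                        # fixed step, stable for this f
    grad = lambda x: np.array([x[0], c * x[1]])
    k = 0
    while np.linalg.norm(grad(x)) >= tol:
        x = x - alpha * grad(x)            # steepest-descent update
        k += 1
    return k

print(iterations(2.0), iterations(100.0))  # the second is far larger
```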
