
CSE 6740 Lecture 8
How Do I Predict a Discrete Variable? II (Classification)
Alexander Gray [email protected]
Georgia Institute of Technology
CSE 6740 Lecture 8 – p. 1/24

Today

1. More classification methods (How can I predict a discrete variable?)
Support Vector Machine

Now let's choose a different criterion. Let's find the hyperplane which maximizes the distance of the closest point from either class. We call this distance the margin. Points on the margin are called support vectors. Let's begin by assuming the classes are linearly separable.
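The margin of a candidate hyperplane can be computed directly as the smallest signed distance of any point to the hyperplane. A minimal sketch, using a small hypothetical separable dataset (the points, labels, and candidate hyperplane below are illustrative, not from the notes):

```python
import numpy as np

# Hypothetical 2-D points with labels y in {-1, +1}, linearly separable.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane {x : beta0 + beta^T x = 0}.
beta0, beta = 0.0, np.array([1.0, 1.0])

# Signed distance of each point to the hyperplane; the margin is the
# smallest such distance over all points.
distances = y * (beta0 + X @ beta) / np.linalg.norm(beta)
margin = distances.min()

# The points that attain the margin are the support vectors.
support_vectors = X[np.isclose(distances, margin)]
```

For this toy configuration the margin works out to 2√2, attained by three of the four points.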

Support Vector Machine

The hyperplane which maximizes the margin is given by finding

    max_{β0, β} m   subject to   (1/||β||) y_i (β0 + β^T x_i) ≥ m,  ∀i.   (1)

Equivalently, the constraints can be written as

    y_i (β0 + β^T x_i) ≥ m ||β||.

Since for any β0 and β satisfying these inequalities, any positively scaled multiple satisfies them too, we can arbitrarily set ||β|| = 1/m.
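The rescaling step can be checked numerically: scaling (β0, β) by any c > 0 leaves the hyperplane and the margin unchanged, and choosing c so that ||β|| = 1/m makes the closest points satisfy y_i(β0 + β^T x_i) = 1 exactly. A sketch on a hypothetical toy dataset (the points and initial hyperplane are illustrative):

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

beta0, beta = 0.0, np.array([1.0, 1.0])
m = (y * (beta0 + X @ beta) / np.linalg.norm(beta)).min()  # the margin

# Rescale so that ||beta|| = 1/m. The hyperplane (and margin) are
# unchanged, and the closest points now satisfy y_i(beta0 + beta^T x_i) = 1.
c = 1.0 / (m * np.linalg.norm(beta))
beta0_s, beta_s = c * beta0, c * beta
closest = (y * (beta0_s + X @ beta_s)).min()  # -> 1.0
```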
Support Vector Machine

Thus the optimization problem is equivalent to minimizing

    (1/2) ||β||^2   subject to   y_i (β0 + β^T x_i) ≥ 1,  ∀i.   (2)

It turns out this optimization problem is a quadratic programming problem (a quadratic objective function with linear constraints), a standard type of optimization problem for which methods exist for finding the global optimum. The theory of convex optimization tells us there is an equivalent way to write this optimization problem (its dual formulation).
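The primal problem (2) can be handed to any constrained solver. A minimal sketch using SciPy's generic SLSQP solver (a general-purpose choice, not a dedicated QP solver) on the same style of hypothetical separable toy data:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as w = [beta0, beta_1, beta_2].
def objective(w):
    return 0.5 * np.dot(w[1:], w[1:])          # (1/2)||beta||^2

def margin_con(w):                              # y_i(beta0 + beta^T x_i) - 1 >= 0
    return y * (w[0] + X @ w[1:]) - 1.0

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_con}])
beta0, beta = res.x[0], res.x[1:]
margin = 1.0 / np.linalg.norm(beta)             # since ||beta|| = 1/m
```

Because the problem is a convex QP, the solver's local optimum is the global maximum-margin hyperplane.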

Support Vector Machine

Let g(x) denote the optimal (maximum margin) hyperplane. Let ⟨x_i, x_{i'}⟩ denote the inner product of x_i and x_{i'}. Then

    β_j = Σ_{i=1}^{N} α_i y_i x_{ij}   (3)

where α is the vector of weights that maximizes

    Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{i'=1}^{N} α_i α_{i'} y_i y_{i'} ⟨x_i, x_{i'}⟩   (4)

subject to

    α_i ≥ 0   and   Σ_i α_i y_i = 0.   (5)
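The dual (4)–(5) can also be solved numerically and β recovered via (3). A sketch, again with SciPy's generic SLSQP solver and the same kind of hypothetical toy data (minimizing the negative of (4)):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Gram-like matrix: G[i, i'] = y_i y_i' <x_i, x_i'>
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):                       # negative of objective (4), to minimize
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(N), method="SLSQP",
               bounds=[(0, None)] * N,                       # alpha_i >= 0
               constraints=[{"type": "eq",
                             "fun": lambda a: a @ y}])       # sum alpha_i y_i = 0
alpha = res.x
beta = (alpha * y) @ X                 # recover beta via equation (3)
```

Only the support vectors end up with α_i > 0; the rest of the data drops out of the solution.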
Support Vector Machine

However, for realistic problems we must relax the assumption that the classes are linearly separable. In the primal formulation, instead of minimizing (1/2) ||β||^2 …
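The slide is cut off here by the preview, but the standard relaxation (not spelled out in the excerpt) adds nonnegative slack variables ξ_i and a cost parameter C, minimizing (1/2)||β||^2 + C Σ ξ_i subject to y_i(β0 + β^T x_i) ≥ 1 − ξ_i. A sketch of that soft-margin primal on hypothetical non-separable toy data, again using SciPy's generic SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical overlapping classes: one point of each class lies on the
# wrong side, so the data are NOT linearly separable.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.5],
              [-2.0, -2.0], [-3.0, -1.0], [1.0, 1.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
N, C = len(y), 1.0

# Variables packed as w = [beta0, beta (2 entries), xi (N entries)].
def objective(w):
    beta, xi = w[1:3], w[3:]
    return 0.5 * beta @ beta + C * xi.sum()

def margin_con(w):       # y_i(beta0 + beta^T x_i) - 1 + xi_i >= 0
    return y * (w[0] + X @ w[1:3]) - 1.0 + w[3:]

res = minimize(objective, x0=np.zeros(3 + N), method="SLSQP",
               bounds=[(None, None)] * 3 + [(0, None)] * N,  # xi_i >= 0
               constraints=[{"type": "ineq", "fun": margin_con}])
slacks = res.x[3:]
```

Points with ξ_i > 0 violate the margin; the misclassified toy points are forced to take positive slack.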


This note was uploaded on 04/03/2010 for the course CSE 6740 taught by Professor Staff during the Fall '08 term at Georgia Tech.
