Stat841f09 - Wiki Course Notes




The Karush-Kuhn-Tucker (KKT) conditions give us a closer look into the Lagrangian equation and the associated conditions. Suppose we are looking to minimize $f(x)$ such that $g_i(x) \le 0$, $i = 1, \dots, m$. If $f$ and $g_i$ are differentiable, then the necessary conditions for $x^*$ to be a local minimum are:

1. At the optimal point, $\nabla f(x^*) + \sum_{i} \alpha_i \nabla g_i(x^*) = 0$; i.e. $\left.\frac{\partial L}{\partial x}\right|_{x^*} = 0$.
2. $\alpha_i \ge 0$ (Dual Feasibility)
3. $\alpha_i \, g_i(x^*) = 0$ (Complementary Slackness)
4. $g_i(x^*) \le 0$ (Primal Feasibility)

If any of these conditions are violated, then the problem is deemed not feasible. These are all trivial except for condition 3. Let's examine it further in our support vector machine problem.

## Support Vectors

Support vectors are the training points that determine the optimal separating hyperplane that we seek. They are also the most difficult points to classify, or equivalently the most informative points for the classification.

In our case, the constraint function is:

$$g_i(\mathbf{w}, b) = 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b) \le 0.$$

Substituting into KKT condition 3, we get

$$\alpha_i \left[ 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b) \right] = 0.$$

In order for this condition to be satisfied, either $\alpha_i = 0$ or $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$. All points $\mathbf{x}_i$ will be either exactly 1 or greater than 1 margin-distance away from the hyperplane.

**Case 1: a point away from the margin.** If $y_i(\mathbf{w}^T \mathbf{x}_i + b) > 1$, i.e. the point is not on the margin, then the corresponding $\alpha_i = 0$.

**Case 2: a point on the margin.** If $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$, i.e. the point is on the margin, then the corresponding $\alpha_i$ may be nonzero, $\alpha_i > 0$.

Points on the margin, with corresponding $\alpha_i > 0$, are called support vectors.

## Using support vectors

Support vectors are important because they allow the support vector machine algorithm to be insensitive to outliers. If $\alpha_i = 0$, then the corresponding term in the cost function is also 0, and point $\mathbf{x}_i$ won't contribute to the solution of the SVM problem; only the points on the margin, the support vectors, contribute. Hence the model given by the SVM is entirely defined by the set of support vectors, a subset of the entire training set. This is interesting because in the neural network methods that preceded this (and the observation generalizes to classical statistical learning), the configuration of the network needed to be specified in advance.
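To make condition 3 concrete, here is a small numerical sketch. The dataset, the max-margin hyperplane $\mathbf{w} = (1/2, 1/2)$, $b = 0$, and the multipliers $\alpha$ are hand-worked illustrative values, not taken from the notes:

```python
import numpy as np

# Toy linearly separable data: two margin points and one interior point.
X = np.array([[ 1.0,  1.0],   # class +1, sits on the margin
              [-1.0, -1.0],   # class -1, sits on the margin
              [ 3.0,  3.0]])  # class +1, well beyond the margin
y = np.array([1.0, -1.0, 1.0])

# Hand-derived max-margin solution for this data (assumed, worked by hand).
w = np.array([0.5, 0.5])
b = 0.0
alpha = np.array([0.25, 0.25, 0.0])  # Lagrange multipliers

margins = y * (X @ w + b)            # y_i (w^T x_i + b)

# KKT condition 3 (complementary slackness): alpha_i * (1 - margin_i) = 0.
slack = alpha * (1.0 - margins)
print(margins)                # [1. 1. 3.] -> first two points are on the margin
print(slack)                  # [ 0.  0.  0.] -> condition 3 holds for every point

# Support vectors are exactly the points with alpha_i > 0.
print(np.where(alpha > 0)[0])        # [0 1]

# Consistency with stationarity (condition 1): w = sum_i alpha_i y_i x_i.
print((alpha * y) @ X)               # [0.5 0.5], which equals w
```

The interior point $(3, 3)$ has margin $3 > 1$, so complementary slackness forces its $\alpha_i$ to 0: it could be moved or deleted without changing the solution.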
In this case we have a data-driven or "nonparametric" model: the input is the training set, and the algorithm determines the support vectors, instead of fitting a fixed set of parameters using cross-validation or other error-minimization procedures.
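The sparsity claim above can be sketched in code: once the support vectors and their multipliers are known, the decision function $f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b\right)$ needs nothing else from the training set. The support vectors and multipliers below are illustrative assumptions, not values from the notes:

```python
import numpy as np

# Retain only the support vectors (alpha_i > 0) of an assumed trained SVM;
# every other training point has alpha_i = 0 and can be discarded.
sv_x = np.array([[1.0, 1.0], [-1.0, -1.0]])  # support vectors
sv_y = np.array([1.0, -1.0])                 # their labels
sv_alpha = np.array([0.25, 0.25])            # their multipliers
b = 0.0

def predict(x):
    """Classify x using only the support vectors:
    f(x) = sign(sum_{i in SV} alpha_i y_i <x_i, x> + b)."""
    score = np.sum(sv_alpha * sv_y * (sv_x @ x)) + b
    return 1 if score >= 0 else -1

print(predict(np.array([2.0, 2.0])))     # 1
print(predict(np.array([-0.5, -1.5])))   # -1
```

Replacing the inner product with a kernel $k(\mathbf{x}_i, \mathbf{x})$ gives the same support-vector-only form for nonlinear SVMs.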

## This document was uploaded on 03/07/2014.
