Stat841f09 - Wiki Course Notes

# Support vector machines: the non-separable case


In the separable case, SVM finds an optimally separating hyperplane between two classes of data, and the margin contains no data points. In the real world, however, data of different classes are usually mixed together at the boundary, and it is hard to find a perfect boundary that totally separates them. To address this problem, we slacken the classification rule to allow data to cross the margin. Mathematically, the constraint becomes

\[ y_i(\beta^T x_i + \beta_0) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \]

so each data point can now have some error \( \xi_i \). However, we only want points to cross the boundary when they have to, with minimum sacrifice; a penalty term is therefore added to the objective function to constrain how much the points cross the margin. The optimization problem becomes:

\[ \min_{\beta,\,\beta_0} \; \frac{1}{2}\|\beta\|^2 + \gamma \sum_i \xi_i \quad \text{subject to} \quad y_i(\beta^T x_i + \beta_0) \ge 1 - \xi_i, \;\; \xi_i \ge 0. \]

Note that \( \xi_i \) is not necessarily smaller than one, which means a point can not only enter the margin but can also cross the separating hyperplane. Note also that \( \gamma = \infty \) is feasible in the separable case, as all \( \xi_i = 0 \); in general, for higher \( \gamma \), the sets are more separable.

*Figure: the non-separable case*

## Aside: more information about SVMs and kernels

SVMs are currently among the best performers for many benchmark datasets and have been extended to a number of tasks such as regression. The kernel trick seems to be the most attractive aspect of SVMs. This idea has now been applied to many other learning models in which the inner product is the central quantity, and these are called "kernel" methods. Tuning SVMs remains the main research focus: how do we find an optimal kernel? The kernel should match the smooth structure of the data.

## Support Vector Machine algorithm for non-separable cases - November 23, 2009

With the problem formulation above, we can form the Lagrangian, apply the KKT conditions, and arrive at a new function to optimize. As we will see, the problem that we optimize in the SVM algorithm for non-separable data sets is the same as the optimization for the separable case, with slightly different conditions.
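As a concrete illustration, the penalized objective above can be rewritten in hinge-loss form, \( \frac{1}{2}\|\beta\|^2 + \gamma \sum_i \max(0,\, 1 - y_i(\beta^T x_i + \beta_0)) \), and minimized directly by subgradient descent. The sketch below is not from the course notes: the toy data, step size, and value of \( \gamma \) are illustrative assumptions.

```python
import numpy as np

# Soft-margin linear SVM trained by subgradient descent on the primal
# hinge-loss objective (a sketch; data and hyperparameters are made up).
rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: a genuinely non-separable problem.
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

gamma = 1.0        # penalty on margin violations (the gamma above)
eta = 0.01         # step size (illustrative choice)
beta = np.zeros(2)
beta0 = 0.0

for _ in range(2000):
    margins = y * (X @ beta + beta0)
    viol = margins < 1                    # points inside the margin or misclassified
    # Subgradient of (1/2)||beta||^2 + gamma * sum_i max(0, 1 - margin_i)
    grad_beta = beta - gamma * (y[viol, None] * X[viol]).sum(axis=0)
    grad_beta0 = -gamma * y[viol].sum()
    beta -= eta * grad_beta
    beta0 -= eta * grad_beta0

accuracy = np.mean(np.sign(X @ beta + beta0) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the blobs overlap, the training accuracy stays below 100% no matter how large \( \gamma \) is: some points must pay the slack penalty, which is exactly the trade-off the formulation encodes.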
## Forming the Lagrangian

In this case we have two constraints in the Lagrangian (http://en.wikipedia.org/wiki/Lagrangian) and therefore ...
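The sentence above is cut off in this copy. A standard sketch of the step it describes (a reconstruction, not taken verbatim from the notes) introduces one multiplier per constraint:

\[
L(\beta,\beta_0,\xi;\alpha,\lambda) = \frac{1}{2}\|\beta\|^2 + \gamma \sum_i \xi_i - \sum_i \alpha_i \bigl[ y_i(\beta^T x_i + \beta_0) - 1 + \xi_i \bigr] - \sum_i \lambda_i \xi_i,
\]

with \( \alpha_i \ge 0 \) for the classification constraints and \( \lambda_i \ge 0 \) for \( \xi_i \ge 0 \). Setting the derivatives with respect to \( \beta \), \( \beta_0 \), and \( \xi_i \) to zero gives

\[
\beta = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \gamma - \alpha_i - \lambda_i = 0,
\]

and substituting these back into \( L \) yields the dual problem

\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{subject to} \quad 0 \le \alpha_i \le \gamma, \;\; \sum_i \alpha_i y_i = 0,
\]

which is identical to the separable case except that each \( \alpha_i \) is now capped at \( \gamma \), matching the "slightly different conditions" noted above.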
