# Introduction to Support Vector Machines


Colin Campbell, Bristol University

## Outline of talk

**Part 1. An Introduction to SVMs**

- 1.1. SVMs for binary classification.
- 1.2. Soft margins and multi-class classification.
- 1.3. SVMs for regression.

**Part 2. General kernel methods**

- 2.1. Kernel methods based on linear programming and other approaches.
- 2.2. Training SVMs and other kernel machines.
- 2.3. Model selection.
- 2.4. Different types of kernels.
- 2.5. SVMs in practice.

## Advantages of SVMs

- A principled approach to classification, regression, or novelty detection tasks.
- SVMs exhibit good generalisation.
- The hypothesis has an explicit dependence on the data (via the support vectors), so the model can be readily interpreted.
- Learning involves optimisation of a convex function (no false minima, unlike a neural network).
- Few parameters are required for tuning the learning machine (unlike a neural network, where the architecture and various parameters must be found).
- Confidence measures, etc., can be implemented.

## 1.1 SVMs for Binary Classification

**Preliminaries:** Consider a binary classification problem: the input vectors are $x_i$ and $y_i = \pm 1$ are the targets or *labels*. The index $i$ labels the pattern pairs ($i = 1, \ldots, m$). The $x_i$ define a space of labelled points called *input space*.
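As a concrete illustration of this notation (hypothetical toy data, not from the lecture), a binary classification problem in a 2-d input space might be set up as:

```python
# Toy binary classification problem: m pattern pairs (x_i, y_i),
# inputs x_i in a 2-d input space, targets y_i in {+1, -1}.
# (Illustrative data, not from the lecture.)
X = [[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]]
y = [+1, +1, -1, -1]

m = len(X)  # number of pattern pairs
assert m == len(y) and all(label in (+1, -1) for label in y)
```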
From the perspective of statistical learning theory, the motivation for considering binary classifier SVMs comes from theoretical bounds on the generalization error. These generalization bounds have two important features:

1. The upper bound on the generalization error does not depend on the dimensionality of the space.
2. The bound is minimized by maximizing the *margin*, $\gamma$, i.e. the minimal distance between the hyperplane separating the two classes and the closest datapoints of each class.
[Figure: a separating hyperplane with the margin between the two classes]

In an arbitrary-dimensional space a separating hyperplane can be written:

$$w \cdot x + b = 0$$

where $b$ is the *bias* and $w$ the *weights*. Thus we will consider a decision function of the form:

$$D(x) = \mathrm{sign}(w \cdot x + b)$$
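A minimal sketch of this decision function in Python (the weight vector and bias below are illustrative values, not from the lecture):

```python
def decision(w, b, x):
    """Linear decision function: D(x) = sign(w . x + b)."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1

# Illustrative hyperplane w . x + b = 0 in a 2-d input space.
w = [1.0, 1.0]
b = -1.0

print(decision(w, b, [2.0, 2.0]))  # on the positive side -> 1
print(decision(w, b, [0.0, 0.0]))  # on the negative side -> -1
```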
We note that the argument in $D(x)$ is invariant under a rescaling $w \rightarrow \lambda w$, $b \rightarrow \lambda b$ (for $\lambda > 0$). We will implicitly fix a scale with:
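This rescaling invariance can be checked numerically (a small sketch with illustrative values; any $\lambda > 0$ leaves the sign, and hence the predicted class, unchanged):

```python
def decision(w, b, x):
    """D(x) = sign(w . x + b); invariant under (w, b) -> (lam*w, lam*b) for lam > 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1

w, b = [1.0, 1.0], -1.0   # illustrative hyperplane
x = [2.0, 0.5]            # illustrative test point

for lam in (0.1, 1.0, 10.0):
    w_scaled = [lam * wi for wi in w]
    b_scaled = lam * b
    # The activation changes magnitude, but its sign does not.
    assert decision(w_scaled, b_scaled, x) == decision(w, b, x)
```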

$$w \cdot x + b = 1$$
$$w \cdot x + b = -1$$

for the support vectors (*canonical hyperplanes*).
Thus:

$$w \cdot (x_1 - x_2) = 2$$

for two support vectors on each side of the separating hyperplane.

Projecting $(x_1 - x_2)$ onto the normal vector to the hyperplane, i.e. $w / \|w\|_2$, we deduce that the margin is given by $\gamma = 1 / \|w\|_2$.
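Both relations can be verified on a toy canonical hyperplane (hypothetical values, chosen so the support vectors lie exactly on $w \cdot x + b = \pm 1$):

```python
import math

# Canonical hyperplane: w . x + b = 0, support vectors on w . x + b = +/-1.
w = [1.0, 1.0]
b = 0.0
x1 = [0.5, 0.5]    # support vector on w . x + b = +1
x2 = [-0.5, -0.5]  # support vector on w . x + b = -1

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

# The support vectors satisfy the canonical constraints.
assert abs(dot(w, x1) + b - 1.0) < 1e-12
assert abs(dot(w, x2) + b + 1.0) < 1e-12

# w . (x1 - x2) = 2 for support vectors on opposite sides.
diff = [a - c for a, c in zip(x1, x2)]
assert abs(dot(w, diff) - 2.0) < 1e-12

# Project (x1 - x2) onto the unit normal w / ||w||_2; half that
# distance is the margin gamma, which equals 1 / ||w||_2.
norm_w = math.sqrt(dot(w, w))
gamma = dot(w, diff) / norm_w / 2.0
assert abs(gamma - 1.0 / norm_w) < 1e-12
print(gamma)  # 1/sqrt(2) ~ 0.7071
```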

*This note was uploaded on 06/16/2011 for the course CS 5141 taught by Professor Chenenhong during the Spring '10 term at USTC.*
