# 12 Support Vector Machines and Flexible Discriminants

## 12.1 Introduction

In this chapter we describe generalizations of linear decision boundaries for classification. Optimal separating hyperplanes are introduced in Chapter 4 for the case when two classes are linearly separable. Here we cover extensions to the nonseparable case, where the classes overlap. These techniques are then generalized to what is known as the *support vector machine*, which produces nonlinear boundaries by constructing a linear boundary in a large, transformed version of the feature space. The second set of methods generalizes Fisher's linear discriminant analysis (LDA). The generalizations include flexible discriminant analysis, which facilitates construction of nonlinear boundaries in a manner very similar to the support vector machines; penalized discriminant analysis, for problems such as signal and image classification where the large number of features are highly correlated; and mixture discriminant analysis, for irregularly shaped classes.

## 12.2 The Support Vector Classifier

In Chapter 4 we discussed a technique for constructing an optimal separating hyperplane between two perfectly separated classes. We review this and generalize to the nonseparable case, where the classes may not be separable by a linear boundary.

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition, DOI: 10.1007/b94608_12.

**Figure 12.1.** *Support vector classifiers. The left panel shows the separable case. The decision boundary is the solid line, while broken lines bound the shaded maximal margin of width $2M = 2/\|\beta\|$. The right panel shows the nonseparable (overlap) case. The points labeled $\xi_j^*$ are on the wrong side of their margin by an amount $\xi_j^* = M\xi_j$; points on the correct side have $\xi_j^* = 0$. The margin is maximized subject to a total budget $\sum \xi_i \le$ constant. Hence $\sum \xi_j^*$ is the total distance of points on the wrong side of their margin.*

Our training data consists of $N$ pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, with $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$. Define a hyperplane by

$$\{x : f(x) = x^T\beta + \beta_0 = 0\}, \tag{12.1}$$

where $\beta$ is a unit vector: $\|\beta\| = 1$. A classification rule induced by $f(x)$ is

$$G(x) = \operatorname{sign}[x^T\beta + \beta_0]. \tag{12.2}$$

The geometry of hyperplanes is reviewed in Section 4.5, where we show that $f(x)$ in (12.1) gives the signed distance from a point $x$ to the hyperplane $f(x) = x^T\beta + \beta_0 = 0$. Since the classes are separable, we can find a function $f(x) = x^T\beta + \beta_0$ with $y_i f(x_i) > 0$ for all $i$. Hence we are able to find the hyperplane that creates the biggest margin between the training points for class 1 and $-1$ (see Figure 12.1). The optimization problem

$$\max_{\beta,\, \beta_0,\, \|\beta\|=1} M \quad \text{subject to } y_i(x_i^T\beta + \beta_0) \ge M, \; i = 1, \ldots, N, \tag{12.3}$$

captures this concept. The band in the figure is $M$ units from the hyperplane on either side, and hence $2M$ units wide: the margin.
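The classification rule (12.2) and the margin width $M = 1/\|\beta\|$ can be illustrated numerically. Below is a minimal sketch, assuming scikit-learn is available; it approximates the hard-margin problem with a linear `SVC` and a large cost parameter `C`. Note that scikit-learn uses a scaling in which support points satisfy $|x^T\beta + \beta_0| = 1$ (rather than $\|\beta\| = 1$), so the margin half-width comes out as $1/\|\beta\|$.

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is installed

# Two linearly separable classes in R^2
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [-1] * 20)

# A hard-margin classifier is approximated by a very large C (almost no slack).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

beta = clf.coef_.ravel()       # beta, not unit length in sklearn's scaling
beta0 = clf.intercept_[0]

# Classification rule G(x) = sign(x^T beta + beta0), eq. (12.2)
G = np.sign(X @ beta + beta0)
accuracy = float(np.mean(G == y))
print("training accuracy:", accuracy)

# Margin half-width: support points have |x^T beta + beta0| = 1,
# so M = 1 / ||beta|| and the band in Figure 12.1 has total width 2M.
M = 1.0 / np.linalg.norm(beta)
print("margin half-width M:", M)
```

On separable data such as this, the fitted rule classifies every training point correctly and the margin $M$ is strictly positive.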

## This note was uploaded on 07/14/2010 for the course STAT 132 taught by Professor Haulk during the Spring '10 term at UBC.


