svmtalk: Support Vector Machines


1 Support Vector Machines

Based on ESL and papers by Vladimir Vapnik, Trevor Hastie, Saharon Rosset, Rob Tibshirani, Ji Zhu

2 Outline

• Optimal separating hyperplanes and relaxations
• SVMs and kernel inner-products
• SVM as a function estimation problem
• LARS-style algorithm for SVMs

3 Maximum Margin Classifier

Vapnik (1995). x_i ∈ R^p, y_i ∈ {−1, 1}.

    max_{β, β0, ||β|| = 1} C   subject to   y_i (x_i^T β + β0) ≥ C,   i = 1, ..., N.

[Figure: separable classes; the hyperplane x^T β + β0 = 0 with a margin of width C on each side.]

4 Overlapping Classes

When the classes overlap, allow slack variables ξ_i ≥ 0 ("soft" margin):

    max_{β, β0, ||β|| = 1} C   subject to   y_i (x_i^T β + β0) ≥ C (1 − ξ_i),   ξ_i ≥ 0,   Σ_i ξ_i ≤ B.

[Figure: overlapping classes; points on the wrong side of their margin have slack ξ*_i = C ξ_i.]

5 Equivalent form of problem

Define C = 1/||β|| and drop the norm constraint on β, as in ESL sec 4.2:

    min ||β||   subject to   y_i (x_i^T β + β0) ≥ 1 − ξ_i,   ξ_i ≥ 0,   Σ_i ξ_i ≤ B.

This is the original form given by Vapnik; we find it confusing due to the fixed scale "1" in the constraint.

6 Example

[Figure: linear support vector classifier on the mixture data. Training Error: 0.270; Test Error: 0.288; Bayes Error: 0.210.]

The fitted function is f̂(x) = x^T β̂ + β̂0, and the resulting classifier is Ĝ(x) = sign[f̂(x)].

7 Quadratic Programming Solution

After a lot of *stuff* we arrive at a Lagrange dual

    L_D = Σ_{i=1}^N α_i − (1/2) Σ_{i=1}^N Σ_{i'=1}^N α_i α_{i'} y_i y_{i'} x_i^T x_{i'}

which we maximize subject to constraints (involving B as well). The solution is expressed in terms of the fitted Lagrange multipliers α̂_i:

    β̂ = Σ_{i=1}^N α̂_i y_i x_i

Some fraction of the α̂_i are exactly zero (from the KKT conditions); the x_i for which α̂_i > 0 are called the support points S.

    f̂(x) = x^T β̂ + β̂0 = Σ_{i∈S} α̂_i y_i x^T x_i + β̂0

8 Microarray example

16,063 genes; 144 training and 54 test samples; 14 classes.

9 Methods

    Method                                CV errors (SE)   Test errors   Number of
                                          out of 144       out of 54     genes used
    1. Nearest shrunken centroids         35 (5.0)         17            6,520
    2. L2-penalized discriminant analysis 25 (4.1)         12            16,063
    3. Support vector classifier          26 (4.2)         14            16,063
    4. Lasso regression (one vs all)      30.7 (1.8)       12.5          1,429
    5. k-nearest neighbors                41 (4.6)         26            16,063
    6. L2-penalized multinomial           26 (4.2)         15            16,063
    7. L1-penalized multinomial           17 (2.8)         13            269
    8. Elastic-net penalized multinomial  22 (3.7)         11.8          384

10 Flexible Classifiers

SVM with a degree-4 polynomial kernel in feature space:
[Figure: SVM with a degree-4 polynomial kernel on the mixture data. Training Error: 0.180; Test Error: 0.245; Bayes Error: 0.210.]

Enlarge the feature space via basis expansions, e.g. polynomials of total degree 4:

    h(x) = (h_1(x), h_2(x), ..., h_M(x))
    f̂(x) = h(x)^T β̂ + β̂0

11 The kernel trick

• Consider the ridge regression prediction

    ŷ = X (X^T X + λ I_p)^{-1} X^T y

If p is large, computations with X^T X + λI (a p × p matrix) may be daunting. Instead write this as

    ŷ = (X X^T + λ I_N)^{-1} X X^T y

• The matrix is now only N × N, and if N is small, this is much simpler computationally.
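The two expressions for the ridge fit are algebraically identical; a minimal numerical check (random data, with arbitrary illustrative dimensions and λ), not part of the original slides:

```python
import numpy as np

# Verify the kernel-trick identity for ridge regression:
#   X (X^T X + lam I_p)^{-1} X^T y  ==  (X X^T + lam I_N)^{-1} X X^T y
rng = np.random.default_rng(0)
N, p, lam = 5, 8, 0.7                 # arbitrary small sizes for illustration
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# "primal" route: solve a p x p system
primal = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
# "dual" route: solve an N x N system, touching X only through X X^T
dual = np.linalg.solve(X @ X.T + lam * np.eye(N), X @ X.T @ y)

assert np.allclose(primal, dual)
```

Note that the dual route touches the data only through the Gram matrix X X^T, which is what makes the substitution of a kernel possible.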
• Note that to compute X X^T we only need the inner products of the observations.

This argument also applies if X represents not the original features but transformations of them. Hence we only need to define inner products in the transformed space; we don't need to actually transform the features!

12 SVM and Kernels

    L_D = Σ_{i=1}^N α_i − (1/2) Σ_{i=1}^N Σ_{i'=1}^N α_i α_{i'} y_i y_{i'} ⟨h(x_i), h(x_{i'})⟩

    f(x) = h(x)^T β + β0 = Σ_{i=1}^N α_i y_i ⟨h(x), h(x_i)⟩ + β0.

L_D and the solution f(x) involve h(x) only through the inner products

    K(x, x') = ⟨h(x), h(x')⟩

Given a suitable positive kernel K(x, x'), we don't need h(x) at all!

    f̂(x) = Σ_{i∈S} α̂_i y_i K(x, x_i) + β̂0

13 Popular Kernels

K(x, x') is a symmetric, positive (semi-)definite function.

    dth-degree polynomial:  K(x, x') = (1 + ⟨x, x'⟩)^d
    radial basis:           K(x, x') = exp(−||x − x'||^2 / c)

Example: 2nd-degree polynomial in R^2.

    K(x, x') = (1 + ⟨x, x'⟩)^2
             = (1 + x_1 x'_1 + x_2 x'_2)^2
             = 1 + 2 x_1 x'_1 + 2 x_2 x'_2 + (x_1 x'_1)^2 + (x_2 x'_2)^2 + 2 x_1 x'_1 x_2 x'_2

Then M = 6, and if we choose h_1(x) = 1, h_2(x) = √2 x_1, h_3(x) = √2 x_2, h_4(x) = x_1^2, h_5(x) = x_2^2, and h_6(x) = √2 x_1 x_2, then K(x, x') = ⟨h(x), h(x')⟩.

14 SVM - Radial Kernel in Feature Space

[Figure: SVM with a radial kernel on the mixture data; dim h(x) is infinite. Training Error: 0.160; Test Error: 0.218; Bayes Error: 0.210.]

• The fraction of support points depends on the overlap; here 45%.
• The smaller B, the smaller the overlap, and the more wiggly the function.
• B controls generalization error.

15 Curse of Dimensionality

Support Vector Machines can suffer in high dimensions.

    Method              Test Error (SE),     Test Error (SE),
                        No Noise Features    Six Noise Features
    1. SV Classifier    0.450 (0.003)        0.472 (0.003)
    2. SVM/poly 2       0.078 (0.003)        0.152 (0.004)
    3. SVM/poly 5       0.180 (0.004)        0.370 (0.004)
    4. SVM/poly 10      0.230 (0.003)        0.434 (0.002)
    5. BRUTO            0.084 (0.003)        0.090 (0.003)
    6. MARS             0.156 (0.004)        0.173 (0.005)
       Bayes            0.029                0.029

The addition of 6 noise features to the 4-dimensional feature space causes the performance of the SVM to degrade. The true decision boundary is the surface of a sphere, hence a quadratic monomial (additive) function is sufficient.

16 SVM via Loss + Penalty

With f(x) = h(x)^T β + β0 and y_i ∈ {−1, 1}, consider

    min_{β0, β} Σ_{i=1}^N [1 − y_i f(x_i)]_+ + λ ||β||^2

[Figure: the binomial log-likelihood and the support vector loss, plotted against the margin y f(x).]

The solution is identical to the SVM solution, with λ = λ(B).
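Both losses being compared here are one-liners in the margin y f(x); a sketch in Python (NumPy), not from the slides, with arbitrary sample margins:

```python
import numpy as np

def hinge(margin):
    # support vector ("hinge") loss: [1 - yf]_+
    return np.maximum(0.0, 1.0 - margin)

def binomial_deviance(margin):
    # (negative) binomial log-likelihood: log(1 + exp(-yf))
    return np.log1p(np.exp(-margin))

m = np.array([-2.0, 0.0, 1.0, 3.0])
print(hinge(m))  # [3. 1. 0. 0.]  -- exactly zero beyond margin 1
print(binomial_deviance(m))  # smooth and strictly positive everywhere
```

The hinge loss is exactly zero for margins beyond 1, which is what produces sparse α̂ and the support-point set S; the deviance never reaches zero, so logistic regression has no support points.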
In general,

    min_{β0, β} Σ_{i=1}^N L[y_i, f(x_i)] + λ ||β||^2

17 Loss Functions

For Y ∈ {−1, 1}:

Log-likelihood: L[Y, f(X)] = log(1 + e^{−Y f(X)})
• The (negative) binomial log-likelihood or deviance.
• f estimates the logit:

    f(X) = log [ Pr(Y = 1|X) / Pr(Y = −1|X) ]

SVM: L[Y, f(X)] = (1 − Y f(X))_+
• Called the "hinge loss".
• Estimates the classifier (threshold):

    C(x) = sign[ Pr(Y = 1|X) − 1/2 ]

18 SVM and Function Estimation

An SVM with a general kernel K minimizes

    Σ_{i=1}^N (1 − y_i f(x_i))_+ + λ ||f||^2_{H_K}

with f = b + h, h ∈ H_K, b ∈ R. H_K is the reproducing kernel Hilbert space (RKHS) of functions generated by the kernel K. The norm ||f||_{H_K} is generally interpreted as a roughness penalty. More generally, we can optimize

    Σ_{i=1}^N L(y_i, f(x_i)) + λ ||f||^2_{H_K}

19 Quadratic Programming (path algorithm)

    L_P = Σ_{i=1}^N ξ_i + (λ/2) β^T β + Σ_{i=1}^N α_i (1 − y_i f(x_i) − ξ_i) − Σ_{i=1}^N γ_i ξ_i

Setting derivatives to zero:

    ∂/∂β:   β = (1/λ) Σ_{i=1}^N α_i y_i x_i
    ∂/∂β0:  Σ_{i=1}^N y_i α_i = 0

along with the KKT conditions

    α_i (1 − y_i f(x_i) − ξ_i) = 0
    γ_i ξ_i = 0
    1 − α_i − γ_i = 0

20 Implications of the KKT conditions

Observations are in one of three states:
• L = {i : y_i f(x_i) < 1, α_i = 1}, L for Left of the elbow
• E = {i : y_i f(x_i) = 1, 0 ≤ α_i ≤ 1}, E for Elbow
• R = {i : y_i f(x_i) > 1, α_i = 0}, R for Right of the elbow

- Start with λ large, and the margin very wide. All α_i = 1 (if N+ = N−). As λ ↓ 0, the margin gets narrower.
- For the narrowing margin to pass through a point, its α has to change from 1 to 0 (or from 0 to 1). While this is happening, the point has to linger on the margin. Hence the point moves from L to R via E.
- The condition Σ_i y_i α_i = 0 demands a certain balance on opposite margins.

21 Example

• λ = 0.5, and the width of the soft margin is 2/||β|| = 2 × 0.587.
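The three KKT states above depend only on the margin y_i f(x_i); a small sketch (the margin values below are made up for illustration, not taken from the figure):

```python
def kkt_state(margin, tol=1e-8):
    """Classify an observation by its margin y_i * f(x_i)."""
    if margin < 1 - tol:
        return "L"  # left of the elbow: alpha_i = 1
    if margin > 1 + tol:
        return "R"  # right of the elbow: alpha_i = 0
    return "E"      # on the elbow: 0 <= alpha_i <= 1

margins = [0.3, 1.0, 1.7, -0.4, 1.0]
print([kkt_state(m) for m in margins])  # ['L', 'E', 'R', 'L', 'E']
```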
[Figure: twelve labeled points, six per class; the decision boundary f(x) = 0 with margins f(x) = ±1, each at distance 1/||β|| from the boundary.]

• Two hollow points {3, 5} are misclassified, while the two solid points {10, 12} are correctly classified but on the wrong side of their margin f(x) = +1; each of these has ξ_i > 0.
• The three square-shaped points {2, 6, 7} are exactly on the margin.

22 The Path

• The α_i are piecewise-linear in λ (or 1/C).

[Figures: solution paths for three examples: 12 points, 6 per class, separated; the mixture data with a radial kernel, γ = 1.0; and the mixture data with a radial kernel, γ = 5. Panel annotations: "Step: 623, Error: 13, Elbow Size: 54, Loss: 30.46"; "Step: 483, Error: 1, Elbow Size: 90, Loss: 1.01"; "Step: 17, Error: 0, Elbow Size: 2, Loss: 0".]

• The points in E characterize these paths, since points must stay on the margin (y_i f(x_i) = 1) while their α_i lie in (0, 1).
• Points can revisit the margin more than once.
• The coefficients β0 and β are piecewise-linear in C = 1/λ. Compare LARS (Efron et al., 2002): quadratic criterion, L1 constraint.
• The margins can stay wedged while their α_i change, if they are "loaded to capacity".
• For non-separable data, the loss value Σ_i ξ_i achieves a minimum, with a positive margin.

23 Piecewise Linear α Paths

[Figure: the paths α_i(λ), piecewise linear as functions of λ.]

24 Path Statistics

[Figure: the example at Step 14 (Error: 2, Elbow Size: 3, Margin: 4.38), together with ||β||, the loss, and the criterion plotted against λ.]

25 The Need for Regularization

[Figure: test error curves for SVMs with radial kernels, γ = 5, 1, 0.5, 0.1, plotted against C = 1/λ.]

• γ is a kernel parameter: K(x, z) = exp(−γ ||x − z||^2).
• λ (or C) is a regularization parameter, which has to be determined by some means such as cross-validation.

26 SVMs for regression

• Linear regression model: f(x) = x^T β + β0.   (1)
• To estimate β, we consider minimization of

    H(β, β0) = Σ_{i=1}^N V(y_i − f(x_i)) + (λ/2) ||β||^2,   (2)

where

    V(t) = 0 if |t| < ε, and |t| − ε otherwise.   (3)

This is called the "ε-insensitive" error measure.

27

[Figure: left panel, V_ε(r); right panel, Huber's V_H(r).]

The left panel shows the ε-insensitive error function used by the support vector regression machine. The right panel shows the error function used in Huber's robust regression (green curve). Beyond |c|, the function changes from quadratic to linear.

28 Software

• In R, the e1071 package and library(svmpath) are available from CRAN.
• Many other packages fit SVMs ...
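The ε-insensitive error measure V from the regression slides can be sketched directly (the choice ε = 0.5 and the residual values below are arbitrary, for illustration only):

```python
def eps_insensitive(t, eps=0.5):
    # V(t) = 0 inside the tube |t| < eps, and |t| - eps outside it
    return max(0.0, abs(t) - eps)

residuals = [-1.2, -0.3, 0.0, 0.4, 2.0]
print([round(eps_insensitive(r), 6) for r in residuals])  # [0.7, 0.0, 0.0, 0.0, 1.5]
```

Errors smaller than ε are ignored entirely, which plays the same role for regression that the hinge loss's flat region plays for classification: only observations outside the tube contribute support points.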

This note was uploaded on 11/08/2009 for the course STATS 315B taught by Professor Friedman during the Spring '08 term at Stanford.
