cs229-notes3

CS229 Lecture notes
Andrew Ng

Part V
Support Vector Machines

This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) "off-the-shelf" supervised learning algorithms. To tell the SVM story, we'll need to first talk about margins and the idea of separating data with a large "gap." Next, we'll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We'll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite-dimensional) feature spaces, and finally, we'll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs.

1 Margins: Intuition

We'll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the "confidence" of our predictions; these ideas will be made formal in Section 3.

Consider logistic regression, where the probability $p(y = 1 \mid x; \theta)$ is modeled by $h_\theta(x) = g(\theta^T x)$. We would then predict "1" on an input $x$ if and only if $h_\theta(x) \ge 0.5$, or equivalently, if and only if $\theta^T x \ge 0$. Consider a positive training example ($y = 1$). The larger $\theta^T x$ is, the larger also is $h_\theta(x) = p(y = 1 \mid x; \theta)$, and thus also the higher our degree of "confidence" that the label is 1. Thus, informally, we can think of our prediction as being a very confident one that $y = 1$ if $\theta^T x \gg 0$. Similarly, we think of logistic regression as making a very confident prediction of $y = 0$ if $\theta^T x \ll 0$. Given a training set, again informally it seems that we'd have found a good fit to the training data if we can find $\theta$ so that $\theta^T x^{(i)} \gg 0$ whenever $y^{(i)} = 1$, and $\theta^T x^{(i)} \ll 0$ whenever $y^{(i)} = 0$, since this would reflect a very confident (and correct) set of classifications for all the training examples. This seems to be a nice goal to aim for, and we'll soon formalize this idea using the notion of functional margins.
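To make the decision rule above concrete, here is a minimal sketch (not part of the original notes): the predicted label flips at $\theta^T x = 0$, and the further $\theta^T x$ is from zero, the closer $h_\theta(x)$ gets to 0 or 1. The vector theta and the two inputs below are made-up values, used purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """The logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and inputs, chosen only to illustrate the point.
theta = np.array([1.0, -2.0, 0.5])

for x in (np.array([0.1, 0.0, 0.2]),    # theta^T x only slightly > 0: low confidence
          np.array([4.0, -3.0, 2.0])):  # theta^T x >> 0: high confidence
    score = theta @ x                    # theta^T x
    prob = sigmoid(score)                # h_theta(x) = p(y = 1 | x; theta)
    label = 1 if score >= 0 else 0       # predict "1" iff theta^T x >= 0
    print(f"theta^T x = {score:+.2f} -> p(y=1|x) = {prob:.3f}, predict y = {label}")
```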
For a different type of intuition, consider the following figure, in which x's represent positive training examples, o's denote negative training examples, a decision boundary (this is the line given by the equation $\theta^T x = 0$, and is also called the separating hyperplane) is also shown, and three points have also been labeled A, B, and C.

[Figure: positive (x) and negative (o) training examples separated by the hyperplane $\theta^T x = 0$, with three points labeled A, B, and C at varying distances from the boundary.]

Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of $y$ at A, it seems we should be quite confident that $y = 1$ there. Conversely, the point C is very close to the decision boundary, and while it's on the side of the decision boundary on which we would predict $y = 1$, it seems likely that just a small change to the decision boundary could easily have caused our prediction to be $y = 0$.
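The same picture can be made numerical: the signed distance of a point $x$ from the hyperplane $\theta^T x = 0$ is $\theta^T x / \|\theta\|$, so a point like A far from the boundary yields a large value, while a point like C near the boundary yields a value close to zero. The sketch below (not part of the original notes) uses a made-up $\theta$ and made-up coordinates for A and C, purely to illustrate this.

```python
import numpy as np

# Signed distance from x to the hyperplane theta^T x = 0 is (theta^T x) / ||theta||.
# theta and the coordinates of A and C are hypothetical, chosen only for illustration.
theta = np.array([1.0, 1.0])

points = {
    "A": np.array([4.0, 4.0]),   # far from the boundary -> confident prediction
    "C": np.array([0.1, 0.05]),  # barely on the positive side -> low confidence
}

for name, x in points.items():
    signed_dist = (theta @ x) / np.linalg.norm(theta)
    label = 1 if theta @ x >= 0 else 0
    print(f"point {name}: predict y = {label}, signed distance = {signed_dist:+.3f}")
```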