CS229 Lecture notes

Andrew Ng

Part V

Support Vector Machines

This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) off-the-shelf supervised learning algorithms. To tell the SVM story, we'll need to first talk about margins and the idea of separating data with a large gap. Next, we'll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We'll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite-dimensional) feature spaces, and finally, we'll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs.

1 Margins: Intuition

We'll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the confidence of our predictions; these ideas will be made formal in Section 3.

Consider logistic regression, where the probability $p(y = 1 \mid x; \theta)$ is modeled by $h_\theta(x) = g(\theta^T x)$. We would then predict "1" on an input $x$ if and only if $h_\theta(x) \geq 0.5$, or equivalently, if and only if $\theta^T x \geq 0$. Consider a positive training example ($y = 1$). The larger $\theta^T x$ is, the larger also is $h_\theta(x) = p(y = 1 \mid x; \theta)$, and thus also the higher our degree of confidence that the label is 1. Thus, informally we can think of our prediction as being a very confident one that $y = 1$ if $\theta^T x \gg 0$. Similarly, we think of logistic regression as making a very confident prediction of $y = 0$ if $\theta^T x \ll 0$.

Given a training set, again informally it seems that we'd have found a good fit to the training data if we can find $\theta$ so that $\theta^T x^{(i)} \gg 0$ whenever $y^{(i)} = 1$, and $\theta^T x^{(i)} \ll 0$ whenever $y^{(i)} = 0$, since this would reflect a very confident (and correct) set of classifications for all the training examples. This seems to be a nice goal to aim for, and we'll soon formalize this idea using the notion of functional margins.
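To make the link between $\theta^T x$ and confidence concrete, here is a minimal Python sketch. The parameter vector and the inputs are made-up values chosen only for illustration, and the function names `sigmoid` and `predict_with_confidence` are hypothetical helpers, not part of the notes.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_confidence(theta, x):
    """Return (predicted label, h_theta(x)) for logistic regression."""
    z = sum(t * xi for t, xi in zip(theta, x))  # theta^T x
    h = sigmoid(z)                              # modeled p(y = 1 | x; theta)
    label = 1 if h >= 0.5 else 0                # same as testing z >= 0
    return label, h


# Hypothetical parameters and inputs, for illustration only.
theta = [1.0, 2.0]
inputs = [[0.1, 0.0], [1.0, 1.0], [3.0, 2.0], [-3.0, -2.0]]

for x in inputs:
    label, h = predict_with_confidence(theta, x)
    print(f"x={x}  label={label}  h={h:.4f}")
```

With these assumed values, the input `[0.1, 0.0]` gives $\theta^T x = 0.1$ and a probability only slightly above 0.5, whereas `[3.0, 2.0]` gives $\theta^T x = 7$ and a probability above 0.99. Both are predicted as 1, but only the second is a confident prediction in the informal sense used above.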
For a different type of intuition, consider the following figure, in which x's represent positive training examples, o's denote negative training examples, a decision boundary (this is the line given by the equation $\theta^T x = 0$, and is also called the separating hyperplane) is also shown, and three points have also been labeled A, B and C.

[Figure: positive and negative training examples, the separating hyperplane, and the labeled points A, B and C.]

Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of $y$ at A, it seems we should be quite confident that $y = 1$ there. Conversely, the point C is very close to the decision boundary, and while it's on the side of the decision boundary on which we would predict $y = 1$, it seems likely that just a small change to the decision boundary could easily have caused our prediction to be $y = 0$. ...
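The effect of a small change to the decision boundary can be sketched numerically. The coordinates for A, B and C below are invented for illustration (the figure's actual values are not given), and the example assumes each input carries a leading intercept term of 1 so that the boundary can be shifted through the first component of the parameter vector.

```python
def score(theta, x):
    """Compute theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))


def predict(theta, x):
    """Predict 1 when theta^T x >= 0, otherwise 0."""
    return 1 if score(theta, x) >= 0 else 0


# Hypothetical points; the first component is an assumed intercept term.
points = {
    "A": [1.0, 3.0, 3.0],  # far from the boundary
    "B": [1.0, 1.0, 0.5],  # in between
    "C": [1.0, 0.1, 0.0],  # very close to the boundary
}

theta = [0.0, 1.0, 1.0]           # boundary: x1 + x2 = 0
theta_shifted = [-0.2, 1.0, 1.0]  # slightly moved boundary: x1 + x2 = 0.2

for name, x in points.items():
    before = predict(theta, x)
    after = predict(theta_shifted, x)
    print(f"{name}: score={score(theta, x):.2f}  before={before}  after={after}")
```

Under these assumptions, A and B keep the prediction 1 after the boundary moves, while C, whose score is only 0.1, flips to 0. This mirrors the informal argument that predictions near the boundary are the least trustworthy.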
