cs229-notes3

CS229 Lecture notes
Andrew Ng

Part V
Support Vector Machines

This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) "off-the-shelf" supervised learning algorithms. To tell the SVM story, we'll need to first talk about margins and the idea of separating data with a large "gap." Next, we'll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We'll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite-dimensional) feature spaces, and finally, we'll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs.

1  Margins: Intuition

We'll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the "confidence" of our predictions; these ideas will be made formal in Section 3.

Consider logistic regression, where the probability $p(y = 1 \mid x; \theta)$ is modeled by $h_\theta(x) = g(\theta^T x)$. We would then predict "1" on an input $x$ if and only if $h_\theta(x) \geq 0.5$, or equivalently, if and only if $\theta^T x \geq 0$. Consider a positive training example ($y = 1$). The larger $\theta^T x$ is, the larger also is $h_\theta(x) = p(y = 1 \mid x; \theta)$, and thus also the higher our degree of "confidence" that the label is 1. Thus, informally we can think of our prediction as being a very confident one that $y = 1$ if $\theta^T x \gg 0$. Similarly, we think of logistic regression as making a very confident prediction of $y = 0$ if $\theta^T x \ll 0$. Given a training set, again informally it seems that we'd have found a good fit to the training data if we can find $\theta$ so that $\theta^T x^{(i)} \gg 0$ whenever $y^{(i)} = 1$, and $\theta^T x^{(i)} \ll 0$ whenever $y^{(i)} = 0$, since this would reflect a very confident (and correct) set of classifications for all the training examples. This seems to be a nice goal to aim for, and we'll soon formalize this idea using the notion of functional margins.

For a different type of intuition, consider the following figure, in which x's represent positive training examples, o's denote negative training examples, a decision boundary (this is the line given by the equation $\theta^T x = 0$, and is also called the separating hyperplane) is also shown, and three points have also been labeled A, B and C.

[Figure: positive (x) and negative (o) training examples, a separating hyperplane $\theta^T x = 0$, and three labeled points A, B, and C.]

Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of $y$ at A, it seems we should be quite confident that $y = 1$ there. Conversely, the point C is very close to the decision boundary, and while it's on the side of the decision boundary on which we would predict $y = 1$, it seems likely that just a small change to the decision boundary could easily have caused our prediction to be $y = 0$.
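To make the "confidence as margin" intuition concrete, here is a minimal NumPy sketch (not from the notes; the parameter values $\theta$ and the two test points are made up for illustration). It evaluates $h_\theta(x) = g(\theta^T x)$, the resulting prediction, and the point's distance to the decision boundary $\theta^T x = 0$: a point like A yields a large $|\theta^T x|$ and a probability near 1, while a point like C yields a margin near zero and a probability near 0.5.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, chosen only for illustration (not from the notes).
# Convention: x = [1, x1, x2] with intercept term x0 = 1, so theta = [b, w1, w2]
# and theta^T x = w^T x_feat + b.
theta = np.array([-1.0, 2.0, 0.5])
b, w = theta[0], theta[1:]

def predict(x_feat):
    """Return (label, margin, probability, distance to boundary) for x_feat = [x1, x2]."""
    x = np.concatenate(([1.0], x_feat))             # prepend intercept term x0 = 1
    margin = theta @ x                              # theta^T x
    prob = sigmoid(margin)                          # h_theta(x) = g(theta^T x)
    label = 1 if margin >= 0 else 0                 # same test as prob >= 0.5
    dist = abs(w @ x_feat + b) / np.linalg.norm(w)  # distance from x to theta^T x = 0
    return label, margin, prob, dist

# A point far from the boundary (like point A): large theta^T x, probability near 1.
print(predict(np.array([3.0, 2.0])))
# A point close to the boundary (like point C): theta^T x near 0, probability near 0.5.
print(predict(np.array([0.45, 0.4])))
```

With these made-up numbers, the far point gets $\theta^T x = 6$ and probability about 0.998, while the near point gets $\theta^T x = 0.1$ and probability about 0.52; both are classified as $y = 1$, but only the first prediction is a confident one.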