

# cs229-notes3 - CS229 Lecture notes Andrew Ng Part V Support...


CS229 Lecture notes

Andrew Ng

## Part V: Support Vector Machines

This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) “off-the-shelf” supervised learning algorithms. To tell the SVM story, we’ll need to first talk about margins and the idea of separating data with a large “gap.” Next, we’ll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We’ll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite-dimensional) feature spaces, and finally, we’ll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs.

### 1 Margins: Intuition

We’ll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the “confidence” of our predictions; these ideas will be made formal in Section 3.

Consider logistic regression, where the probability $p(y = 1 \mid x; \theta)$ is modeled by $h_\theta(x) = g(\theta^T x)$. We would then predict “1” on an input $x$ if and only if $h_\theta(x) \geq 0.5$, or equivalently, if and only if $\theta^T x \geq 0$. Consider a positive training example ($y = 1$). The larger $\theta^T x$ is, the larger also is $h_\theta(x) = p(y = 1 \mid x; \theta)$, and thus also the higher our degree of “confidence” that the label is 1. Thus, informally we can think of our prediction as being a very confident one that $y = 1$ if $\theta^T x \gg 0$. Similarly, we think of logistic regression as making a very confident prediction of $y = 0$ if $\theta^T x \ll 0$.

Given a training set, again informally it seems that we’d have found a good fit to the training data if we can find $\theta$ so that $\theta^T x^{(i)} \gg 0$ whenever $y^{(i)} = 1$, and $\theta^T x^{(i)} \ll 0$ whenever $y^{(i)} = 0$, since this would reflect a very confident (and correct) set of classifications for all the training examples. This seems to be a nice goal to aim for, and we’ll soon formalize this idea using the notion of functional margins.

For a different type of intuition, consider the following figure, in which x’s represent positive training examples and o’s denote negative training examples. The figure also shows a decision boundary (the line given by the equation $\theta^T x = 0$, also called the separating hyperplane) and three labeled points A, B and C.

[Figure: positive examples (x’s) and negative examples (o’s) on either side of a separating hyperplane, with points A, B and C labeled.]

Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of $y$ at A, it seems we should be quite confident that $y = 1$ there. Conversely, the point C is very close to the decision boundary, and while it’s on the side of the decision boundary on which we would predict $y = 1$, it seems likely that just a small change to the decision boundary could easily have caused our prediction to be $y = 0$.
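To make the link between $\theta^T x$ and confidence concrete, here is a minimal Python sketch. The parameter vector `theta` and the coordinates given to the points A, B and C are hypothetical values chosen for illustration (they are not taken from the notes or the figure), and the first component of each input is assumed to be the intercept term $x_0 = 1$.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def score(theta, x):
    """Inner product theta^T x, where x[0] = 1 is the intercept term."""
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters: the decision boundary theta^T x = 0
# is the line x_1 + x_2 = 0.
theta = [0.0, 1.0, 1.0]

# Hypothetical coordinates, all on the side where we predict y = 1.
points = {
    "A": [1.0, 4.0, 4.0],  # far from the boundary
    "B": [1.0, 1.5, 1.0],  # chosen to sit between A and C
    "C": [1.0, 0.1, 0.0],  # very close to the boundary
}

results = {}
for name, x in points.items():
    z = score(theta, x)          # theta^T x
    h = sigmoid(z)               # h_theta(x), the modeled p(y = 1 | x; theta)
    label = 1 if z >= 0 else 0   # predict 1 iff theta^T x >= 0
    results[name] = (z, h, label)
    print(f"{name}: theta^T x = {z:.2f}, h_theta(x) = {h:.4f}, predict y = {label}")
```

With these assumed values, all three points receive the prediction $y = 1$, but $h_\theta(x)$ is close to 1 for A and only slightly above 0.5 for C, so a small perturbation of `theta` could flip the prediction at C while leaving A unaffected.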