lecture14-SVMs-handout-6-per

Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 14: Support vector machines and machine learning on documents
[Borrows slides from Ray Mooney]

Text classification: Up until now and today

Previously: 3 algorithms for text classification
- Naive Bayes classifier
- K Nearest Neighbor classification
  - Simple, expensive at test time, high variance, non-linear
- Vector space classification using centroids and hyperplanes that split them
  - Simple, linear discriminant classifier; perhaps too simple (or maybe not*)

Today
- SVMs
- Some empirical evaluation and comparison
- Text-specific issues in classification

Linear classifiers: Which Hyperplane? (Ch. 15)

- Lots of possible solutions for a, b, c.
- Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness]
  - E.g., perceptron
- A Support Vector Machine (SVM) finds an optimal solution.
  - It maximizes the distance between the hyperplane and the “difficult points” close to the decision boundary.
  - One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions.
- This line represents the decision boundary: ax + by - c = 0

Another intuition (Sec. 15.1)

- If you have to place a fat separator between the classes, you have fewer choices, and so the capacity of the model has been decreased.

Support Vector Machine (SVM) (Sec. 15.1)

- SVMs maximize the margin around the separating hyperplane.
  - A.k.a. large margin classifiers.
- The decision function is fully specified by a subset of the training samples, the support vectors.
- Solving SVMs is a quadratic programming problem.
- Currently widely seen as the best text classification method.
[Figure: a separating hyperplane with its support vectors and maximized margin, contrasted with a narrower margin.]

Maximum Margin: Formalization (Sec. 15.1)

- w: decision hyperplane normal vector
- x_i: data point i
- y_i: class of data point i (+1 or -1; note: not 1/0)
- The classifier is: f(x_i) = sign(w^T x_i + b)
- The functional margin of x_i is: y_i (w^T x_i + b)
  - But note that we can increase this margin simply by scaling w, b....
- The functional margin of the dataset is twice the minimum functional margin of any point.
  - The factor of 2 comes from measuring the whole width of the margin.
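To make the formalization concrete, here is a minimal NumPy sketch (not part of the handout; the toy data and the names w, b, X, y are invented to mirror the slide's notation) of the classifier f(x_i) = sign(w^T x_i + b) and the functional margin y_i (w^T x_i + b):

```python
import numpy as np

def predict(w, b, X):
    """Linear classifier from the slide: f(x_i) = sign(w^T x_i + b)."""
    return np.sign(X @ w + b)

def functional_margins(w, b, X, y):
    """Functional margin of each point: y_i * (w^T x_i + b)."""
    return y * (X @ w + b)

# Toy linearly separable data; labels are +1/-1, not 1/0, as the slide notes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = 0.0

print(predict(w, b, X))                # [ 1.  1. -1. -1.]
print(functional_margins(w, b, X, y))  # all positive => correctly classified

# Rescaling (w, b) -> (10w, 10b) inflates the functional margin without
# changing any prediction, which is exactly the issue the slide points out.
print(functional_margins(10 * w, 10 * b, X, y))
```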
Geometric Margin (Sec. 15.1)

- The distance from an example x to the separator is r = y (w^T x + b) / ||w||.
- Examples closest to the hyperplane are support vectors.
- The margin ρ of the separator is the width of separation between the support vectors of the two classes.
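Continuing the sketch above (same invented names and toy data), the geometric margin normalizes the functional margin by ||w||, so unlike the functional margin it is invariant to rescaling (w, b):

```python
import numpy as np

def geometric_margins(w, b, X, y):
    """Distance from each example to the separator: r = y (w^T x + b) / ||w||."""
    return y * (X @ w + b) / np.linalg.norm(w)

def separator_margin(w, b, X, y):
    """Margin rho: width of separation between the support vectors of the
    two classes, i.e. twice the smallest geometric margin of any point."""
    return 2 * geometric_margins(w, b, X, y).min()

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = 0.0

print(separator_margin(w, b, X, y))            # width of the margin
print(separator_margin(10 * w, 10 * b, X, y))  # unchanged under rescaling
```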
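Finally, since the handout notes that solving an SVM is a quadratic programming problem and calls SVMs the leading text classification method, here is a hedged end-to-end sketch (not from the lecture; the toy documents, labels, and query are invented) using scikit-learn, whose LinearSVC solves the underlying optimization internally:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "stock market rises on earnings report",
    "quarterly profits beat analyst forecasts",
    "team wins championship in overtime thriller",
    "star striker scores twice in cup final",
]
labels = ["business", "business", "sports", "sports"]

# Represent documents as TF-IDF vectors, then fit a linear SVM.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
clf = LinearSVC()  # linear SVM; margin maximization is solved internally
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["goalkeeper saves penalty"])))
# -> ['sports'] on this toy data
```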