lecture14-SVMs-handout-6-per -...

Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 14: Support vector machines and machine learning on documents
[Borrows slides from Ray Mooney]

Text classification: Up until now and today

- Previously: 3 algorithms for text classification
  - Naive Bayes classifier
  - K Nearest Neighbor classification
    - Simple, expensive at test time, high variance, non-linear
  - Vector space classification using centroids and hyperplanes that split them
    - Simple, linear discriminant classifier; perhaps too simple (or maybe not*)
- Today
  - SVMs
  - Some empirical evaluation and comparison
  - Text-specific issues in classification

Linear classifiers: Which Hyperplane? (Ch. 15)

- Lots of possible solutions for a, b, c.
- Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness]
  - E.g., perceptron
- Support Vector Machine (SVM) finds an optimal solution.
  - Maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary
  - One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions
- [Figure: this line represents the decision boundary: $ax + by - c = 0$]

Another intuition (Sec. 15.1)

- If you have to place a fat separator between classes, you have fewer choices, and so the capacity of the model has been decreased

Support Vector Machine (SVM) (Sec. 15.1)

- SVMs maximize the margin around the separating hyperplane.
  - A.k.a. large margin classifiers
- The decision function is fully specified by a subset of training samples, the support vectors.
- Solving SVMs is a quadratic programming problem
- Currently widely seen as the best text classification method.
- [Figure labels: Support vectors; Maximizes margin; Narrower margin]

Maximum Margin: Formalization (Sec. 15.1)

- $w$: decision hyperplane normal vector
- $x_i$: data point $i$
- $y_i$: class of data point $i$ (+1 or -1). Note: Not 1/0
- Classifier is: $f(x_i) = \mathrm{sign}(w^T x_i + b)$
- Functional margin of $x_i$ is: $y_i (w^T x_i + b)$
  - But note that we can increase this margin simply by scaling $w$, $b$ ...
- Functional margin of dataset is twice the minimum functional margin for any point
  - The factor of 2 comes from measuring the whole width of the margin
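
As a rough illustration of the formalization above, the following Python sketch evaluates the classifier sign(w^T x + b) and the functional margin y (w^T x + b) on a toy dataset. The weight vector, bias, data points, and helper names (`dot`, `classify`, `functional_margin`, `dataset_functional_margin`) are invented for this example and do not come from the lecture. It only shows that rescaling w and b inflates the functional margin without changing any prediction.

```python
# Toy values only: w, b, points, and labels are assumptions made for this sketch.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def classify(w, b, x):
    """f(x) = sign(w^T x + b); a score of exactly 0 is mapped to +1 by assumption."""
    return 1 if dot(w, x) + b >= 0 else -1

def functional_margin(w, b, x, y):
    """Functional margin of one labelled point: y * (w^T x + b)."""
    return y * (dot(w, x) + b)

def dataset_functional_margin(w, b, points, labels):
    """Twice the minimum functional margin over all points (the slide's definition)."""
    return 2 * min(functional_margin(w, b, x, y) for x, y in zip(points, labels))

w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [3.0, 3.0], [0.0, 1.0], [4.0, 2.0]]
labels = [-1, 1, -1, 1]

predictions = [classify(w, b, x) for x in points]
margin = dataset_functional_margin(w, b, points, labels)

# Scale w and b by the same positive constant: same hyperplane, same predictions,
# but a functional margin that is larger by exactly that constant.
scale = 5.0
w_scaled = [scale * wi for wi in w]
b_scaled = scale * b
scaled_predictions = [classify(w_scaled, b_scaled, x) for x in points]
scaled_margin = dataset_functional_margin(w_scaled, b_scaled, points, labels)

print(predictions, margin)
print(scaled_predictions, scaled_margin)
```

Because the functional margin can be made arbitrarily large this way, it is not a meaningful quantity to maximize on its own; some normalization of w is needed.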
Geometric Margin

- Distance from an example to the separator is $r = y \frac{w^T x + b}{\|w\|}$
- Examples closest to the hyperplane are support vectors.
- Margin $\rho$ of the separator is the width of separation between support vectors of classes.
- [Figure labels: $r$, $\rho$, $x$, $x'$, $w$]
- Derivation of finding $r$: the dotted line $x' - x$ is perpendicular to the decision boundary, so parallel to $w$.
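
The preview ends before the derivation is complete, so the sketch below relies on the standard point-to-hyperplane distance r = y (w^T x + b) / ||w||, which is an assumption about where the slide is heading. It computes r for one made-up point, moves that point a distance r along the unit normal w / ||w|| to obtain x', and checks numerically that x' lies on the decision boundary. All values and function names are hypothetical.

```python
import math

# Toy values only: w, b, x, and y are assumptions made for this sketch.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def geometric_margin(w, b, x, y):
    """Signed distance from x to the hyperplane w^T x + b = 0, positive if x is on its labelled side."""
    return y * (dot(w, x) + b) / math.sqrt(dot(w, w))

def project_onto_boundary(w, b, x, y):
    """x' = x - y * r * w / ||w||: step from x along the unit normal until the boundary is reached."""
    norm = math.sqrt(dot(w, w))
    r = geometric_margin(w, b, x, y)
    return [xi - y * r * wi / norm for xi, wi in zip(x, w)]

w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

r = geometric_margin(w, b, x, y)
x_prime = project_onto_boundary(w, b, x, y)
residual = dot(w, x_prime) + b  # close to 0 if x' is on the boundary

# Unlike the functional margin, r does not change when w and b are rescaled together.
r_scaled = geometric_margin([10.0 * wi for wi in w], 10.0 * b, x, y)

print(r, x_prime, residual, r_scaled)
```

Dividing by ||w|| is what removes the dependence on the scale of w and b, which is why the geometric margin, not the functional margin, measures the actual width of separation.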