# Lecture 14: Support Vector Machines and Machine Learning on Documents
## Slide 1: Title

Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 14: Support vector machines and machine learning on documents
[Borrows slides from Ray Mooney]

## Slide 2: Text classification: Up until now and today

Previously, we covered three algorithms for text classification:

- Naive Bayes classifier
- K Nearest Neighbor classification: simple, expensive at test time, high variance, non-linear
- Vector space classification using centroids and the hyperplanes that split them: a simple, linear discriminant classifier; perhaps too simple (or maybe not)

Today:

- SVMs
- Some empirical evaluation and comparison
- Text-specific issues in classification

## Slide 3: Linear classifiers: Which hyperplane? (Ch. 15)

There are many possible solutions for a, b, c. Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness], e.g., the perceptron. The Support Vector Machine (SVM) finds an optimal solution: it maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary. One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions. The line ax + by − c = 0 represents the decision boundary.

## Slide 4: Another intuition (Sec. 15.1)

If you have to place a fat separator between the classes, you have fewer choices of where to put it, and so the capacity of the model has been decreased.

## Slide 5: Support Vector Machine (SVM) (Sec. 15.1)

SVMs maximize the margin around the separating hyperplane, and so are also known as large margin classifiers. The decision function is fully specified by a subset of the training samples, the support vectors. Solving an SVM is a quadratic programming problem. SVMs are currently widely seen as the best text classification method.

(Figure: separating hyperplane with support vectors and maximized margin; a narrower margin is shown for contrast.)

## Slide 6: Maximum Margin: Formalization (Sec. 15.1)

- w: decision hyperplane normal vector
- x_i: data point i
- y_i: class of data point i (+1 or −1; note: not 1/0)

The classifier is f(x_i) = sign(w^T x_i + b). The functional margin of x_i is y_i (w^T x_i + b). But note that we can increase this margin simply by scaling w and b. The functional margin of the dataset is twice the minimum functional margin over all points; the factor of 2 comes from measuring the whole width of the margin.
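The classifier and the functional margin above can be sketched in a few lines of numpy. This is a minimal illustration with made-up values for w, b, and one data point (none of these numbers come from the slides); it also shows the point made on the slide that rescaling (w, b) by a constant c > 0 leaves the classifier's decisions unchanged while inflating the functional margin by the same factor c.

```python
import numpy as np

# Hypothetical 2-D hyperplane: normal vector w and bias b (illustrative values).
w = np.array([2.0, 1.0])
b = -1.0

x_i = np.array([1.0, 1.0])  # a data point
y_i = +1                    # its class label (+1 or -1, not 1/0)

# Classifier: f(x_i) = sign(w^T x_i + b)
f = np.sign(w @ x_i + b)

# Functional margin of x_i: y_i * (w^T x_i + b)
functional_margin = y_i * (w @ x_i + b)

# Scaling (w, b) -> (c*w, c*b) with c > 0 gives the same sign (same decision)
# but multiplies the functional margin by c.
c = 10.0
scaled_margin = y_i * ((c * w) @ x_i + c * b)

print(f)                  # 1.0
print(functional_margin)  # 2.0
print(scaled_margin)      # 20.0
```

This scale-dependence is exactly why the functional margin alone cannot be maximized directly; the geometric margin on the next slide fixes this by normalizing by ‖w‖.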

## Slide 7: Geometric Margin (Sec. 15.1)

The distance from an example x to the separator is r = y (w^T x + b) / ‖w‖. The examples closest to the hyperplane are the support vectors. The margin ρ of the separator is the width of separation between the support vectors of the two classes.

(Figure: two points x and x′, the distance r, the margin ρ, and the normal vector w. Derivation of r: the dotted line from x′ to x is perpendicular to the decision boundary, and so parallel to w.)
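The geometric margin can be checked numerically. Below is a small numpy sketch with hypothetical values for w, b, and a point x (chosen for round numbers, not taken from the slides); it computes r = y (w^T x + b) / ‖w‖ and verifies that, unlike the functional margin, r does not change when (w, b) is rescaled.

```python
import numpy as np

# Hypothetical hyperplane and point (illustrative values only).
w = np.array([3.0, 4.0])  # ||w|| = 5
b = -5.0
x = np.array([3.0, 4.0])
y = +1

# Geometric margin of x: r = y * (w^T x + b) / ||w||
r = y * (w @ x + b) / np.linalg.norm(w)

# Rescaling (w, b) -> (c*w, c*b) leaves r unchanged, because the
# factor c appears in both the numerator and ||c*w||.
c = 10.0
r_scaled = y * ((c * w) @ x + c * b) / np.linalg.norm(c * w)

print(r)         # 4.0
print(r_scaled)  # 4.0
```

This invariance is what makes the geometric margin the right quantity to maximize: it measures an actual Euclidean distance to the hyperplane, independent of how (w, b) happens to be scaled.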