Multiclass Classification
9.520 Class 08, 06 March 2006
Ryan Rifkin

It is a tale
Told by an idiot, full of sound and fury,
Signifying nothing.
    Macbeth, Act V, Scene V

What Is Multiclass Classification?

Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.

What Isn't Multiclass Classification?

There are many scenarios in which there are multiple categories to which points belong, but a given point can belong to multiple categories. In its most basic form, this problem decomposes trivially into a set of unlinked binary problems, which can be solved naturally using our techniques for binary classification.

A First Idea

Suppose we knew the density p_i(x) for each of the N classes. Then, we would predict using

    f(x) = argmax_{i in 1,...,N} p_i(x).

Of course we don't know the densities, but we could estimate them using classical techniques.

The Problem With Densities, and Motivation

Estimating densities is hard, especially in high dimensions with limited data. For binary classification tasks, we have seen that directly estimating a smooth separating function gives better results than density estimation (SVM, RLSC). Can we extend these approaches usefully to the multiclass scenario?

A Simple Idea: One-vs-All Classification

Pick a good technique for building binary classifiers (e.g., RLSC, SVM). Build N different binary classifiers. For the i-th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i. Let f_i be the i-th classifier. Classify with

    f(x) = argmax_i f_i(x).

Another Simple Idea: All-vs-All Classification

Build N(N-1)/2 classifiers, one classifier to distinguish each pair of classes i and j. Let f_ij be the classifier where class i were the positive examples and class j were the negative examples. Note f_ji = -f_ij.
Classify using

    f(x) = argmax_i ( sum_j f_ij(x) ).

Also called "all-pairs" or "one-vs-one" classification.

The Truth

OVA and AVA are so simple that many people invented them independently. It's hard to write papers about them. So there's a whole cottage industry in fancy, sophisticated methods for multiclass classification. To the best of my knowledge, choosing properly tuned regularization classifiers (RLSC, SVM) as your underlying binary classifiers and using one-vs-all (OVA) or all-vs-all (AVA) works as well as anything else you can do. If you actually have to solve a multiclass problem, I strongly urge you to simply use OVA or AVA, and not worry about anything else. The choice between OVA and AVA is largely computational.

OVA vs. AVA

Viewed naively, AVA seems faster and more memory efficient. It requires O(N^2) classifiers instead of O(N), but each classifier is (on average) much smaller. If the time to build a classifier is superlinear in the number of data points, AVA is a better...
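The two decomposition schemes above can be sketched in a few lines of code. This is an illustrative toy, not the lecture's setup: a nearest-centroid scorer stands in for a real binary classifier (RLSC or SVM), and all function names here are made up for the example. What matters is the wiring: OVA trains N classifiers and takes argmax_i f_i(x); AVA trains N(N-1)/2 pairwise classifiers, uses f_ji = -f_ij, and takes argmax_i of the summed scores.

```python
# Toy sketch of OVA and AVA multiclass wiring. The "binary classifier"
# is a stand-in (nearest-centroid margin), NOT RLSC or an SVM; names
# like make_binary/train_ova are illustrative, not from the lecture.
import math
from itertools import combinations

def centroid(points):
    d = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(d)]

def make_binary(pos, neg):
    """Return a real-valued f(x): larger means 'more like the pos class'."""
    cp, cn = centroid(pos), centroid(neg)
    # Margin-style score: distance to the negative centroid minus
    # distance to the positive centroid.
    return lambda x: math.dist(x, cn) - math.dist(x, cp)

def train_ova(data):                      # data: {class_label: [points]}
    fs = {}
    for i, pts in data.items():
        neg = [p for j, q in data.items() if j != i for p in q]
        fs[i] = make_binary(pts, neg)     # class i vs. everything else
    return lambda x: max(fs, key=lambda i: fs[i](x))   # argmax_i f_i(x)

def train_ava(data):
    fs = {}                               # f_ij stored once for i < j
    for i, j in combinations(sorted(data), 2):
        fs[(i, j)] = make_binary(data[i], data[j])
    def predict(x):
        score = {i: 0.0 for i in data}
        for (i, j), f in fs.items():
            s = f(x)
            score[i] += s                 # f_ij(x)
            score[j] -= s                 # f_ji(x) = -f_ij(x)
        return max(score, key=score.get)  # argmax_i sum_j f_ij(x)
    return predict

data = {0: [(0, 0), (1, 0)], 1: [(5, 5), (6, 5)], 2: [(0, 6), (1, 6)]}
ova, ava = train_ova(data), train_ava(data)
print(ova((0.5, 0.2)), ava((5.5, 4.8)))   # prints: 0 1
```

Swapping in a real regularized binary classifier only changes make_binary; the OVA/AVA combination rules stay exactly as written.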