Multiclass Classification 9.520 Class 08, 06 March 2006 Ryan Rifkin

“It is a tale Told by an idiot, full of sound and fury, Signifying nothing.” Macbeth, Act V, Scene V
What Is Multiclass Classification?

Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.

What Isn’t Multiclass Classification?

There are many scenarios in which there are multiple categories to which points belong, but a given point can belong to multiple categories. In its most basic form, this problem decomposes trivially into a set of unlinked binary problems, which can be solved naturally using our techniques for binary classification.
A First Idea

Suppose we knew the density p_i(x) for each of the N classes. Then we would predict using

    f(x) = arg max_{i = 1,...,N} p_i(x).

Of course we don’t know the densities, but we could estimate them using classical techniques.
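As a concrete illustration of this density-based rule, here is a minimal sketch that estimates each class density with a single spherical Gaussian and predicts via the argmax. The function names and the Gaussian choice are assumptions for illustration, not from the lecture; any classical density estimator could be substituted.

```python
import numpy as np

def fit_gaussians(X, y, n_classes):
    """Estimate (mean, variance) of a spherical Gaussian per class."""
    params = []
    for i in range(n_classes):
        Xi = X[y == i]
        params.append((Xi.mean(axis=0), Xi.var() + 1e-6))  # +1e-6 avoids zero variance
    return params

def log_density(x, mean, var):
    """Log of a spherical Gaussian density at x (dropping shared constants)."""
    return -np.sum((x - mean) ** 2) / (2 * var) - 0.5 * x.size * np.log(var)

def predict(x, params):
    """f(x) = argmax_i p_i(x), computed in log space for numerical stability."""
    scores = [log_density(x, m, v) for (m, v) in params]
    return int(np.argmax(scores))
```

Working in log space leaves the argmax unchanged, since log is monotonic.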

The Problem With Densities, and Motivation

Estimating densities is hard, especially in high dimensions with limited data. For binary classification tasks, we have seen that directly estimating a smooth separating function gives better results than density estimation (SVM, RLSC). Can we extend these approaches usefully to the multiclass scenario?
A Simple Idea: One-vs-All Classification

Pick a good technique for building binary classifiers (e.g., RLSC, SVM). Build N different binary classifiers. For the i-th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i. Let f_i be the i-th classifier. Classify with

    f(x) = arg max_i f_i(x).
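The one-vs-all scheme can be sketched in a few lines. Since the lecture recommends regularized classifiers as the binary learners, this sketch uses a linear regularized least-squares rule (a simplified, linear stand-in for RLSC) with ±1 targets; the function names and the bias-via-augmentation trick are illustrative assumptions.

```python
import numpy as np

def _augment(X):
    """Append a constant feature so each linear classifier has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def rlsc_fit(X, y, lam=1e-2):
    """Linear regularized least squares: solve (X'X + lam*I) w = X'y."""
    Xb = _augment(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def ova_fit(X, y, n_classes, lam=1e-2):
    """Classifier i: points in class i get target +1, all other points -1."""
    return [rlsc_fit(X, np.where(y == i, 1.0, -1.0), lam)
            for i in range(n_classes)]

def ova_predict(x, ws):
    """f(x) = argmax_i f_i(x): pick the classifier with the largest output."""
    xb = np.append(x, 1.0)
    return int(np.argmax([xb @ w for w in ws]))
```

Note that the real-valued outputs f_i(x) are compared directly; no thresholding at zero is needed, which is what lets OVA resolve points that several binary classifiers claim (or none do).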

Another Simple Idea: All-vs-All Classification

Build N(N - 1)/2 classifiers, one classifier to distinguish each pair of classes i and j. Let f_ij be the classifier where class i were positive examples and class j were negative. Note f_ji = -f_ij. Classify using

    f(x) = arg max_i ( Σ_j f_ij(x) ).

Also called all-pairs or one-vs-one classification.
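A sketch of the all-vs-all scheme follows, using the same kind of simple regularized least-squares binary rule as a stand-in learner (an illustrative assumption, not the lecture's prescription). Only the N(N-1)/2 pairs with i < j are trained; the antisymmetry f_ji = -f_ij supplies the rest.

```python
import numpy as np
from itertools import combinations

def lstsq_binary(X, y, lam=1e-2):
    """A simple regularized least-squares binary rule (with bias), as a callable."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda x: float(np.append(x, 1.0) @ w)

def ava_fit(X, y, n_classes):
    """Train one classifier per pair (i, j), i < j, on only those two classes."""
    models = {}
    for i, j in combinations(range(n_classes), 2):
        mask = (y == i) | (y == j)
        yi = np.where(y[mask] == i, 1.0, -1.0)  # class i positive, class j negative
        models[(i, j)] = lstsq_binary(X[mask], yi)
    return models

def ava_predict(x, models, n_classes):
    """f(x) = argmax_i sum_j f_ij(x), filling in f_ji = -f_ij for j < i."""
    scores = np.zeros(n_classes)
    for (i, j), f in models.items():
        out = f(x)
        scores[i] += out   # evidence for class i
        scores[j] -= out   # the same output counts against class j
    return int(np.argmax(scores))
```

Each pairwise model is trained on only the points of its two classes, which is why the individual training sets are much smaller than in OVA.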
The Truth

OVA and AVA are so simple that many people invented them independently. It’s hard to write papers about them. So there’s a whole cottage industry in fancy, sophisticated methods for multiclass classification. To the best of my knowledge, choosing properly tuned regularization classifiers (RLSC, SVM) as your underlying binary classifiers and using one-vs-all (OVA) or all-vs-all (AVA) works as well as anything else you can do. If you actually have to solve a multiclass problem, I strongly urge you to simply use OVA or AVA, and not worry about anything else. The choice between OVA and AVA is largely computational.

OVA vs. AVA

Viewed naively, AVA seems faster and more memory efficient. It requires O(N^2) classifiers instead of O(N), but each classifier is (on average) much smaller. If the time to build a classifier is superlinear in the number of data points, AVA is a better choice. With SVMs, AVA’s probably best.
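The superlinearity argument can be made concrete with a back-of-envelope cost model. Assume n total training points, N balanced classes, and that training one binary classifier on m points costs m**p for some exponent p (p around 2 to 3 is a common rough figure for SVM training; the model itself is an illustrative assumption, not from the lecture).

```python
def ova_cost(n, N, p):
    """OVA: N classifiers, each trained on all n points."""
    return N * n ** p

def ava_cost(n, N, p):
    """AVA: N(N-1)/2 classifiers, each trained on the 2n/N points of two classes."""
    return (N * (N - 1) / 2) * (2 * n / N) ** p

# With p = 2, n = 10_000, N = 10:
#   ova_cost -> 10 * 10_000**2 = 1e9
#   ava_cost -> 45 * 2_000**2  = 1.8e8, so AVA is roughly 5x cheaper here.
```

With p = 1 (linear-time training) the two totals are comparable, which is why the advantage only appears when training cost grows superlinearly in the number of points.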