TRENDS & CONTROVERSIES: Support vector machines

SVMs—a practical consequence of learning theory
Bernhard Schölkopf, GMD First

Is there anything worthwhile to learn about the new SVM algorithm, or does it fall into the category of "yet-another-algorithm," in which case readers should stop here and save their time for something more useful? In this short overview, I will try to argue that studying support-vector learning is very useful in two respects. First, it is quite satisfying from a theoretical point of view: SV learning is based on some beautifully simple ideas and provides a clear intuition of what learning from examples is about. Second, it can lead to high performance in practical applications.

The SV algorithm can be considered to lie at the intersection of learning theory and practice in the following sense: for certain simple types of algorithms, statistical learning theory can identify rather precisely the factors that need to be taken into account to learn successfully. Real-world applications, however, often mandate the use of more complex models and algorithms—such as neural networks—that are much harder to analyze theoretically. The SV algorithm achieves both. It constructs models that are complex enough: its model class contains a large class of neural nets, radial basis function (RBF) nets, and polynomial classifiers as special cases. Yet it is simple enough to be analyzed mathematically, because it can be shown to correspond to a linear method in a high-dimensional feature space nonlinearly related to input space. Moreover, even though we can think of it as a linear algorithm in a high-dimensional space, in practice it does not involve any computations in that high-dimensional space. By the use of kernels, all necessary computations are performed directly in input space. This is the characteristic twist of SV methods—we are dealing with complex algorithms for nonlinear pattern recognition,1 regression,2 or feature extraction,3 but for the sake of analysis and algorithmics, we can pretend that we are working with a simple linear algorithm. (A small numerical sketch of this kernel idea follows the text of this section.)

I will explain the gist of SV methods by describing their roots in learning theory, the optimal hyperplane algorithm, the kernel trick, and SV function estimation. For details and further references, see Vladimir Vapnik's authoritative treatment,2 the collection my colleagues and I have put together,4 and the SV Web page at http://svm.first.gmd.de.

Learning pattern recognition from examples

For pattern recognition, we try to estimate a function f: R^N → {±1} using training data—that is, N-dimensional patterns x_i and class labels y_i,

    (x_1, y_1), ..., (x_ℓ, y_ℓ) ∈ R^N × {±1},    (1)

such that f will correctly classify new examples (x, y)—that is, f(x) = y for examples (x, y) that were generated from the same underlying probability distribution P(x, y) as the training data. If we put no restriction on the class of functions from which we choose our estimate f, however, even a function that does well on the training data—for example, by satisfying f(x_i) = y_i for all i = 1, ..., ℓ—need not generalize well to unseen examples.
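The kernel idea mentioned above can be made concrete with a small numerical sketch. This example is not from the article; it is a hypothetical illustration (in Python with NumPy, both my choices) of the homogeneous degree-2 polynomial kernel k(x, z) = (x · z)^2 on R^2, whose value equals an inner product of explicit feature vectors Φ(x) = (x1^2, √2·x1·x2, x2^2) in a three-dimensional feature space, even though the kernel itself is evaluated entirely in input space.

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-D input: all degree-2 monomials,
    with a sqrt(2) weight on the cross term so inner products match."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The two routes agree (up to floating-point rounding): the kernel value
# equals the inner product in the 3-D feature space, but the kernel
# never constructs phi(x) explicitly.
print(np.dot(phi(x), phi(z)))  # ~1.0
print(poly_kernel(x, z))       # 1.0
```

The same pattern underlies SV methods in general: the algorithm only ever needs inner products in feature space, so replacing them by a kernel lets it work implicitly in a space of much higher dimension than the input.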
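To make the setup in Eq. (1) concrete, here is a hypothetical toy example, not from the article, using scikit-learn's SVC (my choice of library) with synthetic data standing in for the unknown distribution P(x, y): training pairs (x_i, y_i) ∈ R^2 × {±1} are used to fit a decision function f, which is then applied to new points drawn from the same distribution.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two Gaussian blobs stand in for the unknown distribution P(x, y).
x_pos = rng.normal(loc=+2.0, scale=1.0, size=(50, 2))  # class +1 patterns
x_neg = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))  # class -1 patterns
X = np.vstack([x_pos, x_neg])
y = np.array([+1] * 50 + [-1] * 50)

# Fit a nonlinear SV classifier (the RBF kernel is scikit-learn's default).
clf = SVC(kernel="rbf").fit(X, y)

# New examples generated from the same distribution as the +1 class;
# a function f that generalizes well should assign them label +1.
x_new = rng.normal(loc=+2.0, scale=1.0, size=(5, 2))
print(clf.predict(x_new))  # almost certainly prints [1 1 1 1 1]
```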