Statistics and Computing 14: 199–222, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
A tutorial on support vector regression∗

ALEX J. SMOLA and BERNHARD SCHÖLKOPF

RSISE, Australian National University, Canberra 0200, Australia
Alex.Smola@anu.edu.au
Max-Planck-Institut für biologische Kybernetik, 72076 Tübingen, Germany
Bernhard.Schoelkopf@tuebingen.mpg.de

Received July 2002 and accepted November 2003
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for
function estimation. Furthermore, we include a summary of currently used algorithms for training
SV machines, covering both the quadratic (or convex) programming part and advanced methods for
dealing with large datasets. Finally, we mention some modifications and extensions that have been
applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Keywords: machine learning, support vector machines, regression estimation
1. Introduction
The purpose of this paper is twofold. It should serve as a self-contained
introduction to Support Vector regression for readers
new to this rapidly developing field of research.¹ On the other
hand, it attempts to give an overview of recent developments in
the field.
To this end, we decided to organize the essay as follows.
We start by giving a brief overview of the basic techniques in
Sections 1, 2 and 3, plus a short summary with a number of
figures and diagrams in Section 4. Section 5 reviews current
algorithmic techniques used for actually implementing SV
machines. This may be of most interest for practitioners.
The following section covers more advanced topics such as
extensions of the basic SV algorithm, connections between SV
machines and regularization, and briefly mentions methods for
carrying out model selection. We conclude with a discussion
of open questions and problems and current directions of SV
research. Most of the results presented in this review paper
have already been published elsewhere, but the comprehensive
presentation and some details are new.
1.1. Historic background
The SV algorithm is a nonlinear generalization of the Generalized
Portrait algorithm developed in Russia in the sixties²

∗ An extended version of this paper is available as NeuroCOLT Technical Report TR-98-030.
(Vapnik and Lerner 1963, Vapnik and Chervonenkis 1964). As
such, it is firmly grounded in the framework of statistical learning
theory, or VC theory, which has been developed over the last
three decades by Vapnik and Chervonenkis (1974) and Vapnik
(1982, 1995). In a nutshell, VC theory characterizes properties
of learning machines which enable them to generalize well to
unseen data.
In its present form, the SV machine was largely developed
at AT&T Bell Laboratories by Vapnik and co-workers (Boser,
Guyon and Vapnik 1992, Guyon, Boser and Vapnik 1993, Cortes
and Vapnik 1995, Schölkopf, Burges and Vapnik 1995, 1996,
Vapnik, Golowich and Smola 1997). Due to this industrial con-
text, SV research has to date had a sound orientation towards
real-world applications. Initial work focused on OCR (optical