Kernel Design using Boosting
Koby Crammer Joseph Keshet Yoram Singer
School of Computer Science & Engineering
The Hebrew University, Jerusalem 91904, Israel
{kobics,jkeshet,singer}@cs.huji.ac.il
Abstract
The focus of the paper is the problem of learning kernel operators from empirical data. We cast the kernel design problem as the construction of an accurate kernel from simple (and less accurate) base kernels. We use the boosting paradigm to perform the kernel construction process. To do so, we modify the booster so as to accommodate kernel operators. We also devise an efficient weak learner for simple kernels that is based on generalized eigenvector decomposition. We demonstrate the effectiveness of our approach on synthetic data and on the USPS dataset. On the USPS dataset, the performance of the Perceptron algorithm with learned kernels is systematically better than with a fixed RBF kernel.
1 Introduction and Problem Setting
The last decade has brought a voluminous amount of work on the design, analysis, and experimentation of kernel machines. Algorithms based on kernels can be used for various machine learning tasks such as classification, regression, ranking, and principal component analysis. The most prominent learning algorithm that employs kernels is the Support Vector Machine (SVM) [1, 2], designed for classification and regression. A key component in a kernel machine is a \emph{kernel operator}, which computes for any pair of instances their inner-product in some abstract vector space. Intuitively and informally, a kernel operator is a means for measuring similarity between instances. Almost all of the work that employed kernel operators concentrated on various machine learning problems that involved a \emph{predefined} kernel. A typical approach when using kernels is to choose a kernel before learning starts. Examples of popular predefined kernels are the Radial Basis Function (RBF) and polynomial kernels (see for instance [1]). Despite the simplicity of modifying a learning algorithm into a “kernelized” version, the success of such algorithms is not yet well understood. More recently, special efforts have been devoted to crafting kernels for specific tasks such as text categorization [3] and protein classification problems [4].
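For concreteness, the following is a minimal sketch (in Python with NumPy; not code from the paper) of the two predefined kernels mentioned above. The bandwidth sigma, the degree, and the offset c are illustrative hyperparameters, not values prescribed by the paper.

```python
import numpy as np

def rbf_kernel(x, xp, sigma=1.0):
    """Radial Basis Function kernel: exp(-||x - x'||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def poly_kernel(x, xp, degree=3, c=1.0):
    """Polynomial kernel: (x . x' + c)^degree."""
    return (np.dot(x, xp) + c) ** degree
```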
Our work attempts to give a computational alternative to predefined kernels by learning kernel operators from data. We start with a few definitions. Let $\mathcal{X}$ be an instance space. A kernel is an inner-product operator $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. An explicit way to describe $K$ is via a mapping $\phi : \mathcal{X} \to \mathcal{H}$ from $\mathcal{X}$ to an inner-product space $\mathcal{H}$ such that $K(x, x') = \phi(x) \cdot \phi(x')$. Given a kernel operator and a finite set of instances $S = \{x_i, y_i\}_{i=1}^{m}$, the kernel matrix (a.k.a. the Gram matrix) is the matrix of all possible inner-products of pairs from $S$, $K_{i,j} = K(x_i, x_j)$. We therefore refer to the general form of $K$ as the \emph{kernel operator} and to the application of the kernel operator to a set of pairs of instances as the \emph{kernel matrix}.
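These definitions can be made concrete with a short sketch (again Python/NumPy, an illustration rather than the paper's code): building the Gram matrix over a sample $S$, and checking the feature-map identity $K(x, x') = \phi(x) \cdot \phi(x')$ for the explicit map realizing the degree-2 homogeneous polynomial kernel on $\mathbb{R}^2$.

```python
import numpy as np

def gram_matrix(kernel, S):
    """Kernel matrix K[i, j] = K(x_i, x_j) over a finite sample S."""
    m = len(S)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(S[i], S[j])
    return K

# An explicit feature map phi induces a kernel via K(x, x') = phi(x) . phi(x').
# Illustration: phi realizing the degree-2 homogeneous polynomial kernel on R^2.
def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose(np.dot(phi(x), phi(xp)), np.dot(x, xp) ** 2)

# Gram matrix of the induced kernel on a toy sample.
S = [np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([2.0, -1.0])]
print(gram_matrix(lambda a, b: np.dot(phi(a), phi(b)), S))
```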
The specific setting of kernel design we consider assumes that we have access to a