5 Basis Expansions and Regularization

5.1 Introduction
We have already made use of models linear in the input features, both for regression and classification. Linear regression, linear discriminant analysis, logistic regression and separating hyperplanes all rely on a linear model. It is extremely unlikely that the true function $f(X)$ is actually linear in $X$. In regression problems, $f(X) = \mathrm{E}(Y\,|\,X)$ will typically be nonlinear and nonadditive in $X$, and representing $f(X)$ by a linear model is usually a convenient, and sometimes a necessary, approximation. Convenient because a linear model is easy to interpret, and is the first-order Taylor approximation to $f(X)$. Sometimes necessary, because with $N$ small and/or $p$ large, a linear model might be all we are able to fit to the data without overfitting. Likewise in classification, a linear, Bayes-optimal decision boundary implies that some monotone transformation of $\Pr(Y = 1\,|\,X)$ is linear in $X$. This is inevitably an approximation.
In this chapter and the next we discuss popular methods for moving beyond linearity. The core idea in this chapter is to augment/replace the vector of inputs $X$ with additional variables, which are transformations of $X$, and then use linear models in this new space of derived input features.

Denote by $h_m(X) : \mathbb{R}^p \mapsto \mathbb{R}$ the $m$th transformation of $X$, $m = 1, \ldots, M$. We then model
$$
f(X) = \sum_{m=1}^{M} \beta_m h_m(X),
\qquad (5.1)
$$
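As a concrete illustration of the model in (5.1), the following sketch fits a linear model in a set of derived features. The choice of a polynomial basis $\{1, x, x^2, x^3\}$ and the synthetic data are assumptions for illustration only; any fixed set of transformations $h_m$ would serve in their place.

```python
import numpy as np

# A minimal sketch of a basis expansion: choose transformations h_m(X)
# of a scalar input, then fit an ordinary linear model in the derived
# features. Here the h_m are a (hypothetical) polynomial basis
# {1, x, x^2, x^3}.

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)  # nonlinear truth plus noise

# Design matrix H with columns h_1(x), ..., h_M(x), here M = 4.
H = np.column_stack([x**m for m in range(4)])

# Linear least squares in the derived feature space recovers the beta_m.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

def f_hat(x_new):
    """Fitted model: sum over m of beta_m * h_m(x_new)."""
    return np.column_stack([x_new**m for m in range(4)]) @ beta
```

The point of the sketch is that once the basis functions are fixed, estimation is ordinary linear fitting in the new coordinates; all the nonlinearity lives in the $h_m$.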
T. Hastie et al., The Elements of Statistical Learning, Second Edition, DOI: 10.1007/b94608_5, © Springer Science+Business Media, LLC 2009