5 Basis Expansions and Regularization

5.1 Introduction
We have already made use of models linear in the input features, both for regression and classification. Linear regression, linear discriminant analysis, logistic regression and separating hyperplanes all rely on a linear model. It is extremely unlikely that the true function f(X) is actually linear in X. In regression problems, f(X) = E(Y | X) will typically be nonlinear and nonadditive in X, and representing f(X) by a linear model is usually a convenient, and sometimes a necessary, approximation. Convenient, because a linear model is easy to interpret, and is the first-order Taylor approximation to f(X). Sometimes necessary, because with N small and/or p large, a linear model might be all we are able to fit to the data without overfitting. Likewise in classification, a linear, Bayes-optimal decision boundary implies that some monotone transformation of Pr(Y = 1 | X) is linear in X. This is inevitably an approximation.
In this chapter and the next we discuss popular methods for moving beyond linearity. The core idea in this chapter is to augment/replace the vector of inputs X with additional variables, which are transformations of X, and then use linear models in this new space of derived input features. Denote by h_m(X) : IR^p → IR the mth transformation of X, m = 1, . . . , M. We then model

    f(X) = Σ_{m=1}^{M} β_m h_m(X),                    (5.1)
© Springer Science+Business Media, LLC 2009
T. Hastie et al., The Elements of Statistical Learning, Second Edition, DOI: 10.1007/b94608_5
a linear basis expansion in X. The beauty of this approach is that once the basis functions h_m have been determined, the models are linear in these new variables, and the fitting proceeds as before.
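This point can be made concrete with a short sketch (not from the text; the one-dimensional input, the cubic basis, and the sin target are all illustrative assumptions): once the matrix of derived features H, with columns h_m(x), has been built, the coefficients β_m are found by ordinary least squares exactly as for a plain linear model.

```python
import numpy as np

# Hypothetical example: scalar input x, nonlinear truth f(x) = sin(x).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)

# Basis expansion: h_1(x) = 1, h_2(x) = x, h_3(x) = x^2, h_4(x) = x^3.
H = np.column_stack([np.ones_like(x), x, x**2, x**3])

# The model is linear in the derived features, so ordinary least
# squares fits the beta_m exactly as in a plain linear model.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ beta
```

Nothing about the fitting step knows that the columns of H are nonlinear in x; all of the nonlinearity lives in the construction of the basis.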
Some simple and widely used examples of the h_m are the following:
• h_m(X) = X_m, m = 1, . . . , p recovers the original linear model.
• h_m(X) = X_j^2 or h_m(X) = X_j X_k allows us to augment the inputs with polynomial terms to achieve higher-order Taylor expansions. Note, however, that the number of variables grows exponentially in the degree of the polynomial. A full quadratic model in p variables requires O(p^2) square and cross-product terms, or more generally O(p^d) for a degree-d polynomial.
• h_m(X) = log(X_j), √X_j, . . . permits other nonlinear transformations of single inputs. More generally one can use similar functions involving several inputs, such as h_m(X) = ||X||.
• h_m(X) = I(L_m ≤ X_k < U_m), an indicator for a region of X_k. Breaking the range of X_k up into M_k such nonoverlapping regions results in a model with a piecewise constant contribution for X_k.
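The transformations in the list above are all easy to construct directly. The following sketch (an illustration, not from the text; the data matrix, region count, and column choices are assumptions) builds each family of basis functions for a small data matrix with p = 2 positive-valued inputs:

```python
import numpy as np

# Hypothetical data matrix X with p = 2 strictly positive columns.
rng = np.random.default_rng(1)
X = rng.uniform(0.1, 3.0, size=(50, 2))

linear  = X                                       # h_m(X) = X_m
squares = X**2                                    # h_m(X) = X_j^2
cross   = (X[:, 0] * X[:, 1])[:, None]            # h_m(X) = X_j X_k
logs    = np.log(X)                               # h_m(X) = log(X_j)
norm    = np.linalg.norm(X, axis=1, keepdims=True)  # h_m(X) = ||X||

# Indicator basis: break the range of X_1 into M_k = 4 nonoverlapping
# regions [L_m, U_m); each column indicates membership in one region,
# giving a piecewise constant contribution for X_1.
edges = np.linspace(X[:, 0].min(), X[:, 0].max() + 1e-9, 5)
indicators = np.stack(
    [(edges[i] <= X[:, 0]) & (X[:, 0] < edges[i + 1]) for i in range(4)],
    axis=1,
).astype(float)

H = np.hstack([linear, squares, cross, logs, norm, indicators])
```

Each row of `indicators` has exactly one nonzero entry, since the regions are nonoverlapping and together cover the range of X_1.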
Sometimes the problem at hand will call for particular basis functions h_m, such as logarithms or power functions. More often, however, we use the basis expansions as a device to achieve more flexible representations for f(X).