8. Model Inference and Averaging
FIGURE 8.1. (Left panel): Data for smoothing example, plotted as y against x. (Right panel): Set of seven B-spline basis functions, plotted against x. The broken vertical lines indicate the placement of the three knots.
Denote the training data by $\mathbf{Z} = \{z_1, z_2, \ldots, z_N\}$, with $z_i = (x_i, y_i)$, $i = 1, 2, \ldots, N$. Here $x_i$ is a one-dimensional input, and $y_i$ the outcome, either continuous or categorical. As an example, consider the $N = 50$ data points shown in the left panel of Figure 8.1.
Suppose we decide to fit a cubic spline to the data, with three knots placed at the quartiles of the $X$ values. This is a seven-dimensional linear space of functions, and can be represented, for example, by a linear expansion of $B$-spline basis functions (see Section 5.9.2):

$$\mu(x) = \sum_{j=1}^{7} \beta_j h_j(x). \tag{8.1}$$
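As a rough illustration of the basis in (8.1), the sketch below evaluates seven cubic B-spline basis functions with three interior knots at the quartiles of the inputs, using the Cox-de Boor recursion. The helper name `bspline_basis`, the choice of boundary knots at the smallest and largest input, and the synthetic inputs are all assumptions made for this example; they are not the data or code behind Figure 8.1.

```python
import numpy as np

def bspline_basis(x, interior_knots, lo, hi, degree=3):
    """Evaluate all B-spline basis functions of the given degree at x.

    Hypothetical helper: returns an array of shape
    (len(x), len(interior_knots) + degree + 1).
    """
    x = np.asarray(x, dtype=float)
    # Augmented knot sequence: each boundary knot repeated degree + 1 times.
    t = np.concatenate([[lo] * (degree + 1), interior_knots, [hi] * (degree + 1)])
    n_intervals = len(t) - 1

    # Degree-0 basis: indicators of the half-open intervals [t_i, t_{i+1}).
    B = np.zeros((len(x), n_intervals))
    for i in range(n_intervals):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval on the right so that x == hi is covered.
    B[x == hi, len(t) - degree - 2] = 1.0

    # Cox-de Boor recursion, treating any term with a zero denominator as zero.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(x), n_intervals - d))
        for i in range(n_intervals - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B

# Synthetic inputs on [0, 3] (an assumption; not the book's data).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 3.0, size=50))

# Three interior knots at the quartiles of the x values.
knots = np.quantile(x, [0.25, 0.5, 0.75])

# H[i, j] holds the j-th basis function evaluated at x[i]; H is 50 x 7.
H = bspline_basis(x, knots, lo=x.min(), hi=x.max())
```

With three interior knots and cubic pieces, the recursion produces 3 + 3 + 1 = 7 functions, matching the dimension stated above. The basis functions are non-negative and sum to one at every point between the boundary knots.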
Here the $h_j(x)$, $j = 1, 2, \ldots, 7$ are the seven functions shown in the right panel of Figure 8.1. We can think of $\mu(x)$ as representing the conditional mean $\mathrm{E}(Y \mid X = x)$.
Let $\mathbf{H}$ be the $N \times 7$ matrix with $ij$th element $h_j(x_i)$. The usual estimate of $\beta$, obtained by minimizing the squared error over the training set, is given by

$$\hat{\beta} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{y}. \tag{8.2}$$
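The estimate in (8.2) can be sketched numerically as follows. To keep the block short, it uses a truncated power basis as a stand-in for the B-spline basis: with three knots it spans the same seven-dimensional space of cubic splines, so the fitted values agree even though the individual coefficients differ. The helper `truncated_power_basis`, the synthetic data, and the noise level are assumptions made for illustration only.

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Hypothetical helper: cubic-spline basis 1, x, x^2, x^3, (x - k)^3_+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Synthetic training data (an assumption; not the data of Figure 8.1).
rng = np.random.default_rng(1)
N = 50
x = np.sort(rng.uniform(0.0, 3.0, size=N))
y = np.sin(2.0 * x) + x + rng.normal(scale=0.5, size=N)

# Three knots at the quartiles of x give an N x 7 basis matrix H.
knots = np.quantile(x, [0.25, 0.5, 0.75])
H = truncated_power_basis(x, knots)

# Equation (8.2): solve the normal equations (H^T H) beta = H^T y.
beta_hat = np.linalg.solve(H.T @ H, H.T @ y)
mu_hat = H @ beta_hat  # fitted values at the training inputs

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)
mu_lstsq = H @ beta_lstsq
```

Solving the normal equations mirrors the formula directly. In practice a QR- or SVD-based solver such as `np.linalg.lstsq` is often preferred, because forming $\mathbf{H}^T\mathbf{H}$ squares the condition number of the problem.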
The corresponding fit $\hat{\mu}(x) = \sum_{j=1}^{7} \hat{\beta}_j h_j(x)$ is shown in the top left panel of Figure 8.2.
The estimated covariance matrix of $\hat{\beta}$ is

$$\widehat{\mathrm{Var}}(\hat{\beta}) = (\mathbf{H}^T \mathbf{H})^{-1} \hat{\sigma}^2, \tag{8.3}$$

where we have estimated the noise variance by $\hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{\mu}(x_i))^2 / N$.
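A minimal sketch of (8.3), under the same assumptions as before: synthetic data, and a truncated power basis (the hypothetical helper `truncated_power_basis`) standing in for the B-spline basis. The noise variance estimate divides the residual sum of squares by $N$, as in the text.

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Hypothetical helper: cubic-spline basis 1, x, x^2, x^3, (x - k)^3_+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Synthetic training data (an assumption; not the data of Figure 8.1).
rng = np.random.default_rng(1)
N = 50
x = np.sort(rng.uniform(0.0, 3.0, size=N))
y = np.sin(2.0 * x) + x + rng.normal(scale=0.5, size=N)
knots = np.quantile(x, [0.25, 0.5, 0.75])
H = truncated_power_basis(x, knots)

# Least-squares coefficients, as in (8.2).
beta_hat = np.linalg.solve(H.T @ H, H.T @ y)

# Noise variance estimate: residual sum of squares divided by N.
resid = y - H @ beta_hat
sigma2_hat = np.sum(resid**2) / N

# Equation (8.3): estimated covariance matrix of beta_hat (7 x 7).
cov_beta = np.linalg.inv(H.T @ H) * sigma2_hat

# Square roots of the diagonal give standard errors for each coefficient.
se_beta = np.sqrt(np.diag(cov_beta))
```

The covariance matrix depends on the basis chosen, so the entries computed here would differ from those obtained with a B-spline basis, even though both describe the same fitted function.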