6 Kernel Smoothing Methods
In this chapter we describe a class of regression techniques that achieve flexibility in estimating the regression function $f(X)$ over the domain $\mathbb{R}^p$ by fitting a different but simple model separately at each query point $x_0$. This is done by using only those observations close to the target point $x_0$ to fit the simple model, and in such a way that the resulting estimated function $\hat{f}(X)$ is smooth in $\mathbb{R}^p$. This localization is achieved via a weighting function or kernel $K_\lambda(x_0, x_i)$, which assigns a weight to $x_i$ based on its distance from $x_0$. The kernels $K_\lambda$ are typically indexed by a parameter $\lambda$ that dictates the width of the neighborhood. These memory-based methods require in principle little or no training; all the work gets done at evaluation time. The only parameter that needs to be determined from the training data is $\lambda$. The model, however, is the entire training data set.
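To make the localization concrete, here is a minimal sketch (our own illustration, not from the text) of such a kernel-weighted average, using the Epanechnikov kernel that appears in Figure 6.1; the function names and the NumPy implementation are assumptions of this sketch.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel: D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def kernel_smooth(x0, x, y, lam):
    """Kernel-weighted average of the y_i at the query point x0.

    The weights K_lambda(x0, x_i) = D(|x_i - x0| / lambda) localize the
    fit; nothing is precomputed, so all work happens at evaluation time.
    """
    w = epanechnikov((x - x0) / lam)
    if w.sum() == 0.0:
        return np.nan  # no observations fall inside the window
    return np.dot(w, y) / w.sum()
```

Evaluating kernel_smooth on a grid of query points traces out a smooth curve, since the weights change continuously as $x_0$ moves.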
We also discuss more general classes of kernel-based techniques, which tie in with structured methods in other chapters, and are useful for density estimation and classification.
The techniques in this chapter should not be confused with those associated with the more recent usage of the phrase "kernel methods". In this chapter kernels are mostly used as a device for localization. We discuss kernel methods in Sections 5.8, 14.5.4, 18.5 and Chapter 12; in those contexts the kernel computes an inner product in a high-dimensional (implicit) feature space, and is used for regularized nonlinear modeling. We make some connections to the methodology in this chapter at the end of Section 6.7.
[Figure 6.1 shows two panels of the same 100 simulated points over $x \in [0, 1]$, with the y-axis running roughly from $-1.0$ to $1.5$: the left panel, "Nearest-Neighbor Kernel," and the right panel, "Epanechnikov Kernel," each marking the target point $x_0$ and the fitted value $\hat{f}(x_0)$.]
FIGURE 6.1. In each panel 100 pairs $(x_i, y_i)$ are generated at random from the blue curve with Gaussian errors: $Y = \sin(4X) + \varepsilon$, $X \sim U[0, 1]$, $\varepsilon \sim N(0, 1/3)$. In the left panel the green curve is the result of a 30-nearest-neighbor running-mean smoother. The red point is the fitted constant $\hat{f}(x_0)$, and the red circles indicate those observations contributing to the fit at $x_0$. The solid yellow region indicates the weights assigned to observations. In the right panel, the green curve is the kernel-weighted average, using an Epanechnikov kernel with (half) window width $\lambda = 0.2$.
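The simulation behind Figure 6.1 is straightforward to reproduce. The sketch below is again our own illustration: the seed and evaluation grid are arbitrary, we read $N(0, 1/3)$ as variance $1/3$, and it reuses kernel_smooth from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not from the text

# Data of Figure 6.1: Y = sin(4X) + eps, X ~ U[0,1], eps ~ N(0, 1/3)
n = 100
x = rng.uniform(0.0, 1.0, n)
y = np.sin(4.0 * x) + rng.normal(0.0, np.sqrt(1.0 / 3.0), n)

def knn_running_mean(x0, x, y, k=30):
    """Left panel: average the y-values of the k nearest x's to x0."""
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

# Evaluate both smoothers on a grid of query points;
# kernel_smooth is the Epanechnikov average defined in the earlier sketch.
xs = np.linspace(0.0, 1.0, 200)
knn_fit = np.array([knn_running_mean(x0, x, y) for x0 in xs])
epa_fit = np.array([kernel_smooth(x0, x, y, lam=0.2) for x0 in xs])
```

The 30-nearest-neighbor curve comes out bumpy, since the neighborhood set changes discretely as $x_0$ moves, while the Epanechnikov weights shrink smoothly to zero at the window edge.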