13 Prototype Methods and Nearest-Neighbors

13.1 Introduction
In this chapter we discuss some simple and essentially model-free methods for classification and pattern recognition. Because they are highly unstructured, they typically are not useful for understanding the nature of the relationship between the features and class outcome. However, as black box prediction engines, they can be very effective, and are often among the best performers in real data problems. The nearest-neighbor technique can also be used in regression; this was touched on in Chapter 2 and works reasonably well for low-dimensional problems. However, with high-dimensional features, the bias–variance tradeoff does not work as favorably for nearest-neighbor regression as it does for classification.
13.2 Prototype Methods
Throughout this chapter, our training data consists of the $N$ pairs $(x_1, g_1), \ldots, (x_N, g_N)$, where $g_i$ is a class label taking values in $\{1, 2, \ldots, K\}$. Prototype methods represent the training data by a set of points in feature space. These prototypes are typically not examples from the training sample, except in the case of 1-nearest-neighbor classification discussed later.
Each prototype has an associated class label, and classification of a query point $x$ is made to the class of the closest prototype. “Closest” is usually defined by Euclidean distance in the feature space, after each feature has
been standardized to have overall mean 0 and variance 1 in the training sample. Euclidean distance is appropriate for quantitative features. We discuss distance measures between qualitative and other kinds of feature values in Chapter 14.
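For concreteness, here is a minimal NumPy sketch of this classification rule; the function `nearest_prototype_classify` and its argument names are illustrative choices of our own, not from the text:

```python
import numpy as np

def nearest_prototype_classify(X_train, prototypes, proto_labels, X_query):
    """Label each query point with the class of its closest prototype."""
    # Standardize every feature to mean 0, variance 1 using the
    # training-sample statistics, as described in the text.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant features
    Z_query = (X_query - mu) / sigma
    Z_proto = (prototypes - mu) / sigma
    # Squared Euclidean distance from each query point to each prototype.
    d2 = ((Z_query[:, None, :] - Z_proto[None, :, :]) ** 2).sum(axis=-1)
    return proto_labels[d2.argmin(axis=1)]

# Example: one prototype per class in a two-class, two-feature problem.
X_train = np.array([[0., 0.], [1., 1.], [4., 4.], [5., 5.]])
prototypes = np.array([[0.5, 0.5], [4.5, 4.5]])
proto_labels = np.array([1, 2])
print(nearest_prototype_classify(X_train, prototypes, proto_labels,
                                 np.array([[0.2, 0.1], [4.8, 4.9]])))  # [1 2]
```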
These methods can be very effective if the prototypes are well positioned to capture the distribution of each class. Irregular class boundaries can be represented, with enough prototypes in the right places in feature space. The main challenge is to figure out how many prototypes to use and where to put them. Methods differ according to the number of prototypes used and the way in which they are selected.
13.2.1 K-means Clustering
K-means clustering is a method for finding clusters and cluster centers in a set of unlabeled data. One chooses the desired number of cluster centers, say $R$, and the K-means procedure iteratively moves the centers to minimize the total within-cluster variance. Given an initial set of centers, the K-means algorithm alternates the following two steps (a minimal code sketch follows the list):
• for each center we identify the subset of training points (its cluster) that is closer to it than any other center;
• the means of each feature for the data points in each cluster are computed, and this mean vector becomes the new center for that cluster.
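The following NumPy sketch implements exactly this alternation; the name `kmeans` and the parameters `R`, `max_iter`, and `seed` are our own illustrative choices, not from the text:

```python
import numpy as np

def kmeans(X, R, max_iter=100, seed=0):
    """Minimal K-means: alternate cluster assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize with R distinct training points as centers.
    centers = X[rng.choice(len(X), size=R, replace=False)]
    for _ in range(max_iter):
        # Step 1: assign each point to the cluster of its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        # Step 2: the mean of each cluster becomes its new center
        # (an empty cluster keeps its previous center).
        new_centers = np.array([X[assign == r].mean(axis=0)
                                if np.any(assign == r) else centers[r]
                                for r in range(R)])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: converged
        centers = new_centers
    return centers, assign
```

Each iteration can only decrease the total within-cluster variance, since both the reassignment and the mean update are variance-reducing, so the procedure converges to a local minimum.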