13 Prototype Methods and Nearest-Neighbors

13.1 Introduction

In this chapter we discuss some simple and essentially model-free methods for classification and pattern recognition. Because they are highly unstructured, they typically are not useful for understanding the nature of the relationship between the features and class outcome. However, as black box prediction engines, they can be very effective, and are often among the best performers in real data problems. The nearest-neighbor technique can also be used in regression; this was touched on in Chapter 2 and works reasonably well for low-dimensional problems. However, with high-dimensional features, the bias-variance tradeoff does not work as favorably for nearest-neighbor regression as it does for classification.

13.2 Prototype Methods

Throughout this chapter, our training data consist of the N pairs (x_1, g_1), ..., (x_N, g_N), where g_i is a class label taking values in {1, 2, ..., K}. Prototype methods represent the training data by a set of points in feature space. These prototypes are typically not examples from the training sample, except in the case of 1-nearest-neighbor classification discussed later. Each prototype has an associated class label, and classification of a query point x is made to the class of the closest prototype. "Closest" is usually defined by Euclidean distance in the feature space, after each feature has

© Springer Science+Business Media, LLC 2009. T. Hastie et al., The Elements of Statistical Learning, Second Edition. DOI: 10.1007/b94608_13
been standardized to have overall mean 0 and variance 1 in the training sample. Euclidean distance is appropriate for quantitative features. We discuss distance measures between qualitative and other kinds of feature values in Chapter 14.

These methods can be very effective if the prototypes are well positioned to capture the distribution of each class. Irregular class boundaries can be represented, with enough prototypes in the right places in feature space. The main challenge is to figure out how many prototypes to use and where to put them. Methods differ according to the number of prototypes and the way in which they are selected.

13.2.1 K-means Clustering

K-means clustering is a method for finding clusters and cluster centers in a set of unlabeled data. One chooses the desired number of cluster centers, say R, and the K-means procedure iteratively moves the centers to minimize the total within-cluster variance. Given an initial set of centers, the K-means algorithm alternates two steps:

1. for each center, identify the subset of training points (its cluster) that is closer to it than to any other center;
2. compute the means of each feature for the data points in each cluster; this mean vector becomes the new center for that cluster.
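As an illustration of the nearest-prototype rule described in Section 13.2 above (standardize each feature using training-sample statistics, then assign a query point to the class of the closest prototype under Euclidean distance), here is a minimal NumPy sketch. The data, prototypes, and helper names are invented for illustration and are not from the text.

```python
import numpy as np

def standardize(X, mean, std):
    """Scale features to mean 0, variance 1 using training statistics."""
    return (X - mean) / std

def classify(query, prototypes, labels):
    """Assign `query` to the class label of the closest prototype."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    return labels[np.argmin(dists)]

# Toy training sample: two points per class (invented for illustration).
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
g_train = np.array([1, 1, 2, 2])
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z = standardize(X_train, mean, std)

# One prototype per class: here simply the class mean in standardized space.
prototypes = np.array([Z[g_train == k].mean(axis=0) for k in (1, 2)])
labels = np.array([1, 2])

q = standardize(np.array([0.5, 0.8]), mean, std)
print(classify(q, prototypes, labels))  # prints 1 (closer to the class-1 prototype)
```

Note that standardization is done with the training-sample mean and standard deviation, and the query point is transformed with those same statistics before distances are computed.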
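The two-step K-means iteration above (assign points to their closest center, then move each center to the mean of its cluster) can be sketched in a few lines of NumPy. This is a minimal illustration of Lloyd's algorithm under assumed synthetic data, not production code; real use would add a convergence check and multiple random restarts.

```python
import numpy as np

def kmeans(X, R, n_iter=20, seed=0):
    """Alternate the assignment and mean-update steps for n_iter rounds."""
    rng = np.random.default_rng(seed)
    # Initialize centers at R randomly chosen training points.
    centers = X[rng.choice(len(X), size=R, replace=False)]
    for _ in range(n_iter):
        # Step 1: each point joins the cluster of its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: each center moves to the mean of its assigned points.
        for r in range(R):
            if np.any(assign == r):
                centers[r] = X[assign == r].mean(axis=0)
    return centers, assign

# Two well-separated synthetic clusters (illustrative data only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(3.0, 0.3, size=(20, 2))])
centers, assign = kmeans(X, R=2)
```

Each iteration can only decrease (or leave unchanged) the total within-cluster variance, which is why the procedure converges, although possibly to a local minimum that depends on the initial centers.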