4 Linear Methods for Classification

4.1 Introduction

In this chapter we revisit the classification problem and focus on linear methods for classification. Since our predictor $G(x)$ takes values in a discrete set $\mathcal{G}$, we can always divide the input space into a collection of regions labeled according to the classification. We saw in Chapter 2 that the boundaries of these regions can be rough or smooth, depending on the prediction function. For an important class of procedures, these decision boundaries are linear; this is what we will mean by linear methods for classification.

There are several different ways in which linear decision boundaries can be found. In Chapter 2 we fit linear regression models to the class indicator variables, and classify to the largest fit. Suppose there are $K$ classes, for convenience labeled $1, 2, \ldots, K$, and the fitted linear model for the $k$th indicator response variable is $\hat f_k(x) = \hat\beta_{k0} + \hat\beta_k^T x$. The decision boundary between class $k$ and $\ell$ is that set of points for which $\hat f_k(x) = \hat f_\ell(x)$, that is, the set $\{x : (\hat\beta_{k0} - \hat\beta_{\ell 0}) + (\hat\beta_k - \hat\beta_\ell)^T x = 0\}$, an affine set or hyperplane.¹ Since the same is true for any pair of classes, the input space is divided into regions of constant classification, with piecewise hyperplanar decision boundaries. This regression approach is a member of a class of methods that model discriminant functions $\delta_k(x)$ for each class, and then classify $x$ to the class with the largest value for its discriminant function. Methods that model the posterior probabilities $\Pr(G = k \mid X = x)$ are also in this class.

¹ Strictly speaking, a hyperplane passes through the origin, while an affine set need not. We sometimes ignore the distinction and refer in general to hyperplanes.

(Springer Science+Business Media, LLC 2009. T. Hastie et al., The Elements of Statistical Learning, Second Edition. DOI: 10.1007/b94608_4.)
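The indicator-regression recipe above can be sketched numerically. This is a minimal illustration, not the book's code: the synthetic data, class means, and seed are all assumptions made here for the example. It builds the $n \times K$ indicator response matrix, solves one least-squares problem for all $K$ columns at once, and classifies each point to the largest fitted value.

```python
import numpy as np

# Toy data (assumed for illustration): 3 classes, 2 features, with the
# class means shifted apart so a linear boundary is reasonable.
rng = np.random.default_rng(0)
K, n = 3, 150
y = rng.integers(0, K, size=n)
X = rng.normal(size=(n, 2)) + 3.0 * np.eye(K)[y, :2]

# Indicator response matrix Y (n x K): Y[i, k] = 1 if observation i is
# in class k, else 0.
Y = np.eye(K)[y]

# Prepend an intercept column and fit all K indicator regressions in
# one least-squares solve; column k of B holds (beta_k0, beta_k).
Xb = np.hstack([np.ones((n, 1)), X])
B, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

# Fitted values f_k(x) for every point; classify to argmax_k f_k(x).
F = Xb @ B
y_hat = F.argmax(axis=1)
train_acc = (y_hat == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

Points where two columns of `F` tie are exactly the piecewise-hyperplanar boundaries described in the text.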
Clearly, if either the $\delta_k(x)$ or $\Pr(G = k \mid X = x)$ are linear in $x$, then the decision boundaries will be linear. Actually, all we require is that some monotone transformation of $\delta_k$ or $\Pr(G = k \mid X = x)$ be linear for the decision boundaries to be linear. For example, if there are two classes, a popular model for the posterior probabilities is

$$\Pr(G = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta^T x)}{1 + \exp(\beta_0 + \beta^T x)}, \qquad \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta_0 + \beta^T x)}. \tag{4.1}$$

Here the monotone transformation is the logit transformation, $\log[p/(1-p)]$, and in fact we see that

$$\log\frac{\Pr(G = 1 \mid X = x)}{\Pr(G = 2 \mid X = x)} = \beta_0 + \beta^T x. \tag{4.2}$$

The decision boundary is the set of points for which the log-odds are zero, and this is a hyperplane defined by $\{x \mid \beta_0 + \beta^T x = 0\}$. We discuss two very popular but different methods that result in linear log-odds or logits: linear discriminant analysis and linear logistic regression. Although they differ in their derivation, the essential difference between them is in the way the …
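The two-class model (4.1)–(4.2) is easy to check numerically. In this sketch the coefficient values `beta0` and `beta` are assumptions chosen for the example, not values from the text; the point is that the log-odds of the two posteriors is exactly the linear function $\beta_0 + \beta^T x$, and any $x$ on the hyperplane $\beta_0 + \beta^T x = 0$ gets posterior 1/2 for each class.

```python
import numpy as np

# Assumed example coefficients (not from the text).
beta0 = -1.0
beta = np.array([2.0, -0.5])

def posteriors(x):
    """(Pr(G=1|X=x), Pr(G=2|X=x)) under model (4.1)."""
    eta = beta0 + beta @ x                 # linear predictor beta0 + beta^T x
    p1 = np.exp(eta) / (1.0 + np.exp(eta))
    return p1, 1.0 - p1

def log_odds(x):
    """log[Pr(G=1|x)/Pr(G=2|x)]; by (4.2) this equals beta0 + beta^T x."""
    p1, p2 = posteriors(x)
    return np.log(p1 / p2)

x = np.array([0.75, 0.5])
print(posteriors(x), log_odds(x))       # log-odds = -1 + 1.5 - 0.25 = 0.25

# A point on the decision boundary: beta0 + beta^T x = -1 + 1.25 - 0.25 = 0,
# so the log-odds are zero and both posteriors equal 1/2.
x_boundary = np.array([0.625, 0.5])
```

The monotone (logit) transformation is what makes the boundary linear even though the posteriors themselves are not linear in $x$.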
This note was uploaded on 07/14/2010 for the course STAT 132 taught by Professor Haulk during the Spring '10 term at The University of British Columbia.
