ESL Chapter 4 -- Linear Methods for Classification
Trevor Hastie and Rob Tibshirani

Outline: linear regression; linear and quadratic discriminant functions; example: gene expression arrays; reduced-rank LDA; logistic regression; separating hyperplanes.

Linear classifiers

Some concepts:
- linear regression: $f_k(x) = \beta_{k0} + \beta_k^T x$; the decision boundary between classes $k$ and $\ell$ is $\{x : f_k(x) = f_\ell(x)\}$
- linear discriminant analysis and logistic regression: $\log \frac{P(G=1 \mid X=x)}{P(G=2 \mid X=x)} = \beta_0 + \beta^T x$
- explicit approaches: separating hyperplanes
- discriminant functions $\delta_k(x)$, $k = 1, 2, \ldots, K$, with classification rule $\hat G(x) = \arg\max_k \delta_k(x)$

Linear regression

Indicator response matrix: code each class label as an indicator vector. If $g_i$ is the class of observation $i$, then row $i$ of the $N \times K$ matrix $Y$ has a 1 in column $g_i$ and 0 elsewhere; for example $g = (3, 4, \ldots, 2)^T$ gives rows $(0,0,1,0,\ldots)$, $(0,0,0,1,\ldots)$, and so on. The fitted values are
$$\hat F = \hat Y = X(X^T X)^{-1} X^T Y = X \hat B.$$

Targets: $\hat f(x) = \hat B^T x = (\hat f_1(x), \hat f_2(x), \ldots, \hat f_K(x))^T$. Note that $E(Y \mid X = x) = (p_1(x), p_2(x), \ldots, p_K(x))^T$, where $p_k(x) = P(G = k \mid X = x)$. The least-squares criterion is $\min_B \sum_{i=1}^N \|y_i - B^T x_i\|^2$, where $y_i$ and $x_i$ are the $i$th rows of $Y$ and $X$. With $\hat f(x) = \hat B^T x$, classify by $\hat G(x) = \arg\min_k \|\hat f(x) - t_k\|^2$, where $t_k = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ (1 in the $k$th position).
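To make the indicator-regression recipe concrete, here is a minimal numpy sketch (the function and variable names are illustrative, not from the slides): build the indicator matrix $Y$, solve the least-squares problem for $\hat B$ with an intercept column, and classify a new point to the nearest target $t_k$.

import numpy as np

def fit_indicator_regression(X, g, K):
    # X: (N, p) inputs; g: (N,) integer class labels coded 1..K (illustrative names)
    N = X.shape[0]
    Y = np.zeros((N, K))
    Y[np.arange(N), g - 1] = 1.0                 # indicator response matrix
    X1 = np.hstack([np.ones((N, 1)), X])         # add an intercept column
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # B-hat = (X1^T X1)^{-1} X1^T Y
    return B

def predict_class(B, x_new):
    f = np.concatenate([[1.0], x_new]) @ B       # f-hat(x): one fitted value per class
    targets = np.eye(B.shape[1])                 # t_k = unit vector with 1 in position k
    return int(np.argmin(((targets - f) ** 2).sum(axis=1))) + 1

Since the targets are unit vectors, minimizing $\|\hat f(x) - t_k\|^2$ amounts to picking the class with the largest fitted value $\hat f_k(x)$.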
Masking problems with linear regression

[Figure: the same three-class data in the $(X_1, X_2)$ plane shown in two panels, "Linear Regression" and "Linear Discriminant Analysis", each with its fitted linear decision boundaries.]

The data come from three classes in $\mathbb{R}^2$ and are easily separated by linear decision boundaries. The right plot shows the boundaries found by linear discriminant analysis. The left plot shows the boundaries found by linear regression of the indicator response variables. The middle class is completely masked (never dominates).
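Masking is easy to reproduce numerically. The sketch below uses toy one-dimensional data with three ordered classes (an assumed setup, not the slides' data); with a strictly linear fit, the middle class's fitted indicator curve is nearly flat and essentially never attains the maximum.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(m, 0.3, 100) for m in (0.0, 1.0, 2.0)])
g = np.repeat([1, 2, 3], 100)

X1 = np.column_stack([np.ones_like(x), x])               # intercept + x
Y = (g[:, None] == np.array([1, 2, 3])).astype(float)    # indicator responses
B, *_ = np.linalg.lstsq(X1, Y, rcond=None)
pred = (X1 @ B).argmax(axis=1) + 1
print("middle class ever predicted:", bool((pred == 2).any()))   # typically False

Adding a quadratic term (a column $x^2$) removes the masking here, in line with the Degree = 2 panel on the next slide.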
[Figure: fitted indicator-regression curves for a one-dimensional three-class problem, two panels labelled "Degree = 1; Error = 0.25" and "Degree = 2; Error = 0.03".]

The effects of masking on linear regression in $\mathbb{R}$ for a three-class problem. The rug plot at the base indicates the positions and class membership of each observation. The three curves in each panel are the fitted regressions to the three-class indicator variables; for example, for the green class, $y_{\mathrm{green}}$ is 1 for the green observations and 0 for the orange and blue. The fits are linear and quadratic polynomials. Above each plot is the training error rate. The Bayes error rate is 0.025 for this problem, as is the LDA error rate.

Linear discriminant analysis

$f_k(x)$ -- density of $X$ in class $G = k$; $\pi_k$ -- class prior $Pr(G = k)$. Bayes theorem,
$$Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{\ell=1}^K f_\ell(x)\,\pi_\ell},$$
leads to LDA, QDA, MDA (mixture DA), kernel DA, and naive Bayes.

LDA assumes Gaussian class densities with a common covariance matrix,
$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1}(x - \mu_k)},$$
so that
$$\log \frac{Pr(G = k \mid x)}{Pr(G = \ell \mid x)} = \log \frac{\pi_k}{\pi_\ell} - \frac{1}{2}(\mu_k + \mu_\ell)^T \Sigma^{-1}(\mu_k - \mu_\ell) + x^T \Sigma^{-1}(\mu_k - \mu_\ell),$$
which is linear in $x$.

More on LDA

- estimate $\mu_k$ by the centroid in class $k$, and $\Sigma$ by the pooled within-class covariance matrix
- estimated Bayes rule: classify to the class $k$ that maximizes the discriminant function
$$\hat\delta_k(x) = x^T \hat\Sigma^{-1} \hat\mu_k - \frac{1}{2}\hat\mu_k^T \hat\Sigma^{-1} \hat\mu_k + \log \hat\pi_k$$
- for two classes, we classify to class 2 if
$$x^T \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) > \frac{1}{2}(\hat\mu_2 + \hat\mu_1)^T \hat\Sigma^{-1}(\hat\mu_2 - \hat\mu_1) - \log \frac{N_2}{N_1},$$
where $N_1$, $N_2$ are the numbers of observations in each class.

[Figure: two-class data projected onto a direction, centroids marked with "+"; left panel projects onto the line joining the centroids, right panel onto the discriminant direction.]

Although the line joining the centroids defines the direction of greatest centroid spread, the projected data overlap because of the covariance (left panel). The discriminant direction minimizes this overlap for Gaussian data (right panel).
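A compact numpy sketch of these estimates (illustrative names, no regularization): class centroids, the pooled within-class covariance, and the linear discriminant $\hat\delta_k(x)$.

import numpy as np

def lda_fit(X, g):
    classes = np.unique(g)
    N, p = X.shape
    mu, pi = {}, {}
    Sigma = np.zeros((p, p))
    for k in classes:
        Xk = X[g == k]
        mu[k] = Xk.mean(axis=0)                    # class centroid mu-hat_k
        pi[k] = len(Xk) / N                        # prior pi-hat_k = N_k / N
        Sigma += (Xk - mu[k]).T @ (Xk - mu[k])     # within-class scatter
    Sigma /= (N - len(classes))                    # pooled covariance Sigma-hat
    return mu, pi, Sigma

def lda_predict(x, mu, pi, Sigma):
    Sinv = np.linalg.inv(Sigma)
    # delta_k(x) = x^T Sinv mu_k - 0.5 mu_k^T Sinv mu_k + log pi_k
    scores = {k: x @ Sinv @ mu[k] - 0.5 * mu[k] @ Sinv @ mu[k] + np.log(pi[k])
              for k in mu}
    return max(scores, key=scores.get)

For two classes this is exactly the rule above: classify to class 2 whenever $\hat\delta_2(x) > \hat\delta_1(x)$.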
Linear boundaries and their projections

[Figure: the same three-class data shown twice; the left panel has linear decision boundaries, the right panel quadratic ones.]

The left plot shows some data from three classes, with linear decision boundaries found by linear discriminant analysis. The right plot shows quadratic decision boundaries. These were obtained by finding linear boundaries in the five-dimensional space $X_1, X_2, X_1 X_2, X_1^2, X_2^2$. Linear inequalities in this space are quadratic inequalities in the original space.
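The augmented-space trick only requires constructing the extra features before running any linear classifier; a minimal sketch, assuming two-dimensional inputs and reusing the lda_fit/lda_predict sketch above:

import numpy as np

def quadratic_expand(X):
    # map (X1, X2) into the five-dimensional space (X1, X2, X1*X2, X1^2, X2^2)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# e.g.  mu, pi, Sigma = lda_fit(quadratic_expand(X), g)
# linear boundaries in the expanded space are quadratic in (X1, X2)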
Quadratic discriminant analysis

$$\delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k) + \log \pi_k$$

[Figure: the three-class data shown twice, each panel with quadratic decision boundaries.]

Two methods for fitting quadratic boundaries. [Left] Quadratic decision boundaries, obtained using LDA in the five-dimensional "quadratic" space. [Right] Quadratic decision boundaries found by QDA. The differences are small, as is usually the case.

[Figure: left panel, contours of three Gaussian densities with Bayes decision boundaries; right panel, a sample of 30 per class with fitted LDA boundaries.]

The left panel shows three Gaussian distributions, with the same covariance and different means. Included are the contours of constant density enclosing 95% of the probability in each case. The Bayes decision boundaries between each pair of classes are shown (broken straight lines), and the Bayes decision boundaries separating all three classes are the thicker solid lines (a subset of the former). On the right we see a sample of 30 drawn from each Gaussian distribution, and the fitted LDA decision boundaries.

Regularized discriminant analysis

- regularized QDA: $\hat\Sigma_k(\alpha) = \alpha \hat\Sigma_k + (1 - \alpha)\hat\Sigma$
- regularized LDA: $\hat\Sigma(\gamma) = \gamma \hat\Sigma + (1 - \gamma)\hat\sigma^2 I$
- the two can be combined, giving a pair of parameters $(\alpha, \gamma)$
- could also use $\hat\Sigma(\gamma) = \gamma \hat\Sigma + (1 - \gamma)\,\mathrm{diag}(\hat\Sigma)$
- in the "Nearest Shrunken Centroid" work we use
$$\delta_k(x) = \sum_{j=1}^p \frac{(x_j - \bar x'_{jk})^2}{s_j^2} - 2\log\pi_k,$$
where $\bar x'_{jk}$ is a shrunken centroid. Details later.
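Regularized discriminant analysis only changes the covariance estimate plugged into the QDA/LDA discriminant. A minimal sketch of the two blends above (illustrative names; the scalar $\hat\sigma^2$ is taken here as the average diagonal entry, one reasonable choice):

import numpy as np

def rda_cov(Sigma_k, Sigma_pooled, alpha):
    # regularized QDA: Sigma_k(alpha) = alpha*Sigma_k + (1 - alpha)*Sigma_pooled
    return alpha * Sigma_k + (1 - alpha) * Sigma_pooled

def ridge_cov(Sigma_pooled, gamma):
    # regularized LDA: Sigma(gamma) = gamma*Sigma + (1 - gamma)*sigma^2 * I
    p = Sigma_pooled.shape[0]
    sigma2 = np.trace(Sigma_pooled) / p    # assumed scalar variance estimate
    return gamma * Sigma_pooled + (1 - gamma) * sigma2 * np.eye(p)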
Regularized Discriminant Analysis on the Vowel Data

[Figure: test and training misclassification rate (0.0 to 0.5) plotted against $\alpha \in [0, 1]$.]

Test and training errors for the vowel data, using regularized discriminant analysis with a series of values of $\alpha \in [0, 1]$. The optimum for the test data occurs around $\alpha = 0.9$, close to quadratic discriminant analysis.

Classification in high dimensions

- important for gene expression microarray problems and other genomics problems
- starting point: diagonal LDA, which uses $\mathrm{diag}(\hat\Sigma)$
- nearest centroid classification on standardized features is equivalent to diagonal LDA
- nearest shrunken centroids regularizes further, by discarding noisy features

Classification of microarray samples

Example: small round blue cell tumors; Khan et al, Nature Medicine, 2001. Tumors classified as BL (Burkitt lymphoma), EWS (Ewing sarcoma), NB (neuroblastoma) and RMS (rhabdomyosarcoma). There are 63 training samples and 25 test samples, although five of the latter were not SRBCTs; 2308 genes. Khan et al report zero training and test errors using a complex neural network model, and decided that 96 genes were "important". Too complicated!

[Figure: Khan data -- expression heatmap for the BL, EWS, NB and RMS samples.]

[Figure: the neural network approach of Khan et al.]

[Figure: class centroids for BL, EWS, NB and RMS -- average expression for each gene, centered at the overall centroid.]

Shrunken centroids

Idea: shrink each class centroid towards the overall centroid. First normalize by the within-class standard deviation for each gene.

Let $x_{ij}$ be the expression for samples $i = 1, 2, \ldots, n$ and genes $j = 1, 2, \ldots, p$. We have classes $1, 2, \ldots, K$, and let $C_k$ be the indices of the $n_k$ samples in class $k$. The $j$th component of the centroid for class $k$ is $\bar x_{jk} = \sum_{i \in C_k} x_{ij}/n_k$, the mean expression value in class $k$ for gene $j$; the $j$th component of the overall centroid is $\bar x_j = \sum_{i=1}^n x_{ij}/n$.

Let
$$d_{jk} = (\bar x_{jk} - \bar x_j)/s_j, \qquad (1)$$
where $s_j$ is the pooled within-class standard deviation for gene $j$:
$$s_j^2 = \frac{1}{n - K}\sum_k \sum_{i \in C_k} (x_{ij} - \bar x_{jk})^2. \qquad (2)$$
Shrink each $d_{jk}$ towards zero, giving $d'_{jk}$ and new shrunken centroids or prototypes
$$\bar x'_{jk} = \bar x_j + s_j d'_{jk}. \qquad (3)$$

[Figure: the soft-thresholding function, passing through $(0, 0)$.]

The shrinkage is soft-thresholding: each $d_{jk}$ is reduced by an amount $\Delta$ in absolute value, and is set to zero if its absolute value is less than $\Delta$. Algebraically, this is expressed as
$$d'_{jk} = \mathrm{sign}(d_{jk})\,(|d_{jk}| - \Delta)_+, \qquad (4)$$
where $+$ means positive part ($t_+ = t$ if $t > 0$, and zero otherwise). Choose $\Delta$ by cross-validation.
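Equations (1)-(4) translate directly into a few lines of numpy; a sketch with illustrative names, using only the quantities defined on these slides (no extra offsets):

import numpy as np

def shrunken_centroids(X, g, delta):
    # X: (n, p) expression matrix; g: (n,) labels; delta: shrinkage amount
    n, p = X.shape
    classes = np.unique(g)
    K = len(classes)
    xbar = X.mean(axis=0)                                         # overall centroid
    xbar_k = np.vstack([X[g == k].mean(axis=0) for k in classes]) # class centroids
    # pooled within-class standard deviation s_j, eq. (2)
    ss = sum(((X[g == k] - xbar_k[i]) ** 2).sum(axis=0)
             for i, k in enumerate(classes))
    s = np.sqrt(ss / (n - K))
    d = (xbar_k - xbar) / s                                       # eq. (1)
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)    # eq. (4), soft threshold
    return xbar + s * d_shrunk, s                                 # eq. (3): prototypes x'_jk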
Advantages

- simple; includes the nearest centroid classifier as a special case
- thresholding denoises large effects and sets small ones to zero, thereby selecting genes
- with more than two classes, the method can select different genes, and different numbers of genes, for each class

Class probabilities

For a test sample $x^* = (x_1^*, x_2^*, \ldots, x_p^*)$, we define the discriminant score for class $k$:
$$\delta_k(x^*) = \sum_{j=1}^p \frac{(x_j^* - \bar x'_{jk})^2}{s_j^2} - 2\log\pi_k. \qquad (5)$$
The classification rule is then
$$C(x^*) = \ell \quad \text{if} \quad \delta_\ell(x^*) = \min_k \delta_k(x^*). \qquad (6)$$
Estimates of the class probabilities, by analogy to Gaussian linear discriminant analysis, are
$$\hat p_k(x^*) = \frac{e^{-\frac{1}{2}\delta_k(x^*)}}{\sum_{\ell=1}^K e^{-\frac{1}{2}\delta_\ell(x^*)}}. \qquad (7)$$

Results on Khan data

At the optimal point, there are 43 active genes.

[Figure: training, cross-validation and test error versus the amount of shrinkage $\Delta$, with the number of active genes (2308 down to 1) along the top.]

Training Data...
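Finally, equations (5)-(7) applied to the prototypes from the earlier sketch (illustrative names; the priors $\pi_k$ taken as training class proportions):

import numpy as np

def nsc_predict(x_star, prototypes, s, priors):
    # x_star: (p,) test sample; prototypes: (K, p) shrunken centroids x'_jk;
    # s: (p,) pooled within-class SDs; priors: (K,) class proportions pi_k
    delta = (((x_star - prototypes) / s) ** 2).sum(axis=1) - 2 * np.log(priors)  # eq. (5)
    k_hat = int(np.argmin(delta)) + 1                                            # eq. (6)
    w = np.exp(-0.5 * (delta - delta.min()))    # eq. (7); shift by min(delta) for stability
    return k_hat, w / w.sum()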