# Algebraically, this is expressed as $d'_{jk} = \operatorname{sign}(d_{jk})(|d_{jk}| - \Delta)_+$

ESL Chapter 4: Linear Methods for Classification (Trevor Hastie and Rob Tibshirani)

... k for gene $j$; the $j$th component of the overall centroid is $\bar{x}_j = \sum_{i=1}^{n} x_{ij}/n$.

• Let

$$d_{jk} = \frac{\bar{x}_{jk} - \bar{x}_j}{s_j}, \tag{1}$$

where $s_j$ is the pooled within-class standard deviation for gene $j$:

$$s_j^2 = \frac{1}{n-K} \sum_{k} \sum_{i \in C_k} (x_{ij} - \bar{x}_{jk})^2. \tag{2}$$

• Shrink each $d_{jk}$ towards zero, giving $d'_{jk}$ and new shrunken centroids or prototypes

$$\bar{x}'_{jk} = \bar{x}_j + s_j d'_{jk}. \tag{3}$$

[Figure: soft-thresholding plot; surviving labels are $\Delta$ and the origin $(0,0)$.]

• The shrinkage is soft-thresholding: each $d_{jk}$ is reduced by an amount $\Delta$ in absolute value, and is set to zero if its absolute value is less than $\Delta$. Algebraically, this is expressed as

$$d'_{jk} = \operatorname{sign}(d_{jk})\,(|d_{jk}| - \Delta)_+, \tag{4}$$

where $+$ means positive part ($t_+ = t$ if $t > 0$, and zero otherwise).

• Choose $\Delta$ by cross-validation.

## Connection to Lasso

Exercise 18.12 in ESL: Consider a (naive Bayes) Gaussian model for classification in which the features $j = 1, 2, \ldots, p$ are assumed to be independent within each class $k = 1, 2, \ldots, K$. With observations $i = 1, 2, \ldots, N$ and $C_k$ equal to the set of indices of the $N_k$ observations in class $k$, we observe

$$x_{ij} \sim N(\mu_j + \mu_{jk}, \sigma_j^2) \quad \text{for } i \in C_k, \qquad \text{with } \sum_{k=1}^{K} \mu_{jk} = 0.$$

Set $\hat{\sigma}_j^2 = s_j^2$, the pooled within-class variance for feature $j$, and consider the lasso-style minimization problem

$$\min_{\{\mu_j,\, \mu_{jk}\}} \left\{ \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{K} \sum_{i \in C_k} \frac{(x_{ij} - \mu_j - \mu_{jk})^2}{s_j^2} + \lambda \sum_{j=1}^{p} \sum_{k=1}^{K} \sqrt{N_k}\, \frac{|\mu_{jk}|}{s_j} \right\}.$$

The solution is equivalent to the nearest shrunken centroid classifier.

## Advantages

• Simple; includes the nearest centroid classifier as a special case.

• Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.

• With more than two classes, the method can select different g...
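
The shrinkage steps in equations (1) to (4) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `soft_threshold` and `shrunken_centroids`, the list-of-rows data layout, and the toy data are all assumptions made for the example, and it assumes every pooled standard deviation $s_j$ is strictly positive.

```python
from math import sqrt


def soft_threshold(d, delta):
    """Equation (4): sign(d) * (|d| - delta)_+ ."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


def shrunken_centroids(X, y, delta):
    """Return {class label: shrunken centroid}, following equations (1)-(4).

    X is a list of samples (each a list of p feature values), y holds the
    class label of each sample, and delta is the threshold.
    Assumes each pooled standard deviation s_j is strictly positive.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    K = len(classes)
    members = {k: [i for i in range(n) if y[i] == k] for k in classes}

    # Overall centroid and per-class centroids.
    overall = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    class_mean = {
        k: [sum(X[i][j] for i in idx) / len(idx) for j in range(p)]
        for k, idx in members.items()
    }

    # Equation (2): pooled within-class standard deviation per feature.
    s = []
    for j in range(p):
        ss = sum(
            (X[i][j] - class_mean[k][j]) ** 2
            for k, idx in members.items()
            for i in idx
        )
        s.append(sqrt(ss / (n - K)))

    shrunk = {}
    for k in classes:
        centroid = []
        for j in range(p):
            d = (class_mean[k][j] - overall[j]) / s[j]   # equation (1)
            d_shrunk = soft_threshold(d, delta)          # equation (4)
            centroid.append(overall[j] + s[j] * d_shrunk)  # equation (3)
        shrunk[k] = centroid
    return shrunk


# Hypothetical toy data: feature 0 separates the classes, feature 1 barely does.
X = [[1.0, 5.0], [3.0, 5.5], [7.0, 4.5], [9.0, 5.0]]
y = [0, 0, 1, 1]
shrunk = shrunken_centroids(X, y, delta=1.0)
print(shrunk)
```

With this toy data and `delta=1.0`, the second feature's standardized differences fall below the threshold, so both class centroids collapse to the overall mean on that feature and it no longer contributes to classification. Setting `delta=0.0` recovers the ordinary class centroids, which is the nearest centroid special case.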
