... k for gene $j$; the $j$th component of the overall centroid is $\bar{x}_j = \sum_{i=1}^{n} x_{ij}/n$.

ESL Chapter 4 — Linear Methods for Classification, Trevor Hastie and Rob Tibshirani

• Let
$$d_{jk} = \frac{\bar{x}_{jk} - \bar{x}_j}{s_j}, \tag{1}$$
where $s_j$ is the pooled within-class standard deviation for gene $j$:
$$s_j^2 = \frac{1}{n-K} \sum_{k} \sum_{i \in C_k} (x_{ij} - \bar{x}_{jk})^2. \tag{2}$$
• Shrink each $d_{jk}$ towards zero, giving $d'_{jk}$ and new shrunken centroids or prototypes
$$\bar{x}'_{jk} = \bar{x}_j + s_j d'_{jk}. \tag{3}$$
[Figure: axis labels ∆ and (0,0)]
• The shrinkage is soft thresholding: each $d_{jk}$ is reduced by an amount $\Delta$ in absolute value, and is set to zero if its absolute value is less than $\Delta$. Algebraically, this is expressed as
$$d'_{jk} = \operatorname{sign}(d_{jk})\,(|d_{jk}| - \Delta)_+, \tag{4}$$
where $+$ means positive part ($t_+ = t$ if $t > 0$, and zero otherwise).
• Choose $\Delta$ by cross-validation.
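The steps in equations (1) to (4) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `soft_threshold` and `shrunken_centroids` are hypothetical, the data layout (rows are samples $i$, columns are genes $j$) is an assumption, and the sketch assumes every $s_j$ is strictly positive.

```python
import numpy as np


def soft_threshold(d, delta):
    """Eq. (4): reduce |d| by delta; values with |d| < delta become zero."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)


def shrunken_centroids(X, y, delta):
    """Sketch of eqs. (1)-(4).

    X: (n, p) array, rows are samples i, columns are genes j (assumed layout).
    y: (n,) array of class labels.
    Returns (classes, shrunken centroids of shape (K, p), d' of shape (K, p)).
    """
    classes, idx = np.unique(y, return_inverse=True)
    n, _ = X.shape
    K = len(classes)

    overall = X.mean(axis=0)  # overall centroid, one entry per gene
    centroids = np.vstack([X[idx == k].mean(axis=0) for k in range(K)])

    # Eq. (2): pooled within-class standard deviation for each gene.
    resid = X - centroids[idx]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))

    d = (centroids - overall) / s         # eq. (1)
    d_shrunk = soft_threshold(d, delta)   # eq. (4)
    return classes, overall + s * d_shrunk, d_shrunk  # eq. (3)
```

With `delta = 0` the function returns the ordinary class centroids, and with a very large `delta` every class centroid collapses to the overall centroid. Choosing `delta` by cross-validation would wrap this function in a loop over candidate values, which is not shown here.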
Connection to Lasso
Exercise 18.12 in ESL:
Consider a (naive Bayes) Gaussian model for classification in which the features $j = 1, 2, \ldots, p$ are assumed to be independent within each class $k = 1, 2, \ldots, K$. With observations $i = 1, 2, \ldots, N$ and $C_k$ equal to the set of indices of the $N_k$ observations in class $k$, we observe
$$x_{ij} \sim N(\mu_j + \mu_{jk}, \sigma_j^2) \quad \text{for } i \in C_k, \quad \text{with } \sum_{k=1}^{K} \mu_{jk} = 0.$$
Set $\hat{\sigma}_j^2 = s_j^2$, the pooled within-class variance for feature $j$, and consider the lasso-style minimization problem
$$\min_{\{\mu_j, \mu_{jk}\}} \left\{ \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{K} \sum_{i \in C_k} \frac{(x_{ij} - \mu_j - \mu_{jk})^2}{s_j^2} + \lambda \sum_{j=1}^{p} \sum_{k=1}^{K} \sqrt{N_k}\, \frac{|\mu_{jk}|}{s_j} \right\}.$$
The solution is equivalent to the nearest shrunken centroid classifier.

Advantages
• Simple, includes nearest centroid classifier as a special case.
• Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.
• With more than two classes, the method can select different g...
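To illustrate how thresholding selects genes, the short sketch below applies soft thresholding to a small matrix of standardized differences and lists the genes whose shrunken value is nonzero in each class. The numbers in `d`, the value of `delta`, and the variable names are made up for illustration; rows are classes and columns are genes, which is an assumed layout.

```python
import numpy as np

# Hypothetical standardized differences: rows are classes k, columns are genes j.
d = np.array([
    [ 2.5,  0.4, -0.2,  0.1],
    [-0.3,  0.1,  1.8, -0.2],
    [-2.2, -0.5, -1.6,  0.1],
])
delta = 1.0  # illustrative threshold, not a recommended value

# Soft thresholding: shrink |d| by delta and zero out anything smaller.
d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

# Genes selected for each class: those with a nonzero shrunken difference.
selected_by_class = {
    k: np.flatnonzero(d_shrunk[k]).tolist() for k in range(d.shape[0])
}

# Genes retained overall: nonzero for at least one class.
selected_any = np.flatnonzero(np.any(d_shrunk != 0, axis=0)).tolist()

print(selected_by_class)
print(selected_any)
```

In this toy example genes 1 and 3 are zeroed out for every class and so drop out entirely, while the surviving genes differ from class to class.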
This document was uploaded on 03/10/2014 for the course STATS 315A at Stanford.
 Spring '10
 TIBSHIRANI,R
 Statistics, Linear Regression
