ESL Chapter 4: Linear Methods for Classification
Trevor Hastie and Rob Tibshirani

Classification in high dimensions
• important for gene expression microarray problems and other
genomics problems
• Starting point: diagonal LDA, which uses $\mathrm{diag}(\hat{\Sigma})$
• nearest centroid classification on standardized features is equivalent
to diagonal LDA
• nearest shrunken centroids regularizes further, by discarding noisy
features

Classification of microarray samples
Example: small round blue cell tumors; Khan et al, Nature Medicine,
2001
• Tumors are classified as BL (Burkitt lymphoma), EWS (Ewing sarcoma), NB
(neuroblastoma) and RMS (rhabdomyosarcoma).
• There are 63 training samples and 25 test samples, although five of
the latter were not SRBCTs. Each sample has measurements on 2308 genes.
• Khan et al. report zero training and test errors using a complex neural
network model, and decided that 96 genes were “important”.
• Too complicated!

[Figure: Khan data; classes BL, EWS, NB, RMS]

Neural network approach

Class centroids
[Figure: Centroids: average expression centered at overall centroid. Panels: BL, EWS, NB, RMS. Axes: gene (0 to 2000) and centered average expression (−1.0 to 1.0).]

Shrunken centroids
• Idea: shrink each class centroid towards the overall centroid. First
normalize by the within-class standard deviation for each gene.
• Let $x_{ij}$ be the expression for samples $i = 1, 2, \ldots, n$ and genes
$j = 1, 2, \ldots, p$.
• We have classes $1, 2, \ldots, K$, and let $C_k$ be the indices of the $n_k$ samples
in class $k$.
• The $j$th component of the centroid for class $k$ is
$\bar{x}_{jk} = \sum_{i \in C_k} x_{ij} / n_k$, the mean expression value in class...
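
The centroid definitions above can be sketched in NumPy on made-up toy data (not the Khan data). The helper names class_centroids and within_class_sd are hypothetical. The pooled within-class standard deviation with an n − K denominator is an assumption, since the text above does not state its exact formula. Dividing the centered centroids by that standard deviation alone is likewise a simplification of the normalization step.

```python
import numpy as np


def class_centroids(X, y):
    """Per-class and overall centroids for an n-by-p matrix X with labels y.

    centroids[k, j] = sum_{i in C_k} x_ij / n_k, the mean of gene j in class k.
    (Hypothetical helper name, for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted class labels
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    overall = X.mean(axis=0)  # overall centroid across all samples
    return classes, centroids, overall


def within_class_sd(X, y, classes, centroids):
    """Pooled within-class standard deviation per gene.

    Assumption: squared deviations from each class centroid are summed
    over classes and divided by n - K.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, K = X.shape[0], len(classes)
    ss = np.zeros(X.shape[1])
    for k, c in zip(classes, centroids):
        ss += ((X[y == k] - c) ** 2).sum(axis=0)
    return np.sqrt(ss / (n - K))


# Toy data: n = 4 samples, p = 2 genes, K = 2 classes (values are made up).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [10.0, 0.0],
              [14.0, 2.0]])
y = np.array(["BL", "BL", "NB", "NB"])

classes, centroids, overall = class_centroids(X, y)
sd = within_class_sd(X, y, classes, centroids)

# Class centroids centered at the overall centroid, then scaled per gene
# by the within-class standard deviation (simplified normalization).
centered = centroids - overall
standardized = centered / sd

print(classes)       # class labels in sorted order
print(centroids)     # one row per class, one column per gene
print(standardized)  # centered centroids in within-class SD units
```

Each row of centered is the quantity that shrinkage would pull towards zero: a gene whose entries are near zero in every class contributes little to separating the classes.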
This document was uploaded on 03/10/2014 for the course STATS 315A at Stanford.
 Spring '10
 TIBSHIRANI,R
 Statistics, Linear Regression