Unformatted text preview: , with
ηj (x) ∼ log P (G = j x) = xT βj and
K
P (G = j x) = eηj (x) /
eη (x) .
=1
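As a minimal sketch of the softmax posterior above, the following computes P(G = j | x) from the linear logits η_j(x) = x^T β_j; the feature vector and coefficient matrix are illustrative assumptions, with the last class fixed as the reference (β_K = 0):

```python
import numpy as np

def softmax_posteriors(x, beta):
    """P(G = j | x) = exp(eta_j(x)) / sum_l exp(eta_l(x)), with eta_j(x) = x^T beta_j."""
    eta = beta @ x            # linear logits eta_j(x) = x^T beta_j
    eta = eta - eta.max()     # shift by the max for numerical stability (ratio unchanged)
    w = np.exp(eta)
    return w / w.sum()

x = np.array([1.0, 2.0])              # example feature vector (assumed)
beta = np.array([[0.5, -0.2],         # beta_1
                 [0.1,  0.3],         # beta_2
                 [0.0,  0.0]])        # beta_K = 0 (reference class)
p = softmax_posteriors(x, beta)       # posteriors over the K = 3 classes
```

The shift by `eta.max()` leaves the ratio unchanged and avoids overflow in `exp` for large logits.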
ESL Chapter 4 — Linear Methods for Classification, Trevor Hastie and Rob Tibshirani

Logistic regression or LDA?
• LDA:

  log [ Pr(G = j | X = x) / Pr(G = K | X = x) ]
      = log(π_j / π_K) − (1/2)(μ_j + μ_K)^T Σ^{-1} (μ_j − μ_K)
        + x^T Σ^{-1} (μ_j − μ_K)
      = α_{j0} + α_j^T x.

  This linearity is a consequence of the Gaussian assumption for the class densities, as well as the assumption of a common covariance matrix.
• Logistic model:

  log [ Pr(G = j | X = x) / Pr(G = K | X = x) ] = β_{j0} + β_j^T x.

  They use the same form for the logits.
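The LDA log-odds expansion can be checked numerically: under Gaussian class densities with a shared covariance, the direct log posterior ratio collapses to the linear form α_{j0} + α_j^T x. The parameters below (means, covariance, priors) are made up for illustration:

```python
import numpy as np

# Illustrative two-class Gaussian parameters with a common covariance (assumptions)
mu1 = np.array([1.0, 0.0])
muK = np.array([-1.0, 0.5])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
pi1, piK = 0.6, 0.4
Sinv = np.linalg.inv(Sigma)

# Coefficients from the expansion: alpha_{j0} + alpha_j^T x
alpha0 = np.log(pi1 / piK) - 0.5 * (mu1 + muK) @ Sinv @ (mu1 - muK)
alpha = Sinv @ (mu1 - muK)

def gaussian_logpdf_kernel(x, mu):
    # Shared normalizing constant and the quadratic x^T Sinv x cancel in the ratio,
    # so only the exponent matters here.
    d = x - mu
    return -0.5 * d @ Sinv @ d

x = np.array([0.7, -0.2])
log_odds_direct = (np.log(pi1) + gaussian_logpdf_kernel(x, mu1)
                   - np.log(piK) - gaussian_logpdf_kernel(x, muK))
log_odds_linear = alpha0 + alpha @ x
# the two agree: with a common Sigma, the log-odds is linear in x
```

Dropping the common-covariance assumption leaves a quadratic term in x, which is exactly why QDA boundaries are quadratic rather than linear.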
• Discriminative vs generative (informative) learning: logistic
regression uses the conditional distribution of Y given x to estimate
parameters, while LDA uses the full joint distribution (assuming
normality).
  Pr(X, G = j) = Pr(X) Pr(G = j | X).
• If normality holds, LDA is up to 30% more efficient (Efron 1975); otherwise logistic regression can be more robust. But the methods are similar in practice.
• The additional efficiency is obtained from using observations far from the decision boundary to help estimate Σ (dubious!)

Naive Bayes Models
Suppose we estimate the class densities f_1(X) and f_2(X) for the features in classes 1 and 2 respectively.
Bayes Formula tells us how to convert these to class posterior
probabilities:
  Pr(Y = 1 | X) = f_1(X) π_1 / (f_1(X) π_1 + f_2(X) π_2),

where π_1 = Pr(Y = 1) and π_2 = 1 − π_1.
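A small sketch of Bayes' formula for two classes, using one-dimensional Gaussian class densities whose means and spreads are illustrative assumptions, not from the text:

```python
from math import exp, pi, sqrt

def gauss(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def posterior_class1(x, pi1=0.5):
    """Pr(Y = 1 | X = x) = f1(x) pi1 / (f1(x) pi1 + f2(x) pi2)."""
    f1 = gauss(x, mu=1.0, sigma=1.0)    # assumed density for class 1
    f2 = gauss(x, mu=-1.0, sigma=1.0)   # assumed density for class 2
    pi2 = 1 - pi1
    return f1 * pi1 / (f1 * pi1 + f2 * pi2)

# At the midpoint x = 0 with equal priors, both terms match and the posterior is 1/2;
# moving toward the class-1 mean drives the posterior toward 1.
p_mid = posterior_class1(0.0)
p_far = posterior_class1(3.0)
```

Changing `pi1` shifts the crossover point where the posterior equals 1/2, which is how unequal priors tilt the decision boundary.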
Since X is often high-dimensional, the following within-class independence model is convenient:

  f_j(X) ≈ ∏_{m=1}^{p} f_{jm}(X_m).

Works for more than two classes as well.
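The independence model above can be sketched as follows: each class density is approximated by a product of one-dimensional Gaussian fits, one per feature, and classification compares the resulting log densities. The sample data and equal-priors choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative two-feature samples for each class (assumed parameters)
X1 = rng.normal(loc=[1.0, 2.0], scale=1.0, size=(200, 2))   # class 1
X2 = rng.normal(loc=[-1.0, 0.0], scale=1.0, size=(200, 2))  # class 2

def fit_marginals(X):
    """Estimate each coordinate's density separately: a Gaussian fit per feature."""
    return X.mean(axis=0), X.std(axis=0)

def log_density(x, params):
    """Sum of per-feature log densities = log of the product prod_m f_jm(x_m),
    dropping the common -0.5*log(2*pi) constant shared by all classes."""
    mu, sd = params
    return np.sum(-0.5 * ((x - mu) / sd) ** 2 - np.log(sd))

params1, params2 = fit_marginals(X1), fit_marginals(X2)
x = np.array([0.8, 1.5])                      # query point near the class-1 mean
# With equal priors, classify by the larger (naive) log density
label = 1 if log_density(x, params1) > log_density(x, params2) else 2
```

Unequal priors would simply add log π_j to each class's score before comparing.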
• Each of the component densities f_{jm} is estimated separately within each class:
  – discrete components via histograms;
  – quantitative components via Gaussians or smooth density estimates.
• The nearest shrunken centroids model has this structure, and in
addition
– assu...
This document was uploaded on 03/10/2014 for the course STATS 315A at Stanford (Spring '10, Tibshirani, R.; Statistics, Linear Regression).