[Figure: two panels showing the linear and the quadratic polynomial fits; horizontal axis from 0.0 to 1.0, vertical axis from 0.0 to 1.0; observations plotted by class label 1, 2, 3.]

The effects of masking on linear regression in ℝ for a three-class problem. The rug plot at the base indicates the positions and class membership of each observation. The three curves in each panel are the fitted regressions to the three class-indicator variables; for example, for the green class, y_green is 1 for the green observations and 0 for the orange and blue ones. The fits are linear and quadratic polynomials. Above each plot is the training error rate. The Bayes error rate is 0.025 for this problem, as is the LDA error rate.

ESL Chapter 4 — Linear Methods for Classification    Trevor Hastie and Rob Tibshirani

Linear discriminant analysis

• f_k(x): density of X in class G = k
• π_k: class prior Pr(G = k)
• Bayes theorem:

$$\Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{\ell=1}^{K} f_\ell(x)\,\pi_\ell}$$

• leads to LDA, QDA, MDA (mixture DA), Kernel DA, Naive Bayes
• LDA: each class density is multivariate Gaussian with a common covariance matrix Σ,

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)}$$

so the log-odds between classes k and ℓ is linear in x:

$$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = \ell \mid X = x)} = \log\frac{\pi_k}{\pi_\ell} - \frac{1}{2}(\mu_k+\mu_\ell)^T\Sigma^{-1}(\mu_k-\mu_\ell) + x^T\Sigma^{-1}(\mu_k-\mu_\ell)$$

More on LDA

• estimate μ_k by the centroid of class k, π_k by the class proportion N_k/N (N_k of the N observations fall in class k), and Σ by the pooled within-class covariance matrix
• estimated Bayes rule: classify to the class k that maximizes the discriminant function

$$\hat\delta_k(x) = x^T\hat\Sigma^{-1}\hat\mu_k - \frac{1}{2}\hat\mu_k^T\hat\Sigma^{-1}\hat\mu_k + \log\hat\pi_k$$

• for two classes, we classify to class 2 if

$$x^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) > \frac{1}{2}(\hat\mu_2+\hat\mu_1)^T\hat\Sigma^{-1}(\hat\mu_2-\hat\mu_1) - \log\frac{N_2}{N_1}$$

where N_1 and N_2 are the numbers of observations in each class.
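The estimation recipe on the "More on LDA" slide can be sketched in plain Python. This is a minimal sketch, assuming small dense inputs given as lists of lists; the helper names `fit_lda`, `discriminants`, `predict`, `two_class_rule` and `solve` are hypothetical, not from the slides or any library, and Gaussian elimination stands in for forming Σ̂^{-1} explicitly. The two-class rule is just δ̂_2(x) > δ̂_1(x) rearranged: the quadratic terms combine into ½(μ̂_2 + μ̂_1)^T Σ̂^{-1}(μ̂_2 − μ̂_1) because Σ̂ is symmetric, and log(π̂_2/π̂_1) = log(N_2/N_1) when π̂_k = N_k/N. The code checks that both forms give the same class.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (z = A^{-1} b)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        z[i] = (M[i][n] - dot(M[i][i + 1:n], z[i + 1:])) / M[i][i]
    return z

def fit_lda(X, y):
    """Estimate centroids mu_k, priors N_k / N and the pooled within-class covariance."""
    classes = sorted(set(y))
    N, p, K = len(X), len(X[0]), len(classes)
    groups = {k: [x for x, g in zip(X, y) if g == k] for k in classes}
    mu = {k: [sum(c) / len(rows) for c in zip(*rows)] for k, rows in groups.items()}
    prior = {k: len(rows) / N for k, rows in groups.items()}
    # Pooled covariance: sum of (x - mu_k)(x - mu_k)^T over every class, divided by N - K.
    sigma = [[0.0] * p for _ in range(p)]
    for k, rows in groups.items():
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, mu[k])]
            for i in range(p):
                for j in range(p):
                    sigma[i][j] += d[i] * d[j]
    sigma = [[s / (N - K) for s in row] for row in sigma]
    a = {k: solve(sigma, mu[k]) for k in classes}  # a_k = Sigma^{-1} mu_k
    return {"classes": classes, "mu": mu, "prior": prior, "sigma": sigma, "a": a}

def discriminants(model, x):
    """delta_k(x) = x^T S^{-1} mu_k - 0.5 mu_k^T S^{-1} mu_k + log pi_k for every class k."""
    return {k: dot(x, model["a"][k]) - 0.5 * dot(model["mu"][k], model["a"][k])
               + math.log(model["prior"][k]) for k in model["classes"]}

def predict(model, x):
    """Estimated Bayes rule: the class with the largest discriminant."""
    d = discriminants(model, x)
    return max(d, key=d.get)

def two_class_rule(model, x):
    """Two-class form: class 2 if x^T S^{-1}(mu2 - mu1) exceeds the threshold."""
    c1, c2 = model["classes"]  # assumes exactly two classes
    mu, a, prior = model["mu"], model["a"], model["prior"]
    w = [a2 - a1 for a1, a2 in zip(a[c1], a[c2])]  # S^{-1}(mu2 - mu1)
    mid = [m1 + m2 for m1, m2 in zip(mu[c1], mu[c2])]
    threshold = 0.5 * dot(mid, w) - math.log(prior[c2] / prior[c1])  # prior ratio = N2 / N1
    return c2 if dot(x, w) > threshold else c1

# Toy data, made up for illustration: 4 points in class 1, 5 in class 2.
X = [[0, 0], [1, 0], [0, 1], [1, 1],
     [3, 3], [4, 3], [3, 4], [4, 4], [3.5, 3.5]]
y = [1, 1, 1, 1, 2, 2, 2, 2, 2]
model = fit_lda(X, y)
for pt in ([0.2, 0.3], [2.0, 2.0], [3.8, 3.6]):
    print(pt, predict(model, pt), two_class_rule(model, pt))
```

In this toy example the point (2, 2) is equidistant from the two centroids (0.5, 0.5) and (3.5, 3.5), so the only thing separating δ̂_2 from δ̂_1 there is the prior term log(5/4), which tips it into class 2.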

This document was uploaded on 03/10/2014 for the course STATS 315A at Stanford.