Linear Discriminant Analysis
Jia Li, Department of Statistics, The Pennsylvania State University
Email: jiali@stat.psu.edu — http://www.stat.psu.edu/jiali

Notation

- The prior probability of class $k$ is $\pi_k$; it is usually estimated by the empirical frequency in the training set: $\hat{\pi}_k = (\text{\# samples in class } k) / (\text{total \# of samples})$, with $\sum_{k=1}^K \pi_k = 1$.
- The class-conditional density of $X$ in class $G = k$ is $f_k(x)$.
- The posterior probability is
  $$\Pr(G = k \mid X = x) = \frac{f_k(x)\pi_k}{\sum_{l=1}^K f_l(x)\pi_l} .$$
- By MAP (the Bayes rule for 0-1 loss), $\hat{G}(x) = \arg\max_k \Pr(G = k \mid X = x) = \arg\max_k f_k(x)\pi_k$ (a small numerical sketch follows).
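To make the posterior and the MAP rule concrete, here is a minimal numpy sketch (my illustration, not code from the lecture); the two univariate Gaussian class densities in the usage lines are placeholders:

```python
import numpy as np

def map_classify(x, density_fns, priors):
    """MAP rule: choose the k maximizing f_k(x) * pi_k (Bayes rule, 0-1 loss)."""
    joint = np.array([f(x) for f in density_fns]) * np.asarray(priors)
    posteriors = joint / joint.sum()               # Pr(G = k | X = x)
    return int(np.argmax(posteriors)), posteriors

# Placeholder class densities: N(0, 1) for class 0, N(2, 1) for class 1.
norm_pdf = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
densities = [lambda x: norm_pdf(x, 0.0, 1.0), lambda x: norm_pdf(x, 2.0, 1.0)]
print(map_classify(0.3, densities, [0.5, 0.5])[0])  # 0: class 0 wins near x = 0
```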

Class Density Estimation

- Linear and quadratic discriminant analysis: Gaussian densities.
- Mixtures of Gaussians.
- General nonparametric density estimates.
- Naive Bayes: assume each class density is a product of marginal densities, i.e., the variables are conditionally independent given the class (a sketch follows this list).
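As a one-line illustration of the naive Bayes assumption (my sketch, with hypothetical per-variable marginal densities), the class density factorizes over coordinates:

```python
import numpy as np

def naive_bayes_density(x, marginals):
    """f_k(x) = prod_j f_kj(x_j): the class density as a product of the
    (hypothetical) within-class marginal densities of each variable."""
    return np.prod([f_j(x_j) for f_j, x_j in zip(marginals, x)])
```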
Linear Discriminant Analysis

Multivariate Gaussian density:
$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \, e^{-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)}$$

Linear discriminant analysis (LDA) assumes $\Sigma_k = \Sigma$ for all $k$: the Gaussian distributions are shifted versions of each other.

Optimal classification

$$\hat{G}(x) = \arg\max_k \Pr(G = k \mid X = x) = \arg\max_k f_k(x)\pi_k = \arg\max_k \log(f_k(x)\pi_k)$$
$$= \arg\max_k \left[ -\log\!\left((2\pi)^{p/2}|\Sigma|^{1/2}\right) - \frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k) + \log \pi_k \right]$$
$$= \arg\max_k \left[ -\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k) + \log \pi_k \right]$$

Note that
$$-\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k - \frac{1}{2}x^T\Sigma^{-1}x ,$$
and the last term does not depend on $k$, so it can be dropped from the maximization.

To sum up:
$$\hat{G}(x) = \arg\max_k \left[ x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log \pi_k \right]$$

Define the linear discriminant function
$$\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log \pi_k .$$
Then $\hat{G}(x) = \arg\max_k \delta_k(x)$.

The decision boundary between classes $k$ and $l$ is $\{x : \delta_k(x) = \delta_l(x)\}$, or equivalently
$$\log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l) = 0 .$$

Binary classification ($k = 1$, $l = 2$):

- Define $a_0 = \log\frac{\pi_1}{\pi_2} - \frac{1}{2}(\mu_1 + \mu_2)^T\Sigma^{-1}(\mu_1 - \mu_2)$.
- Define $(a_1, a_2, \ldots, a_p)^T = \Sigma^{-1}(\mu_1 - \mu_2)$.
- Classify to class 1 if $a_0 + \sum_{j=1}^p a_j x_j > 0$; to class 2 otherwise.

An example: $\pi_1 = \pi_2 = 0.5$, $\mu_1 = (0, 0)^T$, $\mu_2 = (2, -2)^T$,
$$\Sigma = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 0.5625 \end{pmatrix} .$$
Decision boundary: $5.56 - 2.00x_1 + 3.56x_2 = 0$ (the sketch below verifies this arithmetic).
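As an arithmetic check on this example, a short numpy sketch (my addition, not from the lecture) computing $a_0$ and $(a_1, a_2)$ from the stated parameters:

```python
import numpy as np

# Parameters of the example: equal priors, mu_1 = (0, 0), mu_2 = (2, -2)
pi1, pi2 = 0.5, 0.5
mu1 = np.array([0.0, 0.0])
mu2 = np.array([2.0, -2.0])
Sigma = np.array([[1.0, 0.0],
                  [0.0, 0.5625]])

Sigma_inv = np.linalg.inv(Sigma)
a = Sigma_inv @ (mu1 - mu2)                                          # (a_1, a_2)
a0 = np.log(pi1 / pi2) - 0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2)
print(a0, a)  # 5.556 [-2.  3.556]: the boundary 5.56 - 2.00 x1 + 3.56 x2 = 0
```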

Estimate Gaussian Distributions

In practice, we need to estimate the Gaussian distributions from the training data:

- $\hat{\pi}_k = N_k / N$, where $N_k$ is the number of class-$k$ samples.
- $\hat{\mu}_k = \sum_{g_i = k} x^{(i)} / N_k$.
- $\hat{\Sigma} = \sum_{k=1}^K \sum_{g_i = k} (x^{(i)} - \hat{\mu}_k)(x^{(i)} - \hat{\mu}_k)^T / (N - K)$.

Here $x^{(i)}$ denotes the $i$th sample vector (a numpy sketch of these estimators appears after the diabetes example below).

Diabetes Data Set

Two input variables are computed from the principal components of the original 8 variables.

- Prior probabilities: $\hat{\pi}_1 = 0.651$, $\hat{\pi}_2 = 0.349$.
- $\hat{\mu}_1 = (-0.4035, -0.1935)^T$, $\hat{\mu}_2 = (0.7528, 0.3611)^T$.
- $$\hat{\Sigma} = \begin{pmatrix} 1.7925 & -0.1461 \\ -0.1461 & 1.6634 \end{pmatrix}$$

Classification rule:
$$\hat{G}(x) = \begin{cases} 1 & 0.7748 - 0.6771x_1 - 0.3929x_2 \ge 0 \\ 2 & \text{otherwise} \end{cases}
= \begin{cases} 1 & 1.1443 - x_1 - 0.5802x_2 \ge 0 \\ 2 & \text{otherwise} \end{cases}$$
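A minimal numpy sketch of the estimators from the previous section (my illustration, not code from the lecture; `X` is the $N \times p$ training matrix and `g` holds integer labels $0, \ldots, K-1$):

```python
import numpy as np

def fit_lda(X, g, K):
    """Estimate priors, class means, and the pooled covariance for LDA."""
    N, p = X.shape
    priors = np.array([np.mean(g == k) for k in range(K)])        # pi_hat_k = N_k / N
    means = np.array([X[g == k].mean(axis=0) for k in range(K)])  # mu_hat_k
    Sigma = np.zeros((p, p))
    for k in range(K):
        R = X[g == k] - means[k]   # class-k samples centered at their mean
        Sigma += R.T @ R           # accumulate (x - mu_k)(x - mu_k)^T
    return priors, means, Sigma / (N - K)  # pooled covariance, divisor N - K
```

Run on the diabetes training data, this would reproduce the $\hat{\pi}_k$, $\hat{\mu}_k$, and $\hat{\Sigma}$ reported above.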
Scatter plot of the diabetes data: without diabetes, stars (class 1); with diabetes, circles (class 2). Solid line: classification boundary obtained by LDA. Dash-dot line: boundary obtained by linear regression of the indicator matrix. [figure]

Within the training data, the classification error rate is 28.26%; sensitivity 45.90%; specificity 85.60%.

Contour plot of the density (a mixture of two Gaussians) fitted to the diabetes data. [figure]

Simulated Examples

LDA is not necessarily bad when the assumptions about the density functions are violated, but in certain cases it may yield poor results.

LDA applied to simulated data sets. Left: the true within-class densities are Gaussian with identical covariance matrices across classes. Right: the true within-class densities are mixtures of two Gaussians. [figure]

Left: decision boundaries obtained by LDA. Right: decision boundaries obtained by modeling each class by a mixture of two Gaussians. [figure]

Quadratic Discriminant Analysis (QDA)

Estimate the covariance matrix $\Sigma_k$ separately for each class $k$, $k = 1, 2, \ldots, K$.

Quadratic discriminant function:
$$\delta_k(x) = -\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x - \mu_k)^T\Sigma_k^{-1}(x - \mu_k) + \log \pi_k .$$

Classification rule: $\hat{G}(x) = \arg\max_k \delta_k(x)$ (a numpy sketch appears after the diabetes results below). The decision boundaries are quadratic equations in $x$. QDA fits the data better than LDA, but has more parameters to estimate.

Diabetes Data Set (QDA)

- Prior probabilities: $\hat{\pi}_1 = 0.651$, $\hat{\pi}_2 = 0.349$.
- $\hat{\mu}_1 = (-0.4035, -0.1935)^T$, $\hat{\mu}_2 = (0.7528, 0.3611)^T$.
- $$\hat{\Sigma}_1 = \begin{pmatrix} 1.6769 & -0.0461 \\ -0.0461 & 1.5964 \end{pmatrix}, \quad \hat{\Sigma}_2 = \begin{pmatrix} 2.0087 & -0.3330 \\ -0.3330 & 1.7887 \end{pmatrix}$$

Within the training data, the classification error rate is 29.04%; sensitivity 45.90%; specificity 84.40%. Sensitivity is the same as that obtained by LDA, but specificity is slightly lower.
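The quadratic discriminant function translates directly into numpy; the sketch below is my illustration of the formula above, with the diabetes QDA estimates plugged in as a usage check:

```python
import numpy as np

def qda_classify(x, priors, means, Sigmas):
    """G_hat(x) = argmax_k of
    delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k."""
    deltas = []
    for pi_k, mu_k, S_k in zip(priors, means, Sigmas):
        d = x - mu_k
        deltas.append(-0.5 * np.log(np.linalg.det(S_k))
                      - 0.5 * d @ np.linalg.inv(S_k) @ d
                      + np.log(pi_k))
    return int(np.argmax(deltas)), deltas

# Diabetes QDA estimates (class 1 at index 0, class 2 at index 1):
priors = [0.651, 0.349]
means = [np.array([-0.4035, -0.1935]), np.array([0.7528, 0.3611])]
Sigmas = [np.array([[1.6769, -0.0461], [-0.0461, 1.5964]]),
          np.array([[2.0087, -0.3330], [-0.3330, 1.7887]])]
print(qda_classify(np.array([0.0, 0.0]), priors, means, Sigmas)[0])  # 0, i.e. class 1
```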
LDA on Expanded Basis

Expand the input space to include $X_1 X_2$, $X_1^2$, and $X_2^2$, so the input is five dimensional: $X = (X_1, X_2, X_1 X_2, X_1^2, X_2^2)$.

- $\hat{\mu}_1 = (-0.4035, -0.1935, 0.0321, 1.8363, 1.6306)^T$
- $\hat{\mu}_2 = (0.7528, 0.3611, -0.0599, 2.5680, 1.9124)^T$
- $$\hat{\Sigma} = \begin{pmatrix}
  1.7925 & -0.1461 & -0.6254 & 0.3548 & 0.5215 \\
  -0.1461 & 1.6634 & 0.6073 & -0.7421 & 1.2193 \\
  -0.6254 & 0.6073 & 3.5751 & -1.1118 & -0.5044 \\
  0.3548 & -0.7421 & -1.1118 & 12.3355 & -0.0957 \\
  0.5215 & 1.2193 & -0.5044 & -0.0957 & 4.4650
  \end{pmatrix}$$

Classification boundary:
$$0.651 - 0.728x_1 - 0.552x_2 - 0.006x_1 x_2 - 0.071x_1^2 + 0.170x_2^2 = 0 .$$
If this function, which is linear in the five basis variables, is non-negative, classify to class 1; otherwise to class 2 (hard-coded in the sketch below).

Classification boundaries obtained by LDA using the expanded input space $(X_1, X_2, X_1 X_2, X_1^2, X_2^2)$; boundaries obtained by LDA and QDA using the original input are shown for comparison. [figure]

Within the training data, the classification error rate is 26.82%; sensitivity 44.78%; specificity 88.40%. The within-training error rate is lower than those obtained by LDA and QDA with the original input.
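A sketch (mine, not from the lecture) of the expanded-basis rule, hard-coding the boundary coefficients reported above:

```python
def classify_expanded(x1, x2):
    """LDA on the basis (x1, x2, x1*x2, x1^2, x2^2), using the fitted boundary."""
    f = (0.651 - 0.728 * x1 - 0.552 * x2
         - 0.006 * x1 * x2 - 0.071 * x1 ** 2 + 0.170 * x2 ** 2)
    return 1 if f >= 0 else 2   # class 1 on the non-negative side

print(classify_expanded(0.0, 0.0))  # 1: the origin falls on the class-1 side
```

Although the boundary is quadratic in $(x_1, x_2)$, the classifier is still linear in the five basis variables, which is why it remains an LDA fit.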