lda2 - Regularized Discriminant Analysis and Reduced-Rank LDA


Regularized Discriminant Analysis and Reduced-Rank LDA

Jia Li
Department of Statistics, The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/jiali

Regularized Discriminant Analysis

- A compromise between LDA and QDA: shrink the separate covariance estimates of QDA toward a common covariance, as in LDA.
- Regularized covariance matrices:
  $\hat\Sigma_k(\alpha) = \alpha \hat\Sigma_k + (1-\alpha)\hat\Sigma$.
- The quadratic discriminant function $\delta_k(x)$ is defined using the shrunken covariance matrices $\hat\Sigma_k(\alpha)$.
- The parameter $\alpha$ controls the complexity of the model, interpolating between LDA ($\alpha = 0$) and QDA ($\alpha = 1$); a small sketch follows.
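As a concrete illustration, here is a minimal NumPy sketch of discriminant scores with the shrunken covariances. The function name, array layout, and the use of np.linalg.slogdet are my own choices, not from the slides:

```python
import numpy as np

def regularized_scores(X, means, covs, pooled_cov, priors, alpha):
    """Discriminant scores using the shrunken covariances
    Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma."""
    scores = []
    for mu, S_k, pi_k in zip(means, covs, priors):
        S = alpha * S_k + (1.0 - alpha) * pooled_cov   # regularized covariance
        diff = X - mu                                  # (n, p) residuals
        # Mahalanobis term (x - mu)^T S^{-1} (x - mu), one value per row of X
        maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(S), diff)
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(pi_k))
    return np.stack(scores, axis=1)   # (n, K); predict by argmax over columns
```

With alpha = 0 every class uses the pooled covariance (LDA); with alpha = 1 each class keeps its own estimate (QDA).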
Computations for LDA

- Discriminant function:
  $\delta_k(x) = -\frac{1}{2}\log|\hat\Sigma_k| - \frac{1}{2}(x - \hat\mu_k)^T \hat\Sigma_k^{-1}(x - \hat\mu_k) + \log \hat\pi_k$.
- Eigen-decomposition of $\hat\Sigma_k$: $\hat\Sigma_k = U_k D_k U_k^T$, where $D_k$ is diagonal with elements $d_{kl}$, $l = 1, 2, \ldots, p$, and $U_k$ is $p \times p$ orthonormal. Then
  $(x - \hat\mu_k)^T \hat\Sigma_k^{-1}(x - \hat\mu_k) = [U_k^T(x - \hat\mu_k)]^T D_k^{-1} [U_k^T(x - \hat\mu_k)] = [D_k^{-1/2} U_k^T (x - \hat\mu_k)]^T [D_k^{-1/2} U_k^T (x - \hat\mu_k)]$
  and
  $\log|\hat\Sigma_k| = \sum_l \log d_{kl}$.
- LDA, $\hat\Sigma = U D U^T$: sphere the data, $X^* \leftarrow D^{-1/2} U^T X$ and $\hat\mu_k^* \leftarrow D^{-1/2} U^T \hat\mu_k$.
- For the transformed data and class centroids, classify x to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities $\hat\pi_k$.

[Figure: geometric illustration of LDA. Left: original data in the two classes; the ellipses represent the two estimated covariance matrices. Right: the mean-removed data and the estimated common covariance matrix.]

[Figure: geometric illustration of LDA, continued. Left: the sphered, mean-removed data. Right: the sphered data in the two classes, the sphered means, and the decision boundary.]

Reduced-Rank LDA

- Binary classification: the decision boundary is given by the linear equation
  $\log\frac{\pi_1}{\pi_2} - \frac{1}{2}(\mu_1 + \mu_2)^T \Sigma^{-1}(\mu_1 - \mu_2) + x^T \Sigma^{-1}(\mu_1 - \mu_2) = 0$.
- Only the projection of X onto the direction $\Sigma^{-1}(\mu_1 - \mu_2)$ matters. If the data are sphered, only the projection of X onto $\mu_1 - \mu_2$ is needed.
- Suppose the data are sphered. The subspace spanned by the K centroids is of rank K - 1, denoted by $H_{K-1}$. The data can be viewed in $H_{K-1}$ without losing any information.
- When K > 3, we might want to find a subspace $H_L \subset H_{K-1}$ that is optimal for LDA in some sense.

Optimization Criterion

- Fisher's optimization criterion: the projected centroids should be spread out as much as possible relative to the within-class variance.
- Find the linear combination $Z = a^T X$, with $a = (a_1, a_2, \ldots, a_p)^T$, such that the between-class variance of Z is maximized relative to the within-class variance.
- Assume the within-class covariance matrix of X is W, i.e., the common covariance matrix of the classes, and let B be the between-class covariance matrix.
- Suppose $\mu_k$ is a column vector denoting the mean vector of class k, and $\pi_k$ is the proportion of class-k samples in the entire data set. Then
  $\bar\mu = \sum_{k=1}^K \pi_k \mu_k$ and $B = \sum_{k=1}^K \pi_k (\mu_k - \bar\mu)(\mu_k - \bar\mu)^T$.
- For the linear combination Z, the between-class variance is $a^T B a$ and the within-class variance is $a^T W a$. Fisher's optimization becomes
  $\max_a \frac{a^T B a}{a^T W a}$.
- Eigen-decompose $W = V_W D_W V_W^T$ and write $W = (W^{1/2})^T W^{1/2}$, where $W^{1/2} = D_W^{1/2} V_W^T$.
- Define $b = W^{1/2} a$, so that $a = W^{-1/2} b$. The optimization becomes
  $\max_b \frac{b^T (W^{-1/2})^T B W^{-1/2} b}{b^T b}$.
- Define $B^* = (W^{-1/2})^T B W^{-1/2}$, with eigen-decomposition $B^* = V^* D_B V^{*T}$ and $V^* = (v_1^*, v_2^*, \ldots, v_p^*)$.
- The maximization is achieved by $b = v_1^*$, the first eigenvector of $B^*$. Similarly, one can find the next direction $b_2 = v_2^*$, which is orthogonal to $b_1 = v_1^*$ and maximizes $b_2^T B^* b_2 / b_2^T b_2$.
- Since $a = W^{-1/2} b$, converting back to the original problem gives $a_l = W^{-1/2} v_l^*$. The $a_l$ (also denoted $v_l$ in the textbook) are referred to as discriminant coordinates, or canonical variates.

Summary of obtaining the discriminant coordinates (see the sketch below):
1. Find the centroids of all the classes.
2. Form the between-class covariance matrix B using the centroid vectors.
3. Form the within-class covariance matrix W, i.e., $\hat\Sigma$.
4. Eigen-decompose $W = (W^{1/2})^T W^{1/2} = (D_W^{1/2} V_W^T)^T D_W^{1/2} V_W^T$.
5. Compute $B^* = (W^{-1/2})^T B W^{-1/2} = D_W^{-1/2} V_W^T B V_W D_W^{-1/2}$.
6. Eigen-decompose $B^* = V^* D_B V^{*T}$.
7. The discriminant coordinates are $a_l = W^{-1/2} v_l^*$.
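A minimal NumPy implementation of steps 1-7 (the function name and label handling are my own; np.linalg.eigh is used because W and B* are symmetric, and $W^{-1/2} = V_W D_W^{-1/2}$ follows from the definition of $W^{1/2}$ above):

```python
import numpy as np

def discriminant_coordinates(X, y):
    """Discriminant coordinates a_l = W^{-1/2} v_l*, following steps 1-7 above."""
    classes = np.unique(y)
    N, p = X.shape
    mus = np.array([X[y == k].mean(axis=0) for k in classes])   # class centroids
    pis = np.array([np.mean(y == k) for k in classes])          # class proportions pi_k
    mbar = pis @ mus                                            # sum_k pi_k mu_k
    # Step 2: B = sum_k pi_k (mu_k - mbar)(mu_k - mbar)^T
    B = (pis[:, None] * (mus - mbar)).T @ (mus - mbar)
    # Step 3: pooled within-class covariance W (divided by N - K here)
    resid = X - mus[np.searchsorted(classes, y)]
    W = resid.T @ resid / (N - len(classes))
    # Step 4: W^{-1/2} = V_W D_W^{-1/2} from W = V_W D_W V_W^T
    dW, VW = np.linalg.eigh(W)
    W_inv_half = VW / np.sqrt(dW)          # scales column l by d_l^{-1/2}
    # Steps 5-6: B* and its eigen-decomposition
    Bstar = W_inv_half.T @ B @ W_inv_half
    dB, Vstar = np.linalg.eigh(Bstar)
    order = np.argsort(dB)[::-1]           # largest eigenvalue first
    # Step 7: columns of A are the discriminant coordinates a_l
    A = W_inv_half @ Vstar[:, order]
    return A, dB[order]
```

On data like the simulation below, this should reproduce B, W, and the discriminant coordinates up to the sign of each column and sampling noise.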
Simulation

- Three classes with equal prior probabilities 1/3. The input is two-dimensional, and the class conditional density of X is a normal distribution.
- The common covariance matrix is $\Sigma = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix}$.
- The three mean vectors are $\mu_1 = \binom{0}{0}$, $\mu_2 = \binom{-3}{2}$, $\mu_3 = \binom{-1}{-3}$.
- A total of 450 samples are drawn, 150 per class, for training. Another 450 samples, 150 per class, are drawn for testing.

[Figure: scatter plot of the test data. Red: class 1. Blue: class 2. Magenta: class 3.]

LDA Result

- Priors: $\hat\pi_1 = \hat\pi_2 = \hat\pi_3 = 150/450 = 0.3333$.
- Estimated mean vectors: $\hat\mu_1 = \binom{-0.0757}{-0.0034}$, $\hat\mu_2 = \binom{-2.8310}{1.9847}$, $\hat\mu_3 = \binom{-0.9992}{-2.9005}$.
- Estimated covariance matrix: $\hat\Sigma = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}$.
- Decision boundaries:
  - Between class 1 (red) and 2 (blue): $5.9480 + 2.7684 X_1 - 1.9427 X_2 = 0$.
  - Between class 1 (red) and 3 (magenta): $4.5912 + 0.9209 X_1 + 2.8211 X_2 = 0$.
  - Between class 2 (blue) and 3 (magenta): $-1.3568 - 1.8475 X_1 + 4.7639 X_2 = 0$.
- Classification error rate on the test data set: 7.78%.

Discriminant Coordinates

- Between-class covariance matrix: $B = \begin{pmatrix} 1.3111 & -1.3057 \\ -1.3057 & 4.0235 \end{pmatrix}$.
- Within-class covariance matrix: $W = \begin{pmatrix} 0.9967 & 0.0020 \\ 0.0020 & 1.0263 \end{pmatrix}$.
- $W^{1/2} = \begin{pmatrix} -0.0686 & -1.0108 \\ 0.9960 & -0.0676 \end{pmatrix}$.
- $B^* = (W^{-1/2})^T B W^{-1/2} = \begin{pmatrix} 3.7361 & 1.4603 \\ 1.4603 & 1.5050 \end{pmatrix}$.
- Eigen-decomposition $B^* = V^* D_B V^{*T}$:
  $V^* = \begin{pmatrix} 0.8964 & 0.4432 \\ 0.4432 & -0.8964 \end{pmatrix}$, $D_B = \begin{pmatrix} 4.4582 & 0 \\ 0 & 0.7830 \end{pmatrix}$.
- The two discriminant coordinates are:
  $v_1 = W^{-1/2} v_1^* = \begin{pmatrix} -0.0668 & 0.9994 \\ -0.9848 & -0.0678 \end{pmatrix} \begin{pmatrix} 0.8964 \\ 0.4432 \end{pmatrix} = \begin{pmatrix} 0.3831 \\ -0.9128 \end{pmatrix}$,
  $v_2 = W^{-1/2} v_2^* = \begin{pmatrix} -0.9255 \\ -0.3757 \end{pmatrix}$.
- Project the data onto $v_1$ and classify using only this 1-D representation. The projected data are $x_i^T v_1$, $i = 1, \ldots, N$.

[Figure: the discriminant coordinates. Solid line: first DC. Dashed line: second DC.]

Projection on the First DC

[Figure: projection of the training data onto the first discriminant coordinate.]

- Perform LDA on the projected data. The classification rule is
  $\hat G(x) = \begin{cases} 1 & -1.4611 \le x^T v_1 \le 1.1195 \\ 2 & x^T v_1 < -1.4611 \\ 3 & x^T v_1 > 1.1195 \end{cases}$
- Error rate on the test data: 12.67%.

Principal Component Direction

- Find the covariance matrix of X, or do a singular value decomposition of the mean-removed X, to obtain the principal component directions.
- Denote the covariance matrix by T: $T = \begin{pmatrix} 2.3062 & -1.3066 \\ -1.3066 & 5.0542 \end{pmatrix}$.
- Eigen-decomposition $T = V_T D_T V_T^T$:
  $V_T = \begin{pmatrix} 0.3710 & -0.9286 \\ -0.9286 & -0.3710 \end{pmatrix}$, $D_T = \begin{pmatrix} 5.5762 & 0 \\ 0 & 1.7842 \end{pmatrix}$.

[Figure: the principal component directions. Solid line: first PCD. Dashed line: second PCD.]

Results Based on the First PC

[Figure: projection of the data onto the first PC; the boundaries between classes are shown.]

- Perform LDA on the projected data. The classification rule is
  $\hat G(x) = \begin{cases} 1 & -1.4592 \le x^T v_1 \le 1.1489 \\ 2 & x^T v_1 < -1.4592 \\ 3 & x^T v_1 > 1.1489 \end{cases}$
  where $v_1$ here denotes the first principal component direction. (Both 1-D rules are illustrated in the sketch below.)
- Error rate on the test data: 13.11%.
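Both 1-D rules have the same form: project onto one direction and cut at two thresholds. A minimal sketch, where the function name is my own, the directions and thresholds are the fitted values quoted above, and X_test / y_test are hypothetical arrays holding the test set:

```python
import numpy as np

def classify_1d(X, v, lo, hi):
    """1-D rule: class 2 below lo, class 3 above hi, class 1 in between."""
    z = X @ v                          # projections x_i^T v
    g = np.ones(len(z), dtype=int)     # default: class 1
    g[z < lo] = 2
    g[z > hi] = 3
    return g

v_dc = np.array([0.3831, -0.9128])     # first discriminant coordinate
v_pc = np.array([0.3710, -0.9286])     # first principal component direction
# Hypothetical usage on test data (X_test, y_test not defined here):
# err_dc = np.mean(classify_1d(X_test, v_dc, -1.4611, 1.1195) != y_test)
# err_pc = np.mean(classify_1d(X_test, v_pc, -1.4592, 1.1489) != y_test)
```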
Comparison

- It is generally true that T = B + W.
- For the given example, $W \approx I$, and the true within-class covariance matrix is I. Ideally, for this example, both the discriminant coordinates and the principal component directions are simply the eigenvectors of B.
- In general, discriminant coordinates and principal component directions are different.
- Computing PC directions requires no class information, so PCs have more flexible applications. For classification, DCs tend to be better.

A New Simulation

- Change the common covariance matrix to $\begin{pmatrix} 4.0898 & -0.8121 \\ -0.8121 & 0.5900 \end{pmatrix}$.

[Figure: scatter plot of the new test data set.]

LDA Result

[Figure: the classification boundaries obtained by LDA.]

- The error rate for the test data is 6%.

DCs and PC Directions

[Figure: two panels. Left: discriminant coordinates. Right: principal component directions. The solid line indicates the first DC or PC; the dashed line indicates the second.]

Projection on 1-D

- Performing LDA on the 1-D projections: projection onto the first DC gives a test set error rate of 7.78%, while projection onto the first PCD gives 32.44%. (A reproduction sketch follows.)

[Figure: the LDA results obtained from the data projected onto the first discriminant coordinate and onto the first principal component direction.]
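To close, a sketch of how this last comparison could be reproduced. It reuses discriminant_coordinates() from the earlier sketch, assumes the class means are unchanged from the first simulation (the slides only change the covariance), and classifies each 1-D projection by the nearest projected centroid, which matches LDA here since the priors are equal. Exact numbers will vary with the random draw, but the first-DC projection should clearly beat the first-PC projection:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0898, -0.8121], [-0.8121, 0.5900]])   # new common covariance
mus = np.array([[0.0, 0.0], [-3.0, 2.0], [-1.0, -3.0]])    # class means, assumed unchanged
L = np.linalg.cholesky(Sigma)

def draw(n_per_class):
    """Draw n_per_class samples from each of the three Gaussian classes."""
    X = np.vstack([rng.standard_normal((n_per_class, 2)) @ L.T + mu for mu in mus])
    y = np.repeat([0, 1, 2], n_per_class)
    return X, y

Xtr, ytr = draw(150)
Xte, yte = draw(150)

# First PC direction: leading eigenvector of the total covariance T
_, VT = np.linalg.eigh(np.cov(Xtr, rowvar=False))
pc1 = VT[:, -1]                        # eigh sorts eigenvalues ascending
# First discriminant coordinate (discriminant_coordinates defined in the earlier sketch)
A, _ = discriminant_coordinates(Xtr, ytr)
dc1 = A[:, 0]

def error_1d(v):
    """Test error of nearest-projected-centroid classification along v."""
    ztr, zte = Xtr @ v, Xte @ v
    cents = np.array([ztr[ytr == k].mean() for k in range(3)])
    pred = np.argmin(np.abs(zte[:, None] - cents[None, :]), axis=1)
    return np.mean(pred != yte)

print("first-DC error:", error_1d(dc1))
print("first-PC error:", error_1d(pc1))
```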