Regularized Discriminant Analysis and Reduced-Rank LDA
Jia Li
Department of Statistics, The Pennsylvania State University
Email: jiali@stat.psu.edu
http://www.stat.psu.edu/jiali

Regularized Discriminant Analysis

A compromise between LDA and QDA: shrink the separate covariance estimates of QDA toward a common covariance as in LDA. The regularized covariance matrices are

  Σ̂_k(α) = α Σ̂_k + (1 − α) Σ̂ ,   0 ≤ α ≤ 1 .

The quadratic discriminant function δ_k(x) is defined using the shrunken covariance matrices Σ̂_k(α). The parameter α controls the complexity of the model: α = 1 gives QDA, α = 0 gives LDA.

Computations for LDA

Discriminant function:

  δ_k(x) = −(1/2) log |Σ̂_k| − (1/2)(x − μ̂_k)^T Σ̂_k^{−1} (x − μ̂_k) + log π_k .

Eigendecomposition of Σ̂_k: Σ̂_k = U_k D_k U_k^T, where D_k is diagonal with elements d_kl, l = 1, 2, ..., p, and U_k is a p × p orthonormal matrix. Then

  (x − μ̂_k)^T Σ̂_k^{−1} (x − μ̂_k) = [U_k^T (x − μ̂_k)]^T D_k^{−1} [U_k^T (x − μ̂_k)]
                                   = [D_k^{−1/2} U_k^T (x − μ̂_k)]^T [D_k^{−1/2} U_k^T (x − μ̂_k)] ,

  log |Σ̂_k| = Σ_l log d_kl .

For LDA, with common covariance Σ̂ = U D U^T: sphere the data, X* ← D^{−1/2} U^T X, and likewise the centroids, μ_k* ← D^{−1/2} U^T μ̂_k. For the transformed data and class centroids, classify x to the closest class centroid in the transformed space, modulo the effect of the class prior probabilities π_k.

The geometric illustration of LDA. Left: original data in the two classes; the ellipses represent the two estimated covariance matrices. Right: the data with class means removed and the estimated common covariance matrix.
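The shrinkage and sphering computations above can be sketched in NumPy. This is a minimal illustration; the function names and the choice of pooled-covariance estimator are mine, not from the slides:

```python
import numpy as np

def pooled_covariance(X, y):
    # Common (within-class) covariance: weighted average of class covariances
    classes = np.unique(y)
    return sum((y == k).mean() * np.cov(X[y == k].T, bias=True) for k in classes)

def regularized_covariances(X, y, alpha):
    # RDA shrinkage: Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma
    pooled = pooled_covariance(X, y)
    return {k: alpha * np.cov(X[y == k].T, bias=True) + (1 - alpha) * pooled
            for k in np.unique(y)}

def sphere(X, y):
    # With Sigma = U D U^T, transform each sample x to D^{-1/2} U^T x
    d, U = np.linalg.eigh(pooled_covariance(X, y))
    Xs = X @ (U / np.sqrt(d))            # row-wise x^T U D^{-1/2}
    centroids = {k: Xs[y == k].mean(axis=0) for k in np.unique(y)}
    return Xs, centroids
```

After sphering, classifying a point to the closest transformed centroid (adjusted by the log priors) reproduces the LDA rule.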
The geometric illustration of LDA. Left: the sphered data with means removed. Right: the sphered data in the two classes, the sphered means, and the decision boundary.
Reduced-Rank LDA
Binary classification: the decision boundary is given by the linear equation

  log(π1/π2) − (1/2)(μ1 + μ2)^T Σ^{−1} (μ1 − μ2) + x^T Σ^{−1} (μ1 − μ2) = 0 .

Only the projection of X on the direction Σ^{−1}(μ1 − μ2) matters. If the data are sphered, only the projection of X on μ1 − μ2 is needed.
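The two-class boundary coefficients can be computed directly; a small sketch (the helper name is my own):

```python
import numpy as np

def binary_lda_boundary(mu1, mu2, Sigma, pi1=0.5, pi2=0.5):
    # Boundary: log(pi1/pi2) - 0.5 (mu1 + mu2)^T Sigma^{-1} (mu1 - mu2)
    #           + x^T Sigma^{-1} (mu1 - mu2) = 0
    direction = np.linalg.solve(Sigma, mu1 - mu2)    # Sigma^{-1} (mu1 - mu2)
    intercept = np.log(pi1 / pi2) - 0.5 * (mu1 + mu2) @ direction
    return intercept, direction  # predict class 1 iff intercept + x @ direction > 0
```

Note that only `x @ direction`, the projection of x on Σ^{−1}(μ1 − μ2), enters the rule.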
Suppose the data are sphered. The subspace spanned by the K centroids has rank at most K − 1; denote it H_{K−1}. The data can be viewed in H_{K−1} without losing any information. When K > 3, we might want to find a subspace H_L ⊂ H_{K−1} optimal for LDA in some sense.

Optimization Criterion

Fisher's optimization criterion: the projected centroids should be spread out as much as possible relative to the within-class variance. Find the linear combination Z = a^T X, a = (a1, a2, ..., ap)^T, such that the between-class variance is maximized relative to the within-class variance. Assume the within-class covariance matrix of X is W, i.e., the common covariance matrix of the classes.

The between-class covariance matrix is B. With μ_k the (column) mean vector of class k and π̂_k the proportion of class-k samples in the entire data set,

  μ̄ = Σ_{k=1}^K π̂_k μ_k ,
  B = Σ_{k=1}^K π̂_k (μ_k − μ̄)(μ_k − μ̄)^T .

For the linear combination Z, the between-class variance is a^T B a and the within-class variance is a^T W a. Fisher's optimization becomes

  max_a (a^T B a) / (a^T W a) .

Eigendecomposition of W: W = V_W D_W V_W^T, so W = (W^{1/2})^T W^{1/2} with W^{1/2} = D_W^{1/2} V_W^T. Define b = W^{1/2} a, so a = W^{−1/2} b. The optimization becomes

  max_b [ b^T (W^{−1/2})^T B W^{−1/2} b ] / (b^T b) .

Define B* = (W^{−1/2})^T B W^{−1/2}. Eigendecomposition of B*: B* = V* D_B V*^T, with V* = (v1*, v2*, ..., vp*). The maximum is achieved by b1 = v1*, the first eigenvector of B*. Similarly, the next direction b2 = v2* is orthogonal to b1 = v1* and maximizes b2^T B* b2 / b2^T b2. Since a = W^{−1/2} b, converting back to the original problem gives a_l = W^{−1/2} v_l*.

The a_l (also denoted v_l in the textbook) are referred to as discriminant coordinates, or canonical variates.

Summary of obtaining discriminant coordinates:
1. Find the centroids of all the classes.
2. Find the between-class covariance matrix B using the centroid vectors.
3. Find the within-class covariance matrix W, i.e., Σ̂.
4. Eigendecompose W:
     W = (W^{1/2})^T W^{1/2} = (D_W^{1/2} V_W^T)^T (D_W^{1/2} V_W^T) .
5. Compute
     B* = (W^{−1/2})^T B W^{−1/2} = D_W^{−1/2} V_W^T B V_W D_W^{−1/2} .
6. Eigendecompose B*: B* = V* D_B V*^T.
7. The discriminant coordinates are a_l = W^{−1/2} v_l*.

Simulation

Three classes with equal prior probabilities 1/3. The input is two dimensional, and each class-conditional density of X is normal. The common covariance matrix is

  Σ = [ 1.0  0.0
        0.0  1.0 ] .

The three mean vectors are

  μ1 = (0, 0)^T ,  μ2 = (3, 2)^T ,  μ3 = (1, 3)^T .

A total of 450 samples are drawn, 150 per class, for training. Another set of 450 samples, 150 per class, is drawn for testing.
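The seven steps above, applied to data drawn as in this simulation, can be sketched as follows (the sampling code is illustrative; NumPy's `eigh` returns ascending eigenvalues, hence the reordering):

```python
import numpy as np

def discriminant_coordinates(X, y):
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    mus = np.array([X[y == k].mean(axis=0) for k in classes])   # step 1: centroids
    mu_bar = priors @ mus
    B = (priors[:, None] * (mus - mu_bar)).T @ (mus - mu_bar)   # step 2: between-class
    W = sum(p * np.cov(X[y == k].T, bias=True)                  # step 3: within-class
            for p, k in zip(priors, classes))
    dW, VW = np.linalg.eigh(W)                                  # step 4: W = V_W D_W V_W^T
    W_inv_half = VW / np.sqrt(dW)                               # "W^{-1/2}" = V_W D_W^{-1/2}
    Bstar = W_inv_half.T @ B @ W_inv_half                       # step 5: B*
    dB, Vstar = np.linalg.eigh(Bstar)                           # step 6: eigendecompose B*
    order = np.argsort(dB)[::-1]
    return W_inv_half @ Vstar[:, order]                         # step 7: columns are a_l

# Simulated data: three spherical Gaussian classes, 150 samples each
rng = np.random.default_rng(0)
mus = np.array([[0.0, 0.0], [3.0, 2.0], [1.0, 3.0]])
X = np.vstack([rng.normal(size=(150, 2)) + m for m in mus])
y = np.repeat([0, 1, 2], 150)
A = discriminant_coordinates(X, y)
```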
The scatter plot of the test data. Red: class 1. Blue: class 2. Magenta: class 3.

LDA Result

Estimated priors: π̂1 = π̂2 = π̂3 = 150/450 = 0.3333. The three estimated mean vectors are

  μ̂1 = (0.0757, 0.0034)^T ,  μ̂2 = (2.8310, 1.9847)^T ,  μ̂3 = (0.9992, 2.9005)^T .

Estimated covariance matrix:

  Σ̂ = [ 0.9967  0.0020
        0.0020  1.0263 ] .

Decision boundaries:
Between class 1 (red) and 2 (blue):      −5.9480 + 2.7684 X1 + 1.9427 X2 = 0 .
Between class 1 (red) and 3 (magenta):   −4.5912 + 0.9209 X1 + 2.8211 X2 = 0 .
Between class 2 (blue) and 3 (magenta):   1.3568 − 1.8475 X1 + 0.8784 X2 = 0 .

Classification error rate on the test data set: 7.78%.
Discriminant Coordinates

Between-class covariance matrix:

  B = [ 1.3111  1.3057
        1.3057  4.0235 ] .

Within-class covariance matrix:

  W = [ 0.9967  0.0020
        0.0020  1.0263 ] .

From the eigendecomposition of W,

  W^{−1/2} = [ 0.9994  −0.0678
               0.0668   0.9848 ] ,

  B* = (W^{−1/2})^T B W^{−1/2} = [ 1.5050  1.4603
                                   1.4603  3.7361 ] .
Eigendecomposition of B*: B* = V* D_B V*^T with

  V* = [ 0.4432   0.8964          D_B = [ 4.4582  0
         0.8964  −0.4432 ] ,              0       0.7830 ] .

The two discriminant coordinates are

  a1 = W^{−1/2} v1* = W^{−1/2} (0.4432, 0.8964)^T = (0.3831, 0.9128)^T ,
  a2 = W^{−1/2} v2* = W^{−1/2} (0.8964, −0.4432)^T = (0.9255, −0.3757)^T .

Project the data onto a1 and classify using only this 1-D data. The projected data are x_i^T a1, i = 1, ..., N.
Solid line: first DC. Dashed line: second DC.
Projection on the First DC

Projection of the training data on the first discriminant coordinate.

Perform LDA on the projected data. The resulting rule classifies x by comparing x^T a1 with the cutoffs 1.1195 and 1.4611. Error rate on the test data: 12.67%.

Principal Component Direction

Find the covariance matrix of X, or do a singular value decomposition of the mean-removed X, to find the principal component directions. Denote the covariance matrix by T:

  T = [ 2.3062  1.3066
        1.3066  5.0542 ] .
Eigendecomposition of T: T = V_T D_T V_T^T with

  V_T = [ 0.3710  −0.9286          D_T = [ 5.5762  0
          0.9286   0.3710 ] ,              0       1.7842 ] .
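This eigendecomposition is easy to reproduce; a quick check with NumPy (matrix values copied from the example):

```python
import numpy as np

T = np.array([[2.3062, 1.3066],
              [1.3066, 5.0542]])   # total covariance from the example

d, V = np.linalg.eigh(T)           # eigh returns ascending eigenvalues
order = np.argsort(d)[::-1]        # reorder by decreasing variance
D_T, V_T = d[order], V[:, order]   # columns of V_T are the PC directions
```

The first column of V_T is, up to sign, the first principal component direction (0.3710, 0.9286).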
Solid line: first PCD. Dashed line: second PCD.
Results Based on the First PC

Projection of the data on the first PC; the boundaries between classes are shown.

Perform LDA on the projected data. The resulting rule classifies x by comparing x^T v1 with the cutoffs 1.1489 and 1.4592. Error rate on the test data: 13.11%.

Comparison

It is generally true that T = B + W. For the given example, W ≈ I, and the true within-class covariance matrix is I. Ideally, for this example, both the discriminant coordinates and the principal component directions are simply the eigenvectors of B. In general, discriminant coordinates and principal component directions are different. To compute PC directions, class information is not needed; hence PCs have more flexible applications. For classification, DCs tend to be better.

A New Simulation

Change the common covariance matrix to

  Σ = [ 4.0898  0.8121
        0.8121  0.5900 ] .

The scatter plot of the test data set.
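The decomposition T = B + W noted in the comparison can be checked numerically on simulated data (the sampling setup below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mus = np.array([[0.0, 0.0], [3.0, 2.0], [1.0, 3.0]])
X = np.vstack([rng.normal(size=(150, 2)) + m for m in mus])
y = np.repeat([0, 1, 2], 150)

priors = np.array([(y == k).mean() for k in range(3)])
cents = np.array([X[y == k].mean(axis=0) for k in range(3)])
mu_bar = priors @ cents
B = (priors[:, None] * (cents - mu_bar)).T @ (cents - mu_bar)  # between-class
W = sum(p * np.cov(X[y == k].T, bias=True)                     # within-class (pooled)
        for p, k in zip(priors, range(3)))
T = np.cov(X.T, bias=True)                                     # total covariance
```

With the biased (divide-by-n) estimates and empirical priors, T equals B + W exactly; this is the law of total variance.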
The classification boundaries obtained by LDA. The error rate for the test data is 6%.

DCs and PC Directions
The solid line indicates the first DC or PC; the dashed line indicates the second DC or PC. Left panel: discriminant coordinates. Right panel: principal component directions.

Projection on 1D
The LDA results obtained by projecting the data onto the first discriminant coordinate and onto the first principal component direction. Projection on the first DC: test set error rate 7.78%. Projection on the first PCD: test set error rate 32.44%.