SPR_LectureHandouts_Chapter_03_Part4_FeatureSelection_SLDA

Pattern Recognition, ECE-8443
Chapter 3, Part 4: Feature Selection and Stepwise Linear Discriminant Analysis
Saurabh Prasad, Electrical and Computer Engineering Department, Mississippi State University

Feature (Subset) Selection

• Sometimes, simple feature selection methods are employed to reduce the dimensionality of the feature space.
• The key idea behind the algorithm is the following:
  • Sort all features in decreasing order of some performance metric that you wish to maximize.
  • Select the top d' features (d' < d).
• Clearly, this is not an exhaustive search of all possible combinations of features, and it is hence expected to be sub-optimal.
• Other approaches include forward selection and backward rejection to prune out the "less" important features.

Feature (Subset) Selection

• What are some possible candidates for the "metric"?
  • Entropy
  • Bhattacharyya distance
  • Jeffries-Matusita distance
  • Area under the Receiver Operating Characteristic (ROC) curve (will study this later)
  • Many other possibilities, including combinations of two or more of the quantities listed above

Feature (Subset) Selection

Bhattacharyya distance:

  BD = \frac{1}{8} (\mu_2 - \mu_1)^T \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_1 + \Sigma_2}{2} \right|}{\sqrt{|\Sigma_1| \, |\Sigma_2|}}

Entropy:

  H(p(x)) = -\int p(x) \ln p(x) \, dx

Jeffries-Matusita distance:

  JM = 2 \left( 1 - e^{-BD} \right)
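As a concrete illustration of the two distances above, here is a minimal Python sketch (not part of the original handout; the function names and example parameters are illustrative assumptions) that computes the Bhattacharyya distance between two Gaussian-modeled classes and the corresponding Jeffries-Matusita value.

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    # BD between two Gaussians N(mu1, cov1) and N(mu2, cov2), per the slide.
    cov_avg = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    # (1/8) (mu2 - mu1)^T [(cov1 + cov2)/2]^{-1} (mu2 - mu1)
    term1 = 0.125 * diff @ np.linalg.solve(cov_avg, diff)
    # (1/2) ln( |cov_avg| / sqrt(|cov1| |cov2|) ), via log-determinants
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2))
    return term1 + term2

def jeffries_matusita(bd):
    # JM = 2 (1 - exp(-BD)), as on the slide.
    return 2.0 * (1.0 - np.exp(-bd))

# Example with two made-up 2-D Gaussian classes:
mu_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
mu_b, cov_b = np.array([2.0, 1.0]), 1.5 * np.eye(2)
bd = bhattacharyya_distance(mu_a, cov_a, mu_b, cov_b)
print("BD =", bd, " JM =", jeffries_matusita(bd))
```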
Stepwise LDA

• Sometimes, it may be hard or impossible (computationally) to perform LDA on a high-dimensional feature space.
  • E.g., if the starting dimension is very high and we do not have enough training data, the within-class scatter matrix may be singular and its inverse unstable.
• One work-around is to regularize the formulation (also known as Regularized LDA):

  (S_W + \eta I)^{-1} S_B w = \lambda w

• Another approach is Stepwise LDA (S-LDA): prune away "some" features using feature selection, then apply LDA on this reduced-dimensional space (a two-step dimensionality reduction).

Stepwise LDA – Step 1: Forward Selection

Metric: Bhattacharyya distance (BD). Lenfea: user-set upper bound on the size of the pruned subspace. The forward-selection flowchart proceeds as follows (a code sketch of the full pipeline appears after Step 3 below):

1. Sort the features in decreasing order of their individual BDs, computed on the training data.
2. Take the feature with the highest BD as the initial feature set, and call its BD value BD1.
3. Add the next-best feature to the feature set, calculate the combination BD2, and set n = n + 1.
4. If BD1 > BD2, remove the feature from the selected feature set and set n = n - 1; otherwise, include the feature and assign BD1 = BD2.
5. If n = Lenfea or the end of the feature list has been reached, stop; otherwise, return to step 3.

Stepwise LDA – Step 2: Backward Rejection

Starting from the selected features, the training data, and Lenfea (and stopping immediately if n < 1), the backward-rejection flowchart proceeds as follows:

1. Calculate the combination BD1 with the selected feature set F1.
2. Remove the first feature from F1 and calculate BD2 with the remaining feature set F2.
3. If BD1 < BD2, set F1 = F2 and BD1 = BD2; otherwise, restore the removed feature to F1.
4. Remove the next feature from F1, calculate BD2 with the remaining feature set F2, and repeat from step 3 until every feature has been tried.

Stepwise LDA – Step 3: LDA

• Apply LDA on the pruned feature space (the features remaining after the forward-selection / backward-rejection process).
• Such pruning before employing LDA ensures that LDA is employed on the most useful features (well, almost: stepwise selection and rejection is still not as good as an exhaustive search).
• Lenfea imposes an upper bound on the size of the pruned feature subspace being created, so that it does not grow very large relative to the amount of training data available.
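The sketch below ties Steps 1-3 together for a two-class problem. It is an illustrative reading of the flowcharts above, not code from the handout: it assumes the bhattacharyya_distance() helper from the earlier sketch, and the helper names (class_stats, combination_bd), the lenfea argument, and the eta ridge in the LDA step are all assumptions. The eta * I term mirrors the regularized-LDA formulation quoted on the first Stepwise LDA slide.

```python
import numpy as np

def class_stats(X, feats):
    # Mean and covariance of one class, restricted to the chosen feature indices.
    Xs = X[:, feats]
    return Xs.mean(axis=0), np.cov(Xs, rowvar=False).reshape(len(feats), len(feats))

def combination_bd(X1, X2, feats):
    # Bhattacharyya distance of the two classes over a candidate feature subset.
    mu1, c1 = class_stats(X1, feats)
    mu2, c2 = class_stats(X2, feats)
    return bhattacharyya_distance(mu1, c1, mu2, c2)  # helper from the earlier sketch

def forward_selection(X1, X2, lenfea):
    # Step 1: rank single-feature BDs, then greedily add features that help.
    d = X1.shape[1]
    order = sorted(range(d), key=lambda f: -combination_bd(X1, X2, [f]))
    selected = [order[0]]
    bd1 = combination_bd(X1, X2, selected)
    for f in order[1:]:
        if len(selected) >= lenfea:  # n = Lenfea: stop growing the subset
            break
        bd2 = combination_bd(X1, X2, selected + [f])
        if bd2 >= bd1:               # keep the feature only if BD does not drop
            selected.append(f)
            bd1 = bd2
    return selected

def backward_rejection(X1, X2, selected):
    # Step 2: try dropping each feature in turn; keep the drop if BD improves.
    bd1 = combination_bd(X1, X2, selected)
    for f in list(selected):
        if len(selected) <= 1:
            break
        trial = [g for g in selected if g != f]
        bd2 = combination_bd(X1, X2, trial)
        if bd2 > bd1:                # removal improved separability
            selected, bd1 = trial, bd2
    return selected

def lda_direction(X1, X2, feats, eta=1e-3):
    # Step 3: two-class Fisher direction w proportional to
    # (S_W + eta*I)^{-1} (mu2 - mu1); S_W is approximated here as the sum of
    # the class covariances, and the eta ridge is the regularization
    # mentioned on the Stepwise LDA slide.
    mu1, c1 = class_stats(X1, feats)
    mu2, c2 = class_stats(X2, feats)
    Sw = c1 + c2 + eta * np.eye(len(feats))
    return np.linalg.solve(Sw, mu2 - mu1)

# Hypothetical usage with synthetic two-class data:
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(200, 30))
X2 = rng.normal(0.5, 1.2, size=(200, 30))
feats = backward_rejection(X1, X2, forward_selection(X1, X2, lenfea=8))
w = lda_direction(X1, X2, feats)
```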
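Finally, the regularized-LDA eigenproblem quoted earlier, (S_W + \eta I)^{-1} S_B w = \lambda w, can also be solved directly for multi-class problems as a generalized symmetric eigenproblem. This is a minimal sketch assuming the scatter matrices S_W and S_B have already been computed; scipy.linalg.eigh performs the generalized solve.

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(Sw, Sb, eta=1e-3):
    # Solve S_B w = lambda (S_W + eta*I) w, which is equivalent to the
    # slide's (S_W + eta*I)^{-1} S_B w = lambda w. eigh returns eigenvalues
    # in ascending order, so reverse to put the most discriminative
    # directions (largest lambda) first.
    d = Sw.shape[0]
    evals, evecs = eigh(Sb, Sw + eta * np.eye(d))
    return evals[::-1], evecs[:, ::-1]
```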
