12 Dimensionality Reduction

EE 649 Pattern Recognition
Dimensionality Reduction
Ulisses Braga-Neto, ECE Department
Main Idea

In many instances, it is necessary to reduce the number of measurements (features) in a data set to make the problem tractable. Examples include:
- Digital signal/image data with thousands or millions of samples/pixels.
- Historical time-series data accumulated over years (e.g., weather, stock market).
- "Omic" (genomic, proteomic, immunomic, etc.) data with tens of thousands of molecular measurements (RNA, protein, antibody, etc.).
Why Reduce Dimensionality?

Recall that the Bayes error can never increase as the number of features increases. So why is dimensionality reduction necessary, or even desirable?
- To improve classification performance. The peaking phenomenon ("curse of dimensionality") implies that the true classification error can, and often does, increase with more features (a small simulation sketch follows this list).
- To reduce computational load, in terms of both execution time and data storage.
- To perform preliminary exploratory data analysis (visualization of high-dimensional data).
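The peaking phenomenon is easy to explore in simulation. The sketch below is a minimal illustration added here, not part of the original slides: it assumes synthetic Gaussian data in which only the first three of p = 50 features are informative, a small training sample, and a simple nearest-mean classifier, and it prints a holdout error estimate as more (mostly noisy) features are included.

# Minimal sketch (illustrative, not from the slides) of the peaking phenomenon:
# with a small training sample, adding purely noisy features tends to raise the
# holdout error of a simple nearest-mean classifier.
import numpy as np

rng = np.random.default_rng(0)
p, n_train, n_test = 50, 20, 2000
# Only the first 3 features carry class information; the rest are noise.
shift = np.zeros(p)
shift[:3] = 1.0

def sample(n):
    y = rng.integers(0, 2, n)
    x = rng.normal(size=(n, p)) + np.outer(y, shift)
    return x, y

x_tr, y_tr = sample(n_train)
x_te, y_te = sample(n_test)

for d in (1, 3, 5, 10, 25, 50):          # number of leading features used
    mu0 = x_tr[y_tr == 0, :d].mean(axis=0)
    mu1 = x_tr[y_tr == 1, :d].mean(axis=0)
    # Nearest-mean rule: assign the class whose training mean is closer.
    d0 = np.linalg.norm(x_te[:, :d] - mu0, axis=1)
    d1 = np.linalg.norm(x_te[:, :d] - mu1, axis=1)
    err = np.mean((d1 < d0) != y_te)
    print(f"d = {d:2d}  estimated test error = {err:.3f}")

With so few training points, the error typically improves only up to a handful of features and then degrades as noise features dilute the estimated class means.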
Some Heuristics

Dimensionality reduction will generally involve loss of information. One typically wants to reduce the number of features in such a way that this loss is minimized. Some heuristics for this are (a simple filtering sketch follows the list):
- Features that are functions of other features should be discarded.
- Features that are nearly constant (small variance) should be discarded.
- Features strongly correlated with Y should be retained.
- Features weakly correlated with Y (i.e., "noisy" features) should be discarded.
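The variance and correlation heuristics can be expressed as a simple filter. The sketch below is illustrative only; the helper filter_features and the thresholds var_tol and corr_tol are hypothetical choices, and it assumes numeric features in a NumPy array with binary labels.

# Minimal sketch of the variance and correlation heuristics listed above.
import numpy as np

def filter_features(X, y, var_tol=1e-3, corr_tol=0.1):
    """Keep features with non-negligible variance and non-negligible
    absolute correlation with the label; thresholds are illustrative."""
    keep = []
    for j in range(X.shape[1]):
        xj = X[:, j]
        if xj.var() < var_tol:                 # nearly constant -> discard
            continue
        corr = np.corrcoef(xj, y)[0, 1]        # correlation with Y
        if abs(corr) < corr_tol:               # "noisy" feature -> discard
            continue
        keep.append(j)
    return keep

# Example: two informative features, one constant feature, one pure-noise feature.
rng = np.random.default_rng(1)
n = 1000
y = rng.integers(0, 2, n)
X = np.column_stack([y + 0.3 * rng.normal(size=n),
                     -y + 0.3 * rng.normal(size=n),
                     np.ones(n),
                     rng.normal(size=n)])
print(filter_features(X, y))                   # keeps the informative features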
Class-Separability Criteria

Given the original feature vector $X = (X_1, \ldots, X_p) \in \mathbb{R}^p$, dimensionality reduction finds a transformation $T: \mathbb{R}^p \to \mathbb{R}^d$, where $d < p$, such that the new feature vector is $X' = T(X) = (X'_1, \ldots, X'_d) \in \mathbb{R}^d$.

To minimize the loss of information, $X' = T(X)$ should be selected such that a class-separability criterion $J(X', Y)$ is maximized. For example:
- The Bayes error: $J(X', Y) = 1 - \epsilon(X', Y) = 1 - E[\min\{\eta(X'),\, 1 - \eta(X')\}]$
- The designed classification error: $J_{\Psi_n}(X', Y) = 1 - \epsilon_n(X', Y) = 1 - E[\,|Y - \Psi_n(X'; S_n)|\,]$

Note: In practice, estimates of these errors have to be used.
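Since the true errors are unknown, a criterion such as $J_{\Psi_n}$ is in practice estimated from the sample. The sketch below is an added illustration, not the slides' procedure: it scores a candidate feature subset by one minus a holdout estimate of the error of a simple nearest-mean classifier; the helper holdout_criterion and the synthetic data are hypothetical.

# Minimal sketch: estimate J_Psi_n(X', Y) = 1 - eps_n for a candidate feature
# subset via the holdout error of a designed (nearest-mean) classifier.
import numpy as np

def holdout_criterion(X, y, subset, rng):
    """1 - holdout error of a nearest-mean classifier trained on `subset`."""
    n = len(y)
    idx = rng.permutation(n)
    tr, te = idx[: n // 2], idx[n // 2 :]
    Xs = X[:, subset]
    mu0 = Xs[tr][y[tr] == 0].mean(axis=0)
    mu1 = Xs[tr][y[tr] == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs[te] - mu1, axis=1)
            < np.linalg.norm(Xs[te] - mu0, axis=1)).astype(int)
    return 1.0 - np.mean(pred != y[te])

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 400)
X = np.column_stack([y + rng.normal(size=400),      # informative feature
                     rng.normal(size=400)])         # pure-noise feature
print(holdout_criterion(X, y, [0], rng))   # higher J for the informative feature
print(holdout_criterion(X, y, [1], rng))   # near 0.5 for the noise feature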
Additional class-separability criteria:
- F-errors (e.g., asymptotic nearest-neighbor error $\epsilon_{NN}$, Matsushita error $\rho$, expected conditional entropy $\mathcal{E}$): $J(X', Y) = 1 - d_F(X', Y) = 1 - E[F(\eta(X'))]$
- Mahalanobis distance: $J(X', Y) = \sqrt{(\mu_1 - \mu_0)^T \Sigma^{-1} (\mu_1 - \mu_0)}$
- Scatter matrices (e.g., Fisher's discriminant): $J(X', Y) = \dfrac{w^T S_B w}{w^T S_W w}$, where $X' = T(X) = w^T X$.
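As a numerical illustration (added here, not from the slides), the sketch below computes the Mahalanobis distance between two class means and the Fisher scatter ratio $w^T S_B w / w^T S_W w$ at the optimal projection direction, assuming two-class data and a pooled within-class covariance estimate; the means and sample sizes are arbitrary.

# Minimal sketch of the Mahalanobis-distance and scatter-matrix criteria.
import numpy as np

rng = np.random.default_rng(3)
n = 500
X0 = rng.normal(loc=[0, 0], scale=1.0, size=(n, 2))   # class 0 sample
X1 = rng.normal(loc=[2, 1], scale=1.0, size=(n, 2))   # class 1 sample

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# Pooled within-class covariance estimate, used by both criteria here.
Sw = 0.5 * (np.cov(X0.T) + np.cov(X1.T))

# Mahalanobis distance between the class means.
diff = mu1 - mu0
mahalanobis = np.sqrt(diff @ np.linalg.solve(Sw, diff))

# Fisher's discriminant direction (proportional to Sw^{-1} (mu1 - mu0))
# maximizes the between- to within-class scatter ratio.
Sb = np.outer(diff, diff)                  # between-class scatter
w = np.linalg.solve(Sw, diff)              # optimal projection, up to scale
fisher_ratio = (w @ Sb @ w) / (w @ Sw @ w)

print(f"Mahalanobis distance: {mahalanobis:.3f}")
print(f"Fisher scatter ratio: {fisher_ratio:.3f}")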