Dimension Reduction
PR, ANN, & ML

Dimension Reduction
- Curse of dimensionality
  - With 50 features (dimensions), each quantized to 20 levels, there are $20^{50}$ possible feature combinations. Imagine how many samples you would need to estimate $p(\mathbf{x} \mid \omega)$.
  - How do you visualize the structure in a 50-dimensional space?

Other problems
- The local regions needed for density estimation grow larger and larger
  - To capture a fraction $r$ of the data in $n$ dimensions, the edge length (as a fraction of the full range) is $r^{1/n}$
    - $n = 10$, $r = 0.01$: edge length $\approx 0.63$
    - $n = 10$, $r = 0.1$: edge length $\approx 0.8$
- Data tend toward the boundary, creating boundary skew
  - Consider a uniform distribution whose interior spans a fraction $p$ of each axis
  - The exterior probability is $1 - p^n$
    - $n = 10$, $p = 0.8$: about 0.89 exterior
    - $n = 100$, $p = 0.8$: about 0.999... exterior

Solutions: Reduction
- Fisher's linear discriminant
  - Preserve class separation (special case of principal component analysis)
- Multidimensional scaling
  - Preserve distance measures
- Principal component analysis
  - Best data representation (not necessarily best class separation)

Fisher's linear discriminant (2-class)
- Given $n$ $d$-dimensional samples
  $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$, with $X_1 \subset X$, $|X_1| = n_1$, $X_2 \subset X$, $|X_2| = n_2$, and $n = n_1 + n_2$
- Find a linear transform $y = \mathbf{w}^t \mathbf{x}$ which
  - maps the $d$-dimensional samples onto a line
  - best preserves class separation
- Intuitively, good features are those with a large separation of means relative to variances

[Figure: samples $(x_1, x_2)$ in the $x_1$-$x_2$ plane projected onto the direction $\mathbf{w} = (w_1, w_2)$, giving the scalar $y$]

Caveats
- Ambiguity may arise whenever the dimension of a problem is reduced. A good reduction algorithm may minimize this ambiguity but may not eliminate it completely.

Caveats (cont.)
- The figures also suggest that better performance sometimes requires increasing the dimension (adding features), not decreasing it.

In the original d-dimensional space
- Class means: $\mathbf{m}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in X_i} \mathbf{x}$
- Between-class scatter: $|\mathbf{m}_1 - \mathbf{m}_2|^2$
- Within-class scatter: $s_1^2 + s_2^2$, where $s_i^2 = \sum_{\mathbf{x} \in X_i} (\mathbf{x} - \mathbf{m}_i)^t (\mathbf{x} - \mathbf{m}_i)$
- Ideally, the following function should be large:
  $\dfrac{|\mathbf{m}_1 - \mathbf{m}_2|^2}{s_1^2 + s_2^2}$

In the transformed 1-dimensional space
- Projected class means, where $Y_i$ is the image of $X_i$ under $y = \mathbf{w}^t \mathbf{x}$:
  $\tilde{m}_i = \frac{1}{n_i} \sum_{y \in Y_i} y = \frac{1}{n_i} \sum_{\mathbf{x} \in X_i} \mathbf{w}^t \mathbf{x} = \mathbf{w}^t \mathbf{m}_i$
- Between-class scatter: $|\tilde{m}_1 - \tilde{m}_2|^2$
- Within-class scatter: $\tilde{s}_1^2 + \tilde{s}_2^2$, where $\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$
- Ideally, the following function should be large:
  $F(\mathbf{w}) = \dfrac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

Or, in matrix form
- $|\tilde{m}_1 - \tilde{m}_2|^2 = (\mathbf{w}^t \mathbf{m}_1 - \mathbf{w}^t \mathbf{m}_2)^2 = \mathbf{w}^t (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^t \mathbf{w} = \mathbf{w}^t S_B \mathbf{w}$
- $\tilde{s}_i^2 = \sum_{\mathbf{x} \in X_i} (\mathbf{w}^t \mathbf{x} - \mathbf{w}^t \mathbf{m}_i)^2 = \sum_{\mathbf{x} \in X_i} \mathbf{w}^t (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^t \mathbf{w} = \mathbf{w}^t S_i \mathbf{w}$
- $\tilde{s}_1^2 + \tilde{s}_2^2 = \mathbf{w}^t (S_1 + S_2) \mathbf{w} = \mathbf{w}^t S_w \mathbf{w}$
- $F(\mathbf{w}) = \dfrac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \dfrac{\mathbf{w}^t S_B \mathbf{w}}{\mathbf{w}^t S_w \mathbf{w}}$

The Analysis
- $F(\mathbf{w})$ is a generalized Rayleigh quotient
- To maximize $F(\mathbf{w})$, $\mathbf{w}$ is the generalized eigenvector associated with the largest generalized eigenvalue:
  $S_B \mathbf{w} = \lambda S_w \mathbf{w}$, or $S_w^{-1} S_B \mathbf{w} = \lambda \mathbf{w}$
- Because $S_B \mathbf{w}$ always points along $\mathbf{m}_1 - \mathbf{m}_2$, the solution (up to scale) is
  $\mathbf{w} = S_w^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$
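The two-class Fisher criterion described above can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes NumPy, uses made-up synthetic Gaussian data, and the function names `fisher_direction` and `fisher_criterion` are hypothetical. It computes the direction from the within-class scatter and the class means, then evaluates F(w) directly in the projected 1-dimensional space.

```python
import numpy as np


def fisher_direction(X1, X2):
    """Return a unit vector w proportional to S_w^{-1} (m_1 - m_2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_w = S_1 + S_2, with S_i = sum (x - m_i)(x - m_i)^t
    D1, D2 = X1 - m1, X2 - m2
    S_w = D1.T @ D1 + D2.T @ D2
    # Solve S_w w = (m_1 - m_2) rather than forming the inverse explicitly
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)


def fisher_criterion(w, X1, X2):
    """Evaluate F(w): squared gap of projected means over summed projected scatter."""
    y1, y2 = X1 @ w, X2 @ w
    between = (y1.mean() - y2.mean()) ** 2
    within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return between / within


# Synthetic two-class data (an assumption for illustration only):
# strongly correlated features, class means separated along the first axis.
rng = np.random.default_rng(0)
cov = np.array([[3.0, 2.5], [2.5, 3.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X1, X2)

# Baseline for comparison: project onto the line joining the class means.
diff = X1.mean(axis=0) - X2.mean(axis=0)
w_means = diff / np.linalg.norm(diff)

print("F along Fisher direction:  ", fisher_criterion(w, X1, X2))
print("F along mean-difference:   ", fisher_criterion(w_means, X1, X2))
```

With correlated features the Fisher direction generally differs from the line joining the class means, because the within-class scatter reweights the mean difference. The baseline comparison is there to make that difference visible.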
 Spring '07
 WANG
 Machine Learning
