Reducing Data Dimension
Machine Learning 10-701, November 2005
Tom M. Mitchell
Carnegie Mellon University

Required reading:
• Bishop, sections 3.6 and 8.6
Recommended reading:
• Wall et al., 2003

Outline
• Feature selection
  – Single-feature scoring criteria
  – Search strategies
• Unsupervised dimension reduction using all features
  – Principal Components Analysis
  – Singular Value Decomposition
  – Independent Components Analysis
• Supervised dimension reduction
  – Fisher Linear Discriminant
  – Hidden layers of neural networks

Dimensionality Reduction: Why?
• Learning a target function from data where some features are irrelevant - reduce variance, improve accuracy
• We wish to visualize high-dimensional data
• Sometimes the data's "intrinsic" dimensionality is smaller than the number of features used to describe it - recover the intrinsic dimension

Supervised Feature Selection
Problem: We wish to learn f: X → Y, where X = <X_1, ..., X_N>, but suspect that not all the X_i are relevant.
Approach: Preprocess the data to select only a subset of the X_i
• Score each feature, or subsets of features
  – How?
• Search for a useful subset of features to represent the data
  – How?

Scoring Individual Features X_i
Common scoring methods:
• Training or cross-validated accuracy of single-feature classifiers f_i: X_i → Y
• Estimated mutual information between X_i and Y:
  I(X_i, Y) = Σ_x Σ_y P(X_i = x, Y = y) log [ P(X_i = x, Y = y) / ( P(X_i = x) P(Y = y) ) ]
• χ² statistic to measure independence between X_i and Y
• Domain-specific criteria
  – Text: score "stop" words ("the", "of", ...) as zero
  – fMRI: score each voxel by a T-test for activation versus the rest condition
  – ...

Choosing a Set of Features to Learn F: X → Y
Common methods:
Forward1: Choose the n features with the highest scores.
Forward2:
  – Choose the single highest-scoring feature X_k
  – Rescore all remaining features, conditioned on the set of already-selected features
    • E.g., Score(X_i | X_k) = I(X_i, Y | X_k)
    • E.g., Score(X_i | X_k) = Accuracy(predicting Y from X_i and X_k)
  – Repeat, calculating new scores on each iteration, conditioning on the set of selected features

Choosing a Set of Features
Common methods:
Backward1: Start with all features, then delete the n with the lowest scores.
Backward2: Start with all features; score each feature conditioned on the assumption that all the others are included. ...
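
To make the single-feature scoring idea concrete, here is a minimal sketch of mutual-information scoring for discrete-valued features. It is not part of the original notes; the function names and the toy data are illustrative only.

# Minimal sketch: score each discrete feature X_i by its estimated
# mutual information with the label Y (a higher score = more relevant).
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate I(X; Y) from paired samples of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi

def score_features(X, y):
    """X is a list of rows; return one mutual-information score per column."""
    columns = list(zip(*X))            # transpose rows into columns
    return [mutual_information(col, y) for col in columns]

# Toy example: feature 0 predicts y perfectly, feature 1 is pure noise.
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1]
print(score_features(X, y))            # feature 0 receives the higher score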
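
The Forward2 search can be sketched as a greedy loop that rescores every remaining feature together with the features already chosen, using the second conditional score above (cross-validated accuracy). The sketch assumes numpy arrays and scikit-learn; the classifier (logistic regression) and the stopping rule are illustrative choices, not prescribed by the lecture.

# Sketch of Forward2: repeatedly add the feature whose cross-validated
# accuracy, together with the already-selected features, is highest.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_select, cv=5):
    selected = []
    while len(selected) < n_select:
        best_i, best_score = None, -np.inf
        for i in range(X.shape[1]):
            if i in selected:
                continue
            # Conditional score: Score(X_i | selected) =
            # Accuracy(predicting Y from X_i plus the selected features)
            cols = selected + [i]
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected

Forward1 is the simpler special case: score every feature once (for example with the mutual-information scorer above) and keep the n highest-scoring features.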
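
The backward strategies can be sketched the same way. The snippet below reads Backward2 as scoring each feature by the drop in cross-validated accuracy when that feature alone is removed while all the others stay in; since the preview cuts off at this point, that reading is an assumption, and the helper name is illustrative.

# Sketch of the Backward2 idea: with all features included, score each
# feature by how much accuracy drops when it alone is removed.
# Low-impact features are the candidates for deletion.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def backward_scores(X, y, cv=5):
    clf = LogisticRegression(max_iter=1000)
    full_acc = cross_val_score(clf, X, y, cv=cv).mean()
    scores = {}
    for i in range(X.shape[1]):
        keep = [j for j in range(X.shape[1]) if j != i]
        acc_without = cross_val_score(clf, X[:, keep], y, cv=cv).mean()
        scores[i] = full_acc - acc_without   # large drop => feature i matters
    return scores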