preprocess - DataPreprocessing...

Data Preprocessing
Copyright, 1996 © Dale Carnegie & Associates,

04/08/10 CSE 572: Data Mining by H. Liu

Data preprocessing
- A necessary step for serious, effective, real-world data mining
- It’s often omitted in “academic” DM, but can’t be over-stressed in practical DM
- The need for pre-processing in DM
  - Data reduction - too much data
  - Data cleaning - extant noise
  - Data integration and transformation

Data reduction
- Data cube aggregation
- Feature selection and dimensionality reduction
- Sampling
  - random sampling and others
- Instance selection (search based)
- Data compression
  - PCA, Wavelet transformation
- Data discretization
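
Two of the reduction techniques listed above, random sampling and discretization, can be sketched in a few lines. This is a minimal illustration and not the lecture's own code: the function names, the fixed seed, the choice of sampling without replacement, and the choice of equal-width intervals are all assumptions made for the example.

```python
import random

def random_sample(rows, n, seed=0):
    """Simple random sampling without replacement (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

def equal_width_bins(values, k):
    """Map each numeric value to one of k equal-width intervals, numbered 0 .. k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    # The maximum value would land in interval k, so clamp it into the last one.
    return [min(int((v - lo) / width), k - 1) for v in values]

rows = list(range(100))            # hypothetical data set: 100 instance ids
subset = random_sample(rows, 10)   # keep 10 of the 100 instances

bins = equal_width_bins([1.0, 2.5, 4.0, 7.5, 10.0], k=3)
print(bins)                        # [0, 0, 1, 2, 2]
```

Sampling reduces the number of instances, while discretization reduces the number of distinct values per feature; the two can be combined.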

Feature selection
- The basic problem
  - Finding a subset of the original features with which the domain can be learned equally well or better
- What are the advantages of doing so?
- Curse of dimensionality
  - From 1-d, 2-d, to 3-d: an illustration
  - Another example: the wonders of reducing the number of features, since the # of instances available for learning depends on the # of features
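
The step from 1-d to 2-d to 3-d can be made concrete with a small calculation. The sketch below assumes each feature is split into 10 intervals and that 1000 instances are available; both numbers are illustrative choices, not values from the lecture.

```python
def cells(bins_per_feature, n_features):
    """Number of cells in a grid with the given resolution per feature."""
    return bins_per_feature ** n_features

def instances_per_cell(n_instances, bins_per_feature, n_features):
    """Average number of instances falling into each cell of the grid."""
    return n_instances / cells(bins_per_feature, n_features)

# Assumed setting: 1000 instances, 10 intervals per feature.
for d in (1, 2, 3):
    print(d, cells(10, d), instances_per_cell(1000, 10, d))
# 1 10 100.0
# 2 100 10.0
# 3 1000 1.0
```

With a fixed number of instances, each added feature multiplies the number of cells to cover, so the data becomes sparser; removing features has the opposite effect.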

Illustration of the difficulty of the problem
- Search space (an example with 4 features)
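
The size of the search space for 4 features can be checked by enumerating every candidate subset. This is a minimal sketch; the feature names `f1` to `f4` are hypothetical placeholders.

```python
from itertools import combinations

def all_subsets(features):
    """Yield every subset of the feature list, from the empty set to the full set."""
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            yield subset

features = ["f1", "f2", "f3", "f4"]   # hypothetical feature names
subsets = list(all_subsets(features))
print(len(subsets))                    # 16, i.e. 2**4
```

Each feature is either in or out of a subset, so N features give 2**N candidates. Exhaustive search is feasible for 4 features but quickly becomes impractical as N grows.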

Reduce the chance of data overfitting
- Examples
  - From 2-D to 3-D
- Are the selected features really good? If they are, they may help mitigate overfitting
- How do we know? Experiments
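
One possible form of such an experiment is to estimate accuracy with and without the selected features and compare. The sketch below uses leave-one-out evaluation of a 1-nearest-neighbour classifier; that learner, the evaluation scheme, and the toy data are assumptions chosen for illustration, not the method used in the lecture.

```python
def loo_accuracy(data, labels, features):
    """Leave-one-out accuracy of 1-nearest-neighbour on the given feature indices."""
    correct = 0
    for i, x in enumerate(data):
        # Nearest other instance, with squared distance measured on the chosen features only.
        j = min(
            (k for k in range(len(data)) if k != i),
            key=lambda k: sum((x[f] - data[k][f]) ** 2 for f in features),
        )
        correct += labels[j] == labels[i]
    return correct / len(data)

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
data = [(0, 0), (1, 10), (2, 20), (3, 30), (5, 0), (6, 10), (7, 20), (8, 30)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(loo_accuracy(data, labels, [0]))     # selected feature only: 1.0
print(loo_accuracy(data, labels, [0, 1]))  # all features: 0.0
```

On this deliberately constructed data, the irrelevant feature dominates the distance and misleads the learner, so the selected subset scores higher. Real data sets call for the same comparison on held-out data rather than on a hand-built example.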

This note was uploaded on 04/08/2010 for the course CS 420 taught by Professor Dawsonengler during the Spring '02 term at San Jose State.
