Data Preprocessing

Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
Data integration: integrate multiple databases, data cubes, or files.
Data transformation: normalization and aggregation.
Data reduction: obtain a representation that is reduced in volume but produces the same or similar analytical results.
Data discretization: part of data reduction, but of particular importance, especially for numerical data.

Data Mining: Concepts and Techniques
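
Three of the tasks listed above (filling in missing values, normalization, and discretization) can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the slides: the function names, the toy `ages` column, mean imputation as the fill strategy, min-max scaling as the normalization, and equal-width binning as the discretization are all choices made for the example.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Data cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Data discretization: map each value to an equal-width bin index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [20, None, 30, 40, None, 60]  # hypothetical toy column
cleaned = fill_missing_with_mean(ages)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, n_bins=4)
```

Mean imputation is the simplest possible fill. Whether it is appropriate depends on why the values are missing, which is the subject of the missing-data slides.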

Data Cleaning: Missing Data

Missing Completely at Random (MCAR): missingness (the probability that an observation X_i is missing) is unrelated to the value of X_i or to other variables.
Missing at Random (MAR): missingness is not related to X_i, but is related to other variables.
Missing Not at Random (nonignorable): missingness is correlated with X_i, the variable itself.
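
The three mechanisms can also be stated as conditions on the probability of missingness. The notation here is an assumption, not taken from the slides: R is the missingness indicator, and X_obs and X_mis are the observed and unobserved parts of the data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid X_{\text{obs}}, X_{\text{mis}}) = P(R) \\
\text{MAR:}\quad & P(R \mid X_{\text{obs}}, X_{\text{mis}}) = P(R \mid X_{\text{obs}}) \\
\text{Nonignorable:}\quad & P(R \mid X_{\text{obs}}, X_{\text{mis}}) \text{ depends on } X_{\text{mis}}
\end{align*}
```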

Missing Completely at Random (MCAR)
Missingness (the probability that an observation X_i is missing) is unrelated to the value of X_i or to other variables.
Cases with complete data are indistinguishable from cases with incomplete data.
Examples: typos, equipment malfunction, etc.
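
A small simulation can make the "indistinguishable" point concrete. The sketch below assumes a hypothetical normally distributed variable and a fixed 30% deletion probability, neither of which comes from the slides. Because every value is dropped with the same probability, the mean of the remaining values should stay close to the mean of the full data.

```python
import random
from statistics import mean

def make_mcar(values, p_missing, rng):
    """Drop each value with the same probability, regardless of its size."""
    return [None if rng.random() < p_missing else v for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
full = [rng.gauss(50, 10) for _ in range(20000)]  # hypothetical variable

with_gaps = make_mcar(full, p_missing=0.3, rng=rng)
observed = [v for v in with_gaps if v is not None]

missing_rate = 1 - len(observed) / len(full)
bias = mean(observed) - mean(full)  # expected to be near zero under MCAR
```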

Missing at Random (MAR)
Missingness is not related to X_i, but is related to other variables.
Cases with incomplete data differ from complete ones, but the reason is not the variable(s) on which data are missing.
The pattern of missingness is predictable from other variables.
Example: research participants with low self-esteem do not attend follow-up sessions. Measure self-esteem in the initial sessions and use it to predict the missingness pattern.
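
The self-esteem example can be simulated as a rough sketch. Everything numeric here is assumed for illustration: the standardized scores, the dropout probabilities of 0.6 and 0.1, and the split at zero. Self-esteem is recorded for everyone at intake, and the chance of a missing follow-up value depends only on that recorded score, so the dropout rate differs sharply between the two groups and is predictable from the observed variable.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

participants = []
for _ in range(10000):
    self_esteem = rng.gauss(0, 1)  # observed for everyone at intake
    outcome = rng.gauss(0, 1)      # follow-up measurement
    # Hypothetical rule: dropout depends only on the observed score.
    p_drop = 0.6 if self_esteem < 0 else 0.1
    dropped = rng.random() < p_drop
    participants.append((self_esteem, None if dropped else outcome))

def dropout_rate(rows):
    """Fraction of rows whose follow-up value is missing."""
    return sum(1 for _, y in rows if y is None) / len(rows)

low = [r for r in participants if r[0] < 0]
high = [r for r in participants if r[0] >= 0]
rate_low, rate_high = dropout_rate(low), dropout_rate(high)
```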

Missing Not at Random (Nonignorable)
Missingness is correlated with X_i, the variable itself.
Examples:
People with low incomes are less likely to report their income than people with higher incomes.
A participant in a weight loss study does not attend a
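
The income example can be sketched as a simulation. The log-normal income distribution and the reporting probabilities of 0.4 and 0.9 are assumptions made purely for illustration. Because the chance of reporting depends on the income value itself, the mean of the reported incomes overstates the true mean, and no other observed variable is available to correct for it.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical incomes drawn from a log-normal distribution.
incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(20000)]
median_income = sorted(incomes)[len(incomes) // 2]

def report(income):
    """Return the income if reported, else None.

    Hypothetical rule: people below the median report less often.
    """
    p_report = 0.4 if income < median_income else 0.9
    return income if rng.random() < p_report else None

reported = [v for v in (report(x) for x in incomes) if v is not None]

true_mean = mean(incomes)
observed_mean = mean(reported)  # biased upward: low incomes are under-reported
```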