LEC6 - Missing and Empty Values Morgan C. Wang Department...

Missing and Empty Values
Morgan C. Wang
Department of Statistics, University of Central Florida
3/8/2010

Outline
• Introduction
• Criterion for Replacing Missing Values
• Unconditional Imputation Methods
• Conditional Imputation Methods
• Conclusions
• Case Study
Introduction

Introduction
• The presence of missing values can be problematic for data miners, since a significant portion of existing algorithms can analyze only complete cases.
Introduction
• Approaches to dealing with missing values
  • Complete-case analysis
  • Use the algorithm’s built-in mechanism for handling missing values
  • Impute missing values before using algorithms to fit models

Introduction
• Complete-case analysis: Only cases without any missing values are used in the analysis. When only relatively few cases have missing values, complete-case analysis has some moderately attractive theoretical properties, even when the “missingness” depends on observed values of other variables or on unobserved variables (Donner, 1982; Jones, 1996).
Introduction
• Drawbacks of complete-case analysis
  • A significant proportion of the data set is ignored if attention is restricted to the complete cases.
  • The expected proportion of complete cases is $(1 - \alpha)^k$ if each of the $k$ variables is independently missing completely at random with probability $\alpha$.
  • Example: A 2% chance of a missing value in each of 100 input variables, or a 5% chance in each of 40 input variables, would leave on average only about 13% complete cases if these values were indeed missing completely at random.
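The 13% figure can be checked with a few lines of Python. This is a minimal sketch, assuming the variables are missing independently of one another; the function name `expected_complete_fraction` is a hypothetical helper, not part of the lecture.

```python
def expected_complete_fraction(alpha: float, k: int) -> float:
    """Expected share of complete cases when each of k variables is
    independently missing with probability alpha (assumed MCAR)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return (1.0 - alpha) ** k


# The two scenarios from the slide: (alpha, k) = (0.02, 100) and (0.05, 40).
for alpha, k in [(0.02, 100), (0.05, 40)]:
    share = expected_complete_fraction(alpha, k)
    print(f"alpha={alpha:.2f}, k={k}: {share:.3f}")
```

Both scenarios come out at roughly 0.13, which matches the slide: a small per-variable missing rate compounds quickly as the number of input variables grows.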

Introduction
• Drawbacks of complete-case analysis
  • It is not easy to score a new case that has one or more missing values. This is unacceptable if many future cases have missing values on at least one input variable used in the predictive model.
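The scoring problem can be illustrated with a toy linear model. This is only a sketch: the coefficients, the variable names `age` and `income`, and the function `score` are invented for illustration and do not come from the lecture.

```python
from typing import Mapping, Optional

# Hypothetical fitted coefficients, invented for illustration only.
INTERCEPT = -1.0
COEFFICIENTS = {"age": 0.03, "income": 0.00002}


def score(case: Mapping[str, Optional[float]]) -> Optional[float]:
    """Return the linear score, or None when any model input is missing."""
    total = INTERCEPT
    for name, weight in COEFFICIENTS.items():
        value = case.get(name)
        if value is None:
            return None  # a model fitted on complete cases has no rule here
        total += weight * value
    return total


new_cases = [
    {"age": 40.0, "income": 50000.0},  # complete case: can be scored
    {"age": 40.0, "income": None},     # missing input: cannot be scored
]
scores = [score(case) for case in new_cases]
print(scores)
```

The second case receives no score at all, which is the situation the slide warns about when many future cases are incomplete.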
Introduction
• Drawbacks of complete-case analysis
  • The “missingness” of the data can itself provide valuable information in predictive modeling.
  • Example: Missing value patterns (MVPs) contribute useful information in many predictive modeling problems. Complete-case analysis cannot use this information.
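One simple way to make missing value patterns available to a model is to encode each case's pattern as a string of 0/1 flags. The sketch below assumes missing values are represented as `None`; the toy rows and the helper `missing_pattern` are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Optional, Sequence


def missing_pattern(row: Sequence[Optional[float]]) -> str:
    """Encode a row as a string of 0/1 flags, with 1 marking a missing value."""
    return "".join("1" if value is None else "0" for value in row)


# Toy data, invented for illustration; columns are (age, income, balance).
rows = [
    (34.0, 52000.0, 1200.0),
    (51.0, None, 300.0),
    (None, None, 80.0),
    (45.0, None, 950.0),
]

patterns = [missing_pattern(row) for row in rows]
pattern_counts = Counter(patterns)
print(patterns)
print(pattern_counts.most_common())
```

The resulting pattern string could be used as an extra categorical input, so the information survives even if the missing values themselves are later imputed.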

Introduction
• Use the algorithm’s built-in mechanism to deal with missing values
  • Example: CART (Classification and Regression Trees) can deal with missing values through surrogate splits, and the decision tree algorithm in Enterprise Miner treats missing values as a floating category. Missing value imputation is not necessary if the miner uses either CART or the decision tree in Enterprise Miner.
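The idea behind a surrogate split can be sketched for a single tree node. This is not CART's or Enterprise Miner's implementation: the variables `income` and `age`, the thresholds, and the fallback direction are all hypothetical choices made for illustration.

```python
from typing import Mapping, Optional


def route(case: Mapping[str, Optional[float]]) -> str:
    """Send a case to the 'left' or 'right' child of one tree node."""
    income = case.get("income")
    if income is not None:
        # Primary split: used whenever the primary variable is observed.
        return "left" if income < 40000 else "right"
    age = case.get("age")
    if age is not None:
        # Surrogate split: a second variable chosen to mimic the primary split.
        return "left" if age < 30 else "right"
    # Fallback when both are missing: assumed majority direction at this node.
    return "right"


print(route({"income": 25000.0, "age": 50.0}))  # routed by the primary split
print(route({"income": None, "age": 25.0}))     # routed by the surrogate split
print(route({"income": None, "age": None}))     # routed by the fallback
```

Because every case is routed somewhere, the tree can be fitted and scored without imputing the missing inputs first.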
Introduction
• Impute missing values before using algorithms to fit models
  • For data mining algorithms that do not have built-in imputation techniques, utilize general

This note was uploaded on 09/22/2011 for the course STA 6714 taught by Professor Staff during the Spring '11 term at University of Central Florida.
