# Lecture3 - Outliers Morgan C. Wang Department of Statistics...


Outliers

Morgan C. Wang, Department of Statistics, University of Central Florida (2/9/2011)

Outline

- Introduction
- Data Anomaly
- Univariate Outlier Detection
- Multivariate Outlier Detection
- Case Study
- Conclusions

## Introduction

Data Anomaly: a record containing one or more variables whose values differ significantly from the nominal data pattern.

Examples:

- Unusually large values
- Unusually small values
- Missing data
- Records that violate the nominal relationship between specific variables

Consequences of Data Anomaly: analytical results can be significantly influenced by the presence of even a small proportion of anomalous records.

Examples:

- Regression coefficients shifted by a small number of influential points
- Variance inflated by a small proportion of extremely large or small records
- Pearson correlation shifted by very few pairs of points

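
The variance-inflation consequence can be checked numerically. The sketch below is only an illustration under assumed values: the sample size, the 1% contamination rate, and the extreme value 50 are hypothetical choices, not figures from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical "nominal" sample: 1000 draws from a standard normal.
clean = rng.normal(loc=0.0, scale=1.0, size=1000)

# Replace 1% of the records with an extreme value (illustrative choice).
contaminated = clean.copy()
contaminated[:10] = 50.0

var_clean = clean.var(ddof=1)
var_contaminated = contaminated.var(ddof=1)

print(f"variance without anomalies: {var_clean:.2f}")
print(f"variance with 1% anomalies: {var_contaminated:.2f}")
```

Ten extreme records out of a thousand are enough to make the sample variance many times larger than that of the nominal data.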

Figure: Influential Points in Regression

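
The effect of an influential point on a fitted line can be sketched as follows. The data are hypothetical (points placed exactly on y = 2x + 1, plus one assumed high-leverage record at x = 30), chosen only to make the shift easy to see.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Least-squares fit on the clean data (np.polyfit returns slope, then intercept).
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Add one high-leverage record far from the line (illustrative choice).
x_inf = np.append(x, 30.0)
y_inf = np.append(y, 0.0)
slope_inf, intercept_inf = np.polyfit(x_inf, y_inf, deg=1)

print(f"slope without the influential point: {slope_clean:.3f}")
print(f"slope with the influential point:    {slope_inf:.3f}")
```

A single record out of eleven is enough here to pull the estimated slope far away from 2.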
Figure: Outlier on Pearson Correlation
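
A minimal sketch of how one extreme pair can change the Pearson correlation, assuming two independently generated variables and a hypothetical outlying pair at (20, 20):

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical sample: x and y are generated independently.
x = rng.normal(size=50)
y = rng.normal(size=50)
r_clean = np.corrcoef(x, y)[0, 1]

# Append a single extreme pair (illustrative choice).
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
r_out = np.corrcoef(x_out, y_out)[0, 1]

print(f"correlation without the outlier: {r_clean:.3f}")
print(f"correlation with the outlier:    {r_out:.3f}")
```

The added pair dominates both the covariance and the variances, so the coefficient moves toward 1 even though the other 50 pairs are unrelated.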

Existing Methods: Statistical Detection Methods

Univariate Outlier Detection:

1. Three Standard Deviation Rule
2. Hampel Identifier
3. Standard Box-Plot Outlier Detection Rule
4. SmartSifter

Multivariate Outlier Detection:

1. Visual Based Detection
2. Model Based Detection, such as Regression Analysis
3. Deletion Based Detection, such as SmartSifter

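
The first three univariate rules can be sketched in a few lines. This is a minimal illustration using the conventional forms of each rule, not necessarily the exact versions used in the lecture: the cutoffs `k=3.0` and `whisker=1.5`, the MAD scale factor 1.4826, the function names, and the sample data are all assumptions.

```python
import numpy as np

def three_sigma_outliers(x, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) > k * x.std(ddof=1)

def hampel_outliers(x, k=3.0):
    """Flag values more than k scaled MADs from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # 1.4826 makes the MAD a consistent estimate of sigma for normal data.
    mad = 1.4826 * np.median(np.abs(x - med))
    return np.abs(x - med) > k * mad

def boxplot_outliers(x, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - whisker * iqr) | (x > q3 + whisker * iqr)

# Hypothetical sample with one obviously extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
for rule in (three_sigma_outliers, hampel_outliers, boxplot_outliers):
    print(rule.__name__, np.flatnonzero(rule(data)).tolist())
```

On this small sample the value 100 inflates the mean and standard deviation so much that the three standard deviation rule does not flag it, while the median-based and quartile-based rules do.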
Existing Methods: Distance Based Methods

Multivariate Outlier Detection:

1. DB(p, D): an object is declared an outlier if at least a fraction p of the data lie at a distance of at least D from it
2. The K-Nearest Neighbors Method
3. Local Distance Based Method

Kernel Function Detection Method

Fuzzy Approach with Kernel Functions
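
The DB(p, D) rule can be sketched directly from its definition. The function name `db_outliers`, the use of Euclidean distance, the choice to exclude each point from its own neighbor count, and the sample points are assumptions made for illustration.

```python
import numpy as np

def db_outliers(X, p, D):
    """Flag DB(p, D) outliers: points for which at least a fraction p
    of the other points lie at Euclidean distance >= D."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count far-away points, excluding the point itself (an assumption).
    far = dist >= D
    np.fill_diagonal(far, False)
    frac_far = far.sum(axis=1) / (n - 1)
    return frac_far >= p

# Hypothetical data: four clustered points and one distant point.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
flags = db_outliers(points, p=0.9, D=5.0)
print(flags.tolist())
```

This brute-force version computes all pairwise distances, so it is only practical for small data sets; both p and D have to be chosen by the analyst.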

## Data Anomaly

Data Collection System

[Diagram labels: Data Source; Ideal Condition; Real Condition; Data Collection System; Ideal $D_I$; Real $D_R$; Data Base]

Definition

- $D_I$: ideal data source (error free)
- $D_R$: realistic data source
- $\xi$: error threshold
- $\rho$: distance measure

Anomalous Types

- Normal observational error: $x \in D_I$ and $R_X \in D_R$ with $\rho(x, R_X) \le \xi$
- Significant observational error: $x \in D_I$ and $R_X \in D_R$ with $\rho(x, R_X) > \xi$
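
The distinction between the two error types can be sketched with the distance measure ρ and the error threshold ξ from the definitions. The sketch assumes that an error counts as normal when the distance between the ideal value and the recorded value is at most ξ and as significant otherwise; the function name `classify_error`, the absolute-difference distance, and the numbers are hypothetical.

```python
def classify_error(x, recorded, xi, rho=lambda a, b: abs(a - b)):
    """Label a recorded value by its distance from the ideal value x.

    rho is the distance measure and xi the error threshold; the
    comparison rho <= xi for a "normal" error is an assumption.
    """
    return "normal" if rho(x, recorded) <= xi else "significant"

# Hypothetical ideal value 20.0 recorded with a small and a large error.
small = classify_error(20.0, 20.3, xi=0.5)
large = classify_error(20.0, 27.0, xi=0.5)
print(small, large)
```

In practice the ideal value is not observed, so a rule like this describes the anomaly types rather than providing a detection method.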

Anomalous Types

- Simple unobserved data: $x \in D_I$ but $R_X \notin D_R$
- Coded unobserved data: $x \in D_I$ and $R_X = m \in D_R$, with $m$ a fixed value
- Disguised unobserved data: $x \in D_I$ and $R_X = y \in D_R$, with $y$ a random value

## Univariate Outlier Detection


Why do we want to detect univariate outliers?