Lecture3 - Outliers Morgan C. Wang Department of Statistics...

Outliers
Morgan C. Wang
Department of Statistics, University of Central Florida
2/9/2011

Outline
- Introduction
- Data Anomaly
- Univariate Outliers Detection
- Multivariate Outliers Detection
- Case Study
- Conclusions

Introduction

Introduction
Data Anomaly: Records in which one or more variables differ significantly from the nominal data pattern.
Examples:
- Unusually large values
- Unusually small values
- Missing data
- Records that violate the nominal relationship between specific variables

Introduction
Consequences of Data Anomaly: Analytical results can be significantly influenced by even a small proportion of anomalous records.
Examples:
- Regression coefficients shifted by a small number of influential points
- Variance inflated by a small proportion of extremely large or extremely small records
- Pearson correlation shifted by very few pairs of points
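
The regression and correlation examples above can be illustrated with a minimal sketch in plain Python. The data are hypothetical (ten points on the line y = 2x + 1 plus one made-up anomalous record), and the helper names `pearson` and `ols_slope` are illustrative choices, not part of the lecture.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical clean data: ten points lying exactly on y = 2x + 1.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]

# The same data plus a single anomalous record (x = 20, y = 0).
xs_bad = xs + [20]
ys_bad = ys + [0]

print("clean:        r =", round(pearson(xs, ys), 3),
      " slope =", round(ols_slope(xs, ys), 3))
print("with outlier: r =", round(pearson(xs_bad, ys_bad), 3),
      " slope =", round(ols_slope(xs_bad, ys_bad), 3))
```

In this made-up example a single record out of eleven is enough to pull both the correlation and the fitted slope from a perfect linear fit to nearly zero.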

[Figure: Influential Points in Regression]

[Figure: Outlier on Pearson Correlation]

Existing Methods
Statistical Detection Methods:
Univariate Outlier Detection:
(1) Three Standard Deviation Rule
(2) Hampel Identifier
(3) Standard Box-Plot Outlier Detection Rule
(4) SmartSifter
Multivariate Outlier Detection:
(1) Visual Based Detection
(2) Model Based Detection, such as Regression Analysis
(3) Deletion Based Detection, such as SmartSifter
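
The first three univariate rules listed above can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions rather than the lecture's own implementation: the cutoff of 3, the MAD scale factor 1.4826, and the 1.5 x IQR fences are the conventional defaults, the function names are hypothetical, and the sample data are made up. SmartSifter is not covered here.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

def hampel_outliers(values, k=3.0):
    """Flag values more than k scaled MADs from the median.

    The factor 1.4826 (an assumed conventional choice) makes the MAD
    comparable to a standard deviation for normally distributed data.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    return [v for v in values if abs(v - med) > k * scale]

def boxplot_outliers(values, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: nine ordinary readings and one extreme value.
sample = [10, 11, 12, 11, 10, 12, 11, 10, 12, 100]

print("three-sigma:", three_sigma_outliers(sample))
print("Hampel:     ", hampel_outliers(sample))
print("box-plot:   ", boxplot_outliers(sample))
```

On this sample the three standard deviation rule flags nothing, because the extreme value inflates the very standard deviation it is compared against, while the median-based Hampel identifier and the quartile-based box-plot rule both flag 100. Note that the Hampel sketch flags every non-median value when the MAD is zero, so real use would need a guard for that case.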
Existing Methods
Distance Based Methods:
Multivariate Outlier Detection:
(1) DB(p, D): an object is declared an outlier if at least a fraction p of the data lie at a distance of at least D from it
(2) The K-Nearest Neighbors Method
(3) Local Distance Based Method
Kernel Function Detection Method
Fuzzy Approach with Kernel Functions
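
The DB(p, D) rule and a nearest-neighbor score can be sketched as follows. The assumptions are mine, not the lecture's: Euclidean distance, the fraction p computed over the other points only, and the distance to the k-th nearest neighbor used as the outlier score. The function names and the five sample points are hypothetical.

```python
from math import dist

def db_outliers(points, p, d):
    """Return points for which at least a fraction p of the other
    points lie at distance >= d (following the slide's wording)."""
    flagged = []
    for i, a in enumerate(points):
        others = [b for j, b in enumerate(points) if j != i]
        far = sum(1 for b in others if dist(a, b) >= d)
        if far >= p * len(others):
            flagged.append(a)
    return flagged

def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbor.
    Larger scores indicate more isolated points."""
    scores = []
    for i, a in enumerate(points):
        ds = sorted(dist(a, b) for j, b in enumerate(points) if j != i)
        scores.append(ds[k - 1])
    return scores

# Hypothetical data: four points in a tight square and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

flagged = db_outliers(points, p=0.9, d=5)
scores = knn_distance(points, k=2)
print("DB(0.9, 5) outliers:", flagged)
print("2-NN distances:", [round(s, 2) for s in scores])
```

Both functions compare every pair of points, so the cost grows quadratically with the number of records; that is fine for a sketch but would need indexing or sampling on large data.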

Data Anomaly

Data Collection System
[Diagram labels: Data Source; Ideal Condition; Real Condition; Data Collection System; Ideal $D_I$; Real $D_R$; Data Base]

Definition
$D_I$: Ideal data source (error free)
$D_R$: Realistic data source
$\xi$: Error threshold
$\rho$: Distance measure

Anomalous Types
Normal Observational Error: $x \in D_I$ and $R_X \in D_R$ with $\rho(x, R_X) \le \xi$
Significant Observational Error: $x \in D_I$ and $R_X \in D_R$ with $\rho(x, R_X) > \xi$
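
A small sketch of this distinction, assuming that a normal observational error is one whose distance from the ideal value stays within the error threshold and a significant one exceeds it. The function name `classify_error`, the absolute-difference default for the distance measure, and the numbers are hypothetical.

```python
def classify_error(x_ideal, x_recorded, xi, rho=None):
    """Label a recorded value by comparing rho(ideal, recorded) with xi."""
    if rho is None:
        # Assumed distance measure: absolute difference.
        rho = lambda a, b: abs(a - b)
    return "normal" if rho(x_ideal, x_recorded) <= xi else "significant"

# Hypothetical readings of a quantity whose ideal value is 10.0.
print(classify_error(10.0, 10.2, xi=0.5))
print(classify_error(10.0, 25.0, xi=0.5))
```

In practice the ideal value is not observed, which is why the detection rules have to estimate the nominal pattern from the recorded data itself.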

Anomalous Types
Simple unobserved data: $x \in D_I$ but $R_X \notin D_R$
Coded unobserved data: $x \in D_I$ and $R_X = m^* \in D_R$, with $m^*$ a fixed value
Disguised unobserved data: $x \in D_I$ and $R_X = y \in D_R$, with $y$ a random value
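
The three kinds of unobserved data can be illustrated with a short sketch. It assumes that simple unobserved data appear as `None` and that coded unobserved data use a known fixed code; the code value -999, the function name, and the sample list are all hypothetical.

```python
def label_unobserved(values, missing_codes=(-999,)):
    """Label each entry as 'simple' (nothing recorded), 'coded'
    (a fixed missing-value code was recorded), or 'recorded'."""
    labels = []
    for v in values:
        if v is None:
            labels.append("simple")
        elif v in missing_codes:
            labels.append("coded")
        else:
            labels.append("recorded")
    return labels

# Hypothetical age column: one empty entry and one coded entry.
ages = [34, None, -999, 41, 29]
labels = label_unobserved(ages)
print(list(zip(ages, labels)))
```

Disguised unobserved data cannot be found this way: the recorded value is a plausible-looking random value, so the entry would be labeled 'recorded' and has to be caught, if at all, by the outlier detection methods.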
Univariate Outliers Detection

Why do we want to detect univariate outliers?