# lecture5 - Data Mining CS57300 Purdue University September...

Data Mining CS57300 Purdue University September 9, 2010

Data exploration and visualization

Visualization
• Human eye/brain have evolved powerful methods to detect structure in nature
• Display data in ways that exploit human pattern recognition abilities
• Limitation: can be difficult to apply if data size (number of dimensions or instances) is large

Exploratory data analysis
• Data analysis approach that employs a number of (mostly graphical) techniques to:
  • Maximize insight into data
  • Uncover underlying structure
  • Identify important variables
  • Detect outliers and anomalies
  • Test underlying modeling assumptions
  • Develop parsimonious models
  • Generate hypotheses from data

Techniques
• Low-dimensional data
  • Summarizing data with simple statistics
  • Plotting raw data (1D, 2D, 3D)
• Higher-dimensional data
  • Principal component analysis
  • Multidimensional scaling
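
As a rough illustration of principal component analysis, one of the higher-dimensional techniques named above, the following pure-Python sketch finds the first principal component by power iteration on the sample covariance matrix. The slides do not say how PCA is computed, so power iteration, the 1/n normalization, the function name `first_principal_component` and the toy data are all assumptions made for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    rows: list of equal-length lists of numbers, one row per instance.
    Power iteration is an assumption of this sketch, not the slides' method.
    """
    n, d = len(rows), len(rows[0])
    # Center each variable on its sample mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix with 1/n normalization.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    # Repeatedly multiply by the covariance matrix and renormalize.
    # Note: this start vector fails if it is orthogonal to the leading direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all variables are constant; no direction to find
        v = [x / norm for x in w]
    # Variance of the data along v (the Rayleigh quotient v^T C v).
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    return v, variance


# Toy 3-D data lying exactly on the line y = 2x, z = 0.
rows = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
direction, variance = first_principal_component(rows)
print(direction, variance)
```

For this toy data the recovered direction is proportional to (1, 2, 0), so projecting each instance onto it reduces three dimensions to one without losing any variance.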

Data summarization
• Measures of location
  • Mean: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x(i)$
  • Median: value with 50% of points above and below
  • Quartile: value with 25% (75%) of points below it and the rest above
  • Mode: most common value

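
The location measures above can be computed with Python's standard `statistics` module. This is a minimal sketch: the function name `location_summary` and the sample values are made up for illustration, and `method="inclusive"` is only one of several quartile conventions, none of which the slides specify.

```python
import statistics


def location_summary(values):
    """Summarize a 1-D sample with mean, median, quartiles and mode."""
    n = len(values)
    mean = sum(values) / n  # (1/n) * sum of x(i)
    median = statistics.median(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mode = statistics.mode(values)  # most common value
    return {"mean": mean, "median": median, "q1": q1, "q3": q3, "mode": mode}


sample = [2, 3, 3, 5, 7, 8, 9]  # hypothetical data
summary = location_summary(sample)
print(summary)
```

When the sample contains a few extreme values, the median and quartiles barely move while the mean can shift substantially, which is one reason to report more than one location measure.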
Data summarization
• Measures of dispersion or variability
  • Variance: $\hat{\sigma}_k^2 = \frac{1}{n} \sum_{i=1}^{n} (x(i) - \mu)^2$
  • Standard deviation: $\hat{\sigma}_k = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x(i) - \mu)^2}$
  • Range: difference between max and min point
  • Interquartile range: difference between 1st and 3rd quartiles
  • Skew: $\frac{\sum_{i=1}^{n} (x(i) - \hat{\mu})^3}{\left( \sum_{i=1}^{n} (x(i) - \hat{\mu})^2 \right)^{3/2}}$
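
The dispersion measures above translate directly into a few lines of Python. This sketch assumes the sample mean is plugged in for μ, uses the 1/n normalization shown on the slide, and picks the `"inclusive"` quartile convention arbitrarily. The function name `dispersion_summary` and the data are hypothetical. The skew follows the slide's formula as written, which differs from the commonly used moment-based skewness by a factor of √n.

```python
import math
import statistics


def dispersion_summary(values):
    """Variance, standard deviation, range, IQR and skew of a 1-D sample."""
    n = len(values)
    mu = sum(values) / n  # sample mean, used in place of mu (assumption)
    sum_sq = sum((x - mu) ** 2 for x in values)  # sum of squared deviations
    sum_cu = sum((x - mu) ** 3 for x in values)  # sum of cubed deviations
    variance = sum_sq / n  # 1/n normalization, as on the slide
    std = math.sqrt(variance)
    value_range = max(values) - min(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Skew as written on the slide; undefined when all values are equal.
    skew = sum_cu / sum_sq ** 1.5 if sum_sq > 0 else math.nan
    return {"variance": variance, "std": std, "range": value_range,
            "iqr": iqr, "skew": skew}


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = dispersion_summary(data)
print(stats)
```

A positive skew, as in this sample, indicates a longer tail to the right of the mean.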

## This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue University-West Lafayette.
