class_09_03

# class_09_03 - Statistical Data Mining ORIE 474 Fall 2007...


Statistical Data Mining, ORIE 474, Fall 2007
Tatiyana V. Apanasovich
09/03/07
Visualizing Data

3. Visualizing and Exploring Data

- Summarizing Data
- Tools for Displaying Single Variables
- Relationships between Two Variables
- More than Two Variables
- Principal Component Analysis

3.2 Summarizing Data

Suppose that x(1), …, x(n) is a set of n data values. Relevant sample statistics are:

- Location measures:
  - Mean
  - Median and quartiles
  - Mode
- Dispersion or variability measures:
  - Standard deviation and variance
  - Interquartile range and range
- Skewness
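
The statistics listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not the course's own code: the function name `summarize`, the toy data, the quartile method (the module's default "exclusive" method), and the moment-based skewness formula are all assumptions, since the slide names the statistics without defining them.

```python
import statistics


def summarize(xs):
    """Return common location, dispersion, and skewness statistics for xs."""
    n = len(xs)
    mean = statistics.mean(xs)
    # Quartiles via the module's default "exclusive" method (an assumed choice).
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    variance = statistics.variance(xs)  # sample variance, divisor n - 1
    # Moment-based skewness (an assumed definition; the slide gives none):
    # third central moment divided by the second central moment to the 3/2.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "quartiles": (q1, q2, q3),
        "mode": statistics.mode(xs),
        "variance": variance,
        "std_dev": statistics.stdev(xs),
        "iqr": q3 - q1,
        "range": max(xs) - min(xs),
        "skewness": skewness,
    }


# Hypothetical toy data, chosen only so the results are easy to check by hand.
data = [2, 4, 4, 5, 7, 9, 11]
summary = summarize(data)
```

Other quartile conventions and skewness estimators exist, so values from other software may differ slightly on small samples.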

3.2 Summarizing Data (cont’d)

Ex: 100 data points sampled from a normal distribution with
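Sample Mean

Sample mean (a location measure):

$$\hat{\mu} = \frac{1}{n} \sum_i x(i)$$

Ex: $\hat{\mu} = 1.36$ (whereas $\mu = 0$)

The sample mean is the value that is “central” in the sense that it minimizes the sum of squared differences between it and the data:

$$\hat{\mu} = \arg\min_a \sum_i \bigl(x(i) - a\bigr)^2$$

Proof: set the derivative with respect to $a$ to zero and solve for $a$:

$$\frac{d}{da} \sum_i \bigl(x(i) - a\bigr)^2 = -2 \sum_i \bigl(x(i) - a\bigr) = 0 \;\Longrightarrow\; \sum_i x(i) - na = 0 \;\Longrightarrow\; a = \frac{1}{n} \sum_i x(i)$$

The second derivative is $2n > 0$, so this stationary point is a minimum.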
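
The minimizing property of the sample mean can also be checked numerically. The sketch below draws 100 points from a normal distribution with mean 0; the standard deviation of 1, the seed, and the helper name `sum_squared_diff` are assumptions made for illustration, so the sample mean it produces will not match the slide's example value.

```python
import random
import statistics


def sum_squared_diff(xs, a):
    """Sum of squared differences between the candidate value a and the data."""
    return sum((x - a) ** 2 for x in xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed parameters: mean 0 and standard deviation 1.
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]

mu_hat = statistics.fmean(xs)
best = sum_squared_diff(xs, mu_hat)

# Every candidate shifted away from the sample mean gives a larger sum.
shifts = (-1.0, -0.1, -0.01, 0.01, 0.1, 1.0)
worse = [sum_squared_diff(xs, mu_hat + d) > best for d in shifts]
```

The excess over the minimum equals n times the squared shift, which follows from expanding the square around the sample mean.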

Sample Median and Mode

Sample median (a location measure): the value that has an equal number of data points above and below it. If, as in our example, n is even, it is usually defined as halfway between the two middle values.

Ex: (whereas m=0)
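
The even-n convention described above can be sketched as follows. The function name `sample_median` and the toy inputs are hypothetical; the standard library's `statistics.median` follows the same convention and is used here only as a cross-check.

```python
import statistics


def sample_median(xs):
    """Median of xs; for even n, halfway between the two middle values."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even n: average the two middle values


even_case = sample_median([3, 1, 4, 2])  # middle values are 2 and 3
odd_case = sample_median([5, 1, 3])      # middle value is 3
```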