IDS572_Class2 Sept1_2011

# IDS572_Class2 Sept1_2011 - Basics of Statistics for Data...

## 1 Basics of Statistics for Data Mining
IDS 572, September 1, 2011

## 2 Statistics vs Data Mining
- For statisticians, data mining has a negative connotation: one of searching for data to support preconceived ideas
- Statistics developed as a discipline to help scientists make sense of observations and experiments, hence the scientific method
- Problem has often been too little data for statisticians
- DM is faced with too much data
- Many of the techniques and algorithms used are shared by both statisticians and data miners
## 3 Some Definitions
- Population (universe) is the collection of things under consideration
- Sample is a portion of the population selected for analysis
  - representative vs random samples
  - sampling bias
- Statistic is a summary measure computed to describe a characteristic of

## 4 Some More Definitions
- Mean (average) is the sum of the values divided by the number of values
- Median is the midpoint of the values (50% above; 50% below) after they have been ordered from the smallest to the largest
- Mode is the value that appears most frequently
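
The three definitions above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the `purchases` list is made-up illustration data, not taken from the course.

```python
from statistics import mean, median, mode

# Hypothetical data: number of purchases made by seven customers
purchases = [2, 3, 3, 5, 7, 8, 14]

avg = mean(purchases)    # sum of the values (42) divided by the count (7)
mid = median(purchases)  # middle value once the data are sorted
most = mode(purchases)   # value that appears most frequently

print(avg, mid, most)    # 6 5 3
```

Note that the single large value (14) pulls the mean above the median, which is one reason both measures are usually reported together.
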
## 5 Variance and Standard Deviation
- Variance is a measure of the dispersion of a sample (or how closely the observations cluster around the mean [average])
  - Sum of the squared differences from the mean for the data under consideration, divided by the number of observations n (or by n - 1 when the variance is estimated from a sample)
- Standard Deviation, the square root of the variance, is the measure of variation in the observed values (or variation in the clustering around the mean)
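
As a worked illustration, the sketch below computes the variance by hand and compares it with Python's standard `statistics` module. The `values` list is hypothetical. The sum of squared differences is divided by n for a population variance, or by n - 1 for a sample variance; that divisor is what turns the raw sum into a measure of spread.

```python
from math import sqrt
from statistics import pvariance, variance

# Hypothetical observations
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
m = sum(values) / n                           # mean
sum_sq = sum((x - m) ** 2 for x in values)    # sum of squared differences

pop_var = sum_sq / n            # population variance (divide by n)
sample_var = sum_sq / (n - 1)   # sample variance (divide by n - 1)
pop_sd = sqrt(pop_var)          # standard deviation = sqrt(variance)

# The library functions should agree with the manual calculation
lib_pop_var = pvariance(values)
lib_sample_var = variance(values)

print(pop_var, pop_sd, sample_var)
```

Because the standard deviation is the square root of the variance, it is expressed in the same units as the original observations.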

## 6 Population and Sample

| Population | Sample |
| --- | --- |
| Use parameters to summarize features | Use statistics to summarize features |

Inference on the population from the sample
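
To make the parameter/statistic distinction concrete, here is a minimal sketch in Python. The population is simulated, and its size, mean, spread, and the sample size of 200 are all assumptions chosen for illustration. The population mean plays the role of the parameter; the mean of a simple random sample is the statistic used to infer it.

```python
import random
from statistics import mean

random.seed(572)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated order amounts
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = mean(population)          # parameter: describes the whole population

# Simple random sample drawn without replacement
sample = random.sample(population, 200)
x_bar = mean(sample)           # statistic: describes only the sample

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

In practice the population mean is usually unknown, which is why the sample statistic is used as its estimate.
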
## Data Types
- Nominal: state codes, SKU
- Ordinal: satisfaction ratings, rankings
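
The practical difference between the two types is which summaries make sense. The sketch below uses made-up values and a hypothetical four-level rating scale: nominal values can only be counted (so the mode is meaningful), while ordinal values can also be ranked (so a median is meaningful too).

```python
from collections import Counter

# Nominal: labels with no inherent order, so we can only count them
states = ["IL", "WI", "IL", "IN", "IL", "WI"]
state_counts = Counter(states)
most_common_state = state_counts.most_common(1)[0][0]

# Ordinal: categories with a meaningful order (hypothetical scale)
scale = ["poor", "fair", "good", "excellent"]
rank = {label: i for i, label in enumerate(scale)}

ratings = ["good", "poor", "excellent", "fair", "good"]
ordered = sorted(ratings, key=rank.get)       # sort by position on the scale
median_rating = ordered[len(ordered) // 2]    # middle of an odd-length list

print(most_common_state, median_rating)
```

Averaging the rank numbers would treat the gaps between ratings as equal, which an ordinal scale does not guarantee.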
