IDS572_Class2 Sept1_2011

Basics of Statistics for Data Mining
IDS 572
September 1, 2011

Statistics vs Data Mining
- For statisticians, data mining has a negative connotation: one of searching for data to support preconceived ideas
- Statistics developed as a discipline to help scientists make sense of observations and experiments, hence the scientific method
- The problem for statisticians has often been too little data
- DM is faced with too much data
- Many of the techniques and algorithms used are shared by both statisticians and data miners

Some Definitions
- Population (universe) is the collection of things under consideration
- Sample is a portion of the population selected for analysis
  - representative vs random samples
  - sampling bias
- Statistic is a summary measure computed to describe a characteristic of

Some More Definitions
- Mean (average) is the sum of the values divided by the number of values
- Median is the midpoint of the values (50% above; 50% below) after they have been ordered from the smallest to the largest
- Mode is the value that appears most frequently

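A minimal sketch of the three measures above, using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical observations (illustrative only, not from the lecture)
values = [2, 3, 3, 5, 7, 10]

avg = mean(values)          # sum of the values divided by the number of values
mid = median(values)        # midpoint of the ordered values; with an even
                            # count, the average of the two middle values
modes = multimode(values)   # list of the most frequent value(s)

print("mean:", avg)
print("median:", mid)
print("mode(s):", modes)
```

`multimode` is used here rather than `mode` because a data set can have more than one most-frequent value; it returns all of them as a list.
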
Variance and Standard Deviation
- Variance is a measure of the dispersion of a sample (or how closely the observations cluster around the mean [average])
  - Computed from the sum of the squared differences from the mean for the data under consideration, divided by n - 1 for a sample (or by N for a full population)
- Standard deviation, the square root of the variance, is the measure of variation in the observed values (or variation in the clustering around the mean)

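The calculation can be sketched step by step as follows. The data are hypothetical, and the variable names are illustrative assumptions; the hand computation is checked against the standard library's `variance`, `pvariance`, and `stdev` functions.

```python
import math
from statistics import mean, pvariance, stdev, variance

# Hypothetical observations (illustrative only, not from the lecture)
values = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(values)

# Sum of the squared differences from the mean
sum_sq = sum((x - m) ** 2 for x in values)

# Sample variance divides by n - 1; population variance divides by N
sample_var = sum_sq / (len(values) - 1)
pop_var = sum_sq / len(values)

# Standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_var)

# Cross-check the hand computation against the standard library
assert math.isclose(sample_var, variance(values))
assert math.isclose(pop_var, pvariance(values))
assert math.isclose(sample_sd, stdev(values))

print("sample variance:", sample_var)
print("population variance:", pop_var)
print("sample standard deviation:", sample_sd)
```

Dividing by n - 1 rather than n is the usual choice when the data are a sample and the goal is to estimate the population variance.
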
Population and Sample
- Population: use parameters to summarize features
- Sample: use statistics to summarize features
- Inference on the population from the sample

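As a rough illustration of the distinction above, the sketch below builds a synthetic population, computes its mean (a parameter), and compares it with the mean of a simple random sample (a statistic). The population, the seed, and the sample size are all assumptions made for this example.

```python
import random
from statistics import mean

rng = random.Random(572)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Synthetic population: 10,000 values drawn from a normal distribution
# with mean 50 and standard deviation 10 (assumed, for illustration only)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameter: summarizes a feature of the whole population
mu = mean(population)

# Simple random sample of 200 items, drawn without replacement
sample = rng.sample(population, 200)

# Statistic: summarizes the same feature of the sample
x_bar = mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice the population mean is usually unknown, and the sample mean is what is used to make an inference about it.
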
Data Types
- Nominal: state codes, SKU
- Ordinal: satisfaction ratings, rankings