lecture2

# lecture2 - Data Mining CS57300 Purdue University Statistics...

Data Mining CS57300 Purdue University August 31, 2010

Statistics Background

Modeling uncertainty
• Necessary component of almost all data analysis
• Approaches to modeling uncertainty:
  • Fuzzy logic
  • Possibility theory
  • Rough sets
  • Probability (focus in this course)

Probability
• Probability theory (some disagreement)
  • Concerned with interpretation of probability
  • 17th century: Pascal and Fermat develop probability theory to analyze games of chance
• Probability calculus (universal agreement)
  • Concerned with manipulation of mathematical representations
  • 1933: Kolmogorov states axioms of modern probability

Probability theory
• Meaning of probability is focus of debate and controversy
• Two main views: Frequentist and Bayesian

Frequentist view
• Dominant perspective for last century
• Probability is an objective concept
  • Defined as the frequency of an event occurring under repeated trials in “same” situation
  • E.g., number of heads in repeated coin tosses
• Restricts application of probability to repeatable events
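
The coin-toss example above can be sketched as a small simulation. This is only an illustration under assumptions that are not on the slide: the function name `relative_frequency`, the fair-coin probability `p_heads=0.5`, and the fixed seed are all made up for the example.

```python
import random

def relative_frequency(num_tosses, p_heads=0.5, seed=0):
    """Fraction of heads observed in num_tosses simulated coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < p_heads)
    return heads / num_tosses

# The observed frequency tends to settle near p_heads as the number of trials grows.
for n in (10, 1000, 100000):
    print(n, relative_frequency(n))
```

Under the frequentist reading, the probability of heads is the value this relative frequency approaches over many repeated trials.
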
Bayesian view
• Increasing importance over last decade
  • Due to increase in computational power that facilitates previously intractable calculations
• Probability is a subjective concept
  • Defined as individual degree-of-belief that event will occur
  • E.g., belief that we will have another snow storm tomorrow
• Begin with prior belief estimates and update those by conditioning on observed data
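
One common way to illustrate "begin with a prior and update on observed data" is a Beta prior over the probability of a yes/no event. The sketch below assumes that setup; the Beta prior, the function names, and the snow-day counts are hypothetical choices for illustration, not something the slide specifies.

```python
from fractions import Fraction

def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing yes/no outcomes.

    Assumes a Beta(prior_a, prior_b) prior on the event probability,
    which is conjugate to Bernoulli observations.
    """
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Hypothetical prior belief: snow is as likely as not (Beta(2, 2), mean 1/2).
prior = (2, 2)
# Hypothetical data: 7 snowy days and 3 clear days observed.
posterior = update_beta(*prior, successes=7, failures=3)

print(beta_mean(*prior))      # 1/2
print(beta_mean(*posterior))  # 9/14
```

The posterior mean moves from the prior belief toward the observed frequency, and the prior matters less as more data are observed.
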

Bayesian vs. frequentist
• Bayesian central tenet:
  • Explicitly model all forms of uncertainty
  • E.g., parameters, model structure, predictions
• Frequentists often model the same uncertainty but in a less principled manner, e.g.:
  • Parameters set by cross-validation
  • Model structure averaged in ensembles
  • Smoothing of predicted probabilities
• Although interpretation of probability is different, underlying calculus is the same

Random variables
• Mapping from a property of objects to a variable that can take one of a set of possible values
  • X refers to random variable
  • x refers to a value of that random variable
• Discrete RV has a finite (or countably infinite) set of possible values
• Continuous RV can take any value within an interval
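
As a rough sketch of the "mapping from a property of objects" idea, a random variable can be written as a function from an object to a value. The customer records and field names below are made up purely for illustration.

```python
# Hypothetical objects: each dict is one customer record (made-up data).
customers = [
    {"name": "a", "num_purchases": 3, "height_cm": 171.5},
    {"name": "b", "num_purchases": 0, "height_cm": 158.2},
    {"name": "c", "num_purchases": 3, "height_cm": 180.0},
]

def X(obj):
    """Discrete RV: number of purchases (a count)."""
    return obj["num_purchases"]

def Y(obj):
    """Continuous RV: height in centimetres (any value within an interval)."""
    return obj["height_cm"]

# Each x below is one value of the random variable X.
values_of_X = [X(c) for c in customers]
values_of_Y = [Y(c) for c in customers]
print(values_of_X, values_of_Y)
```
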

RVs (cont’d)
• Probability distribution (or probability mass function or probability density function)
• Discrete
  • Denotes probability that X will take on a particular value: $P(X = x)$
• Continuous
  • Probability of any particular point is 0, have to consider probability within an interval: $P(a < X < b) = \int_a^b p(x)\,dx$
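
The two cases above can be sketched numerically. The fair die, the standard normal density, and the trapezoid-rule helper `interval_probability` are assumptions chosen for the example; none of them come from the slide.

```python
import math
from fractions import Fraction

# Discrete case: P(X = x) for a fair six-sided die (assumed example).
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def standard_normal_pdf(x):
    """Density p(x) of a normal distribution with mean 0 and variance 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def interval_probability(pdf, a, b, steps=1000):
    """Approximate P(a < X < b) as the integral of pdf from a to b (trapezoid rule)."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

p_point = die_pmf[3]  # probability of one particular value
p_interval = interval_probability(standard_normal_pdf, -1.0, 1.0)
print(p_point, round(p_interval, 4))
```

For a continuous variable, shrinking the interval toward a single point drives the integral to 0, which is why only interval probabilities are meaningful.
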
Expectation
• Discrete: $E[X] = \sum_x x \cdot p(x)$
• Continuous: $E[X] = \int_x x \cdot p(x)\,dx$
• Variance: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
• Linearity of expectation: $E[X + Y] = E[X] + E[Y]$
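
A small worked check of these formulas, using a fair six-sided die as an assumed example (the die and the helper names `expectation` and `variance` are illustrative, not from the slide). Exact fractions are used so the results can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, f=lambda x: x):
    """E[f(X)] = sum over x of f(x) * p(x) for a discrete distribution."""
    return sum(f(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2

e_x = expectation(die)  # 7/2
var_x = variance(die)   # 35/12

# E[X + Y] for two dice, computed directly from the joint distribution.
# Independence is assumed here only to build the joint probabilities;
# linearity of expectation itself does not require it.
e_sum = sum(
    (x + y) * px * py
    for (x, px), (y, py) in product(die.items(), die.items())
)

print(e_x, var_x, e_sum)  # 7/2 35/12 7
```
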

Common distributions
• Bernoulli
• Binomial
• Multinomial
• Poisson
• Normal
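
The slide only names these distributions. As a reference sketch, their standard textbook mass and density functions can be written with the Python standard library; the function and parameter names below are illustrative choices, not notation from the lecture.

```python
import math

def bernoulli_pmf(x, p):
    """P(X = x) for x in {0, 1} with success probability p."""
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def multinomial_pmf(counts, probs):
    """P(observing the given category counts) with category probabilities probs."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(binomial_pmf(1, 2, 0.5), multinomial_pmf([1, 1], [0.5, 0.5]))  # 0.5 0.5
```

With two categories the multinomial reduces to the binomial, and the binomial with n = 1 reduces to the Bernoulli, which is a quick sanity check on the definitions.
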

## This note was uploaded on 03/13/2012 for the course CS 573 taught by Professor Staff during the Fall '08 term at Purdue.
