lecture2

Data Mining CS57300 Purdue University August 31, 2010

Statistics Background
Modeling uncertainty
• Necessary component of almost all data analysis
• Approaches to modeling uncertainty:
  • Fuzzy logic
  • Possibility theory
  • Rough sets
  • Probability (focus in this course)

Probability
• Probability theory (some disagreement)
  • Concerned with interpretation of probability
  • 17th century: Pascal and Fermat develop probability theory to analyze games of chance
• Probability calculus (universal agreement)
  • Concerned with manipulation of mathematical representations
  • 1933: Kolmogorov states axioms of modern probability
Probability theory
• Meaning of probability is the focus of debate and controversy
• Two main views: frequentist and Bayesian

Frequentist view
• Dominant perspective for the last century
• Probability is an objective concept
  • Defined as the relative frequency of an event occurring under repeated trials in the “same” situation
  • E.g., fraction of heads in repeated coin tosses
• Restricts application of probability to repeatable events
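The frequentist definition can be made concrete with a short simulation: toss a simulated coin many times and watch the fraction of heads settle near the coin's probability of heads. This is a minimal sketch; the function name `relative_frequency`, the fixed seed, and the trial counts are illustrative choices, not from the slides.

```python
import random


def relative_frequency(p_heads: float, n_tosses: int, seed: int = 0) -> float:
    """Fraction of heads in n_tosses simulated tosses of a coin with P(heads) = p_heads."""
    rng = random.Random(seed)  # seeded so the "repeated trials" are reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses


# More repetitions of the "same" experiment -> fraction of heads closer to 0.5.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(0.5, n))
```

For small n the fraction can stray well away from 0.5; as n grows it concentrates around 0.5, which is the repeated-trials idea behind the definition.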
Bayesian view
• Increasing importance over the last decade
  • Due to increase in computational power, which makes previously intractable calculations feasible
• Probability is a subjective concept
  • Defined as an individual's degree of belief that an event will occur
  • E.g., belief that we will have another snowstorm tomorrow
• Begin with prior belief estimates and update them by conditioning on observed data
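A common way to make "start from a prior and condition on data" concrete is the Beta-Bernoulli model, in which a Beta(alpha, beta) belief about a coin's probability of heads remains a Beta distribution after each observation. This is a sketch under that assumption: the slides do not name a specific model, and `BetaBelief` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Degree of belief about a coin's P(heads), represented as a Beta(alpha, beta) distribution."""
    alpha: float  # weight on heads; grows by 1 for each observed head
    beta: float   # weight on tails; grows by 1 for each observed tail

    def update(self, observations: list[int]) -> "BetaBelief":
        """Condition on observed tosses (1 = heads, 0 = tails) and return the posterior belief."""
        heads = sum(observations)
        tails = len(observations) - heads
        return BetaBelief(self.alpha + heads, self.beta + tails)

    def mean(self) -> float:
        """Posterior mean estimate of P(heads)."""
        return self.alpha / (self.alpha + self.beta)


prior = BetaBelief(1.0, 1.0)  # uniform prior: every bias is equally plausible
posterior = prior.update([1, 1, 0, 1])
print(posterior, posterior.mean())
```

Because the update only adds counts, conditioning on [1, 1] and then on [0, 1] yields the same posterior as conditioning on all four tosses at once.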

Bayesian vs. frequentist
• Bayesian central tenet:
  • Explicitly model all forms of uncertainty
  • E.g., parameters, model structure, predictions
• Frequentist approaches often model the same uncertainty, but in a less principled manner, e.g.:
  • Parameters set by cross-validation
  • Model structure averaged in ensembles
  • Smoothing of predicted probabilities
• Although the interpretation of probability differs, the underlying calculus is the same
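One way to see the shared calculus: for categorical counts, add-one (Laplace) smoothing, a typical frequentist fix for zero estimated probabilities, gives exactly the Bayesian posterior mean under a uniform Dirichlet(1, ..., 1) prior. The sketch below checks this with exact fractions; the counts and function names are hypothetical.

```python
from fractions import Fraction


def mle(counts: dict[str, int]) -> dict[str, Fraction]:
    """Unsmoothed maximum-likelihood estimate: relative frequency of each category."""
    n = sum(counts.values())
    return {cat: Fraction(c, n) for cat, c in counts.items()}


def laplace_smoothed(counts: dict[str, int], pseudo: int = 1) -> dict[str, Fraction]:
    """Add-`pseudo` smoothing: (count + pseudo) / (n + pseudo * K) for K categories."""
    n = sum(counts.values())
    k = len(counts)
    return {cat: Fraction(c + pseudo, n + pseudo * k) for cat, c in counts.items()}


def dirichlet_posterior_mean(counts: dict[str, int], prior: int = 1) -> dict[str, Fraction]:
    """Posterior mean of category probabilities under a symmetric Dirichlet(prior, ..., prior) prior."""
    alphas = {cat: prior + c for cat, c in counts.items()}  # posterior Dirichlet parameters
    total = sum(alphas.values())
    return {cat: Fraction(a, total) for cat, a in alphas.items()}


counts = {"spam": 3, "ham": 0}  # hypothetical observed counts
print(mle(counts))  # "ham" gets probability 0
print(laplace_smoothed(counts))
print(dirichlet_posterior_mean(counts))
```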
Random variables
• Mapping from a property of objects to a variable that can take one of a set of possible values
  • X refers to a random variable
  • x refers to a value of that random variable
• Discrete RV has a finite (or countably infinite) set of possible values
• Continuous RV can take any value within an interval
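A small sketch of the X-versus-x distinction, using a toy setting of our own choosing (two fair dice, not from the slides): the objects are the 36 equally likely outcomes, the random variable X maps each outcome to the sum of the dice, and x ranges over the finite set {2, ..., 12}.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Objects: the 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def X(outcome: tuple[int, int]) -> int:
    """Random variable X: maps an outcome (an object) to a value, here the sum of the dice."""
    return outcome[0] + outcome[1]


def pmf(rv, objects) -> dict[int, Fraction]:
    """Distribution of a discrete RV over equally likely objects: P(X = x) for each value x."""
    counts = Counter(rv(o) for o in objects)
    n = len(objects)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}


dist = pmf(X, outcomes)
print(dist[7])  # P(X = 7), prints 1/6
```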

RVs (cont.)
• Probability distribution (or probability mass function or probability density function)
• Discrete
  • Denotes the probability that X will take on a particular value: $P(X = x)$
• Continuous
  • Probability of any particular point is 0, so we must consider the probability within an interval: $P(a < X < b) = \int_a^b p(x)\,dx$
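To illustrate that a density yields probabilities only through integrals, this sketch approximates $P(a < X < b) = \int_a^b p(x)\,dx$ for a standard normal with the trapezoidal rule and compares it to the closed-form CDF built from `math.erf`. The normal distribution, the interval (-1, 1), and the step count are assumptions chosen for illustration.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density p(x) of a Normal(mu, sigma^2) random variable."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) in closed form via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(pdf, a: float, b: float, n: int = 1000) -> float:
    """Approximate P(a < X < b) = integral of p(x) from a to b with the composite trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, n):
        total += pdf(a + i * h)
    return total * h


approx = interval_probability(normal_pdf, -1.0, 1.0)
exact = normal_cdf(1.0) - normal_cdf(-1.0)
print(approx, exact)
```

A zero-width interval such as (0.5, 0.5) returns 0, matching the point that any single value of a continuous RV has probability 0.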
Expectation
• Discrete: $E[X] = \sum_x x \cdot p(x)$
• Continuous: $E[X] = \int_x x \cdot p(x)\,dx$
• Variance: $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
• Linearity of expectation: $E[X + Y] = E[X] + E[Y]$
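The discrete formulas can be checked exactly on a fair six-sided die (an example we chose; it is not on the slide). This sketch uses `fractions.Fraction` so the results are exact rather than floating point; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf: dict) -> Fraction:
    """E[X] = sum over x of x * p(x), for a distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf: dict) -> Fraction:
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - expectation(pmf) ** 2


die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair six-sided die

# Distribution of X + Y for two independent dice, built by enumerating joint outcomes.
sum_pmf = {}
for x, y in product(die, die):
    sum_pmf[x + y] = sum_pmf.get(x + y, 0) + die[x] * die[y]

print(expectation(die), variance(die))  # 7/2 35/12
print(expectation(sum_pmf) == expectation(die) + expectation(die))  # linearity: True
```

Linearity of expectation holds whether or not X and Y are independent; independence is used here only to build the joint distribution. Additivity of variance, by contrast, does require independence (more generally, zero covariance).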

Common distributions
• Bernoulli
• Binomial
• Multinomial
• Poisson
• Normal
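For reference, the five listed distributions can be written with Python's standard `math` module alone. This is a sketch; the function and parameter names (such as `lam` and `probs`) are our own, not from the slides.

```python
import math


def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for k in {0, 1}: a single trial with success probability p."""
    return p if k == 1 else 1.0 - p


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): number of successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """P(counts): category counts over K categories in n = sum(counts) independent trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # exact integer multinomial coefficient
    return coef * math.prod(p**c for p, c in zip(probs, counts))


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): number of events in an interval with mean rate lam; k = 0, 1, 2, ..."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of Normal(mu, sigma^2) at x (a density, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

Bernoulli is the n = 1 case of the binomial, and the binomial is the two-category case of the multinomial. Poisson is a discrete RV with countably infinitely many possible values.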