# 15.097: Probabilistic Modeling and Bayesian Analysis

Ben Letham and Cynthia Rudin

Credits: *Bayesian Data Analysis* by Gelman, Carlin, Stern, and Rubin

## 1 Introduction and Notation

Up to this point, most of the machine learning tools we discussed (SVM, Boosting, Decision Trees, ...) do not make any assumptions about how the data were generated. For the remainder of the course, we will make distributional assumptions: we assume the underlying distribution belongs to some specified family. Given data, our goal then becomes to determine which probability distribution in that family generated the data.

We are given m data points y_1, ..., y_m, each of arbitrary dimension. Let y = {y_1, ..., y_m} denote the full set of data. Thus y is a random variable, whose probability density function would in probability theory typically be denoted f_y({y_1, ..., y_m}). We will use a standard (in Bayesian analysis) shorthand notation for probability density functions, and denote the probability density function of the random variable y as simply p(y).

We will assume that the data were generated from a probability distribution described by some parameters θ (not necessarily scalar). We treat θ as a random variable. We will use the shorthand notation p(y|θ) to represent the family of conditional density functions over y, parameterized by the random variable θ. We call this family p(y|θ) a *likelihood function* or *likelihood model* for the data y, as it tells us how likely the data y are given the model specified by any value of θ.

We specify a *prior distribution* over θ, denoted p(θ). This distribution represents any knowledge we have about how the data are generated prior to

observing them. Our end goal is the conditional density function over θ given the observed data, which we denote p(θ|y). We call this the *posterior distribution*, and it informs us which parameters are likely given the observed data.

We, the modelers, specify the likelihood function (as a function of y and θ) and the prior (which we specify completely) using our knowledge of the system at hand. We then use these quantities, together with the data, to compute the posterior. The likelihood, prior, and posterior are all related via Bayes' rule:

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)} = \frac{p(y \mid \theta)\,p(\theta)}{\int p(y \mid \theta')\,p(\theta')\,d\theta'}, \tag{1}
$$

where the second step uses the law of total probability. Unfortunately the integral in the denominator, called the *partition function*, is often intractable. This is what makes Bayesian analysis difficult, and the remainder of these notes will essentially describe methods for avoiding that integral.

**Coin Flip Example Part 1.** Suppose we have been given data from a series of m coin flips, and we are not sure if the coin is fair. We might assume that the data were generated by a sequence of independent draws from a Bernoulli distribution, parameterized by θ, the probability of flipping Heads.
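When θ is one-dimensional, as in the coin flip example, Bayes' rule can be evaluated numerically, which makes the role of the partition function concrete. The sketch below approximates the posterior on a grid of θ values; the particular flip data and the uniform prior are assumptions chosen for illustration, not part of the notes.

```python
import numpy as np

# Hypothetical observed data: m = 10 flips, 1 = Heads, 0 = Tails.
y = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])

# Discretize theta (the probability of Heads) on a grid in (0, 1).
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

# Likelihood p(y | theta): product of independent Bernoulli densities.
likelihood = theta ** y.sum() * (1 - theta) ** (len(y) - y.sum())

# Prior p(theta): uniform on (0, 1), an assumed choice expressing no
# initial preference for any value of theta.
prior = np.ones_like(theta)

# Posterior via Bayes' rule (1); the integral in the denominator
# (the partition function) is approximated by a Riemann sum.
unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * dtheta)

# The posterior concentrates around the empirical frequency of Heads.
theta_map = theta[np.argmax(posterior)]
print(f"Posterior mode: {theta_map:.3f}")  # near 7/10 = 0.7
```

With a uniform prior the posterior is simply the normalized likelihood, so its mode coincides with the maximum likelihood estimate; a non-uniform prior would pull the posterior away from the data's empirical frequency.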