15.097: Probabilistic Modeling and Bayesian Analysis

Ben Letham and Cynthia Rudin

Credits: Bayesian Data Analysis by Gelman, Carlin, Stern, and Rubin

1 Introduction and Notation

Up to this point, most of the machine learning tools we discussed (SVM, Boosting, Decision Trees, ...) do not make any assumption about how the data were generated. For the remainder of the course, we will make distributional assumptions: we assume that the underlying distribution belongs to a known family. Given data, our goal then becomes to determine which probability distribution generated the data.

We are given m data points y_1, ..., y_m, each of arbitrary dimension. Let y = {y_1, ..., y_m} denote the full set of data. Thus y is a random variable, whose probability density function would in probability theory typically be denoted f_y({y_1, ..., y_m}). We will use a standard (in Bayesian analysis) shorthand notation for probability density functions, and denote the probability density function of the random variable y as simply p(y).

We will assume that the data were generated from a probability distribution that is described by some parameters θ (not necessarily scalar). We treat θ as a random variable. We will use the shorthand notation p(y|θ) to represent the family of conditional density functions over y, parameterized by the random variable θ. We call this family p(y|θ) a likelihood function or likelihood model for the data y, as it tells us how likely the data y are given the model specified by any value of θ.

We specify a prior distribution over θ, denoted p(θ). This distribution represents any knowledge we have about how the data are generated prior to observing them.
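To make this notation concrete, here is a minimal sketch in Python. The Bernoulli likelihood and the uniform prior are illustrative choices of ours (the coin flip example below uses the same likelihood).

```python
import numpy as np

# Minimal sketch of the notation above, assuming independent
# Bernoulli(theta) observations and a uniform prior on [0, 1].
# Both modeling choices are illustrative, not prescribed by the notes.

def likelihood(y, theta):
    """p(y | theta): joint density of the 0/1 observations y under
    independent Bernoulli(theta) draws, evaluated at a given theta."""
    y = np.asarray(y)
    return float(np.prod(theta ** y * (1.0 - theta) ** (1 - y)))

def prior(theta):
    """p(theta): uniform density on [0, 1], encoding no knowledge
    about theta before seeing the data."""
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0

# The unnormalized posterior is likelihood(y, theta) * prior(theta);
# Bayes' rule (Eq. (1) below) supplies the normalizing constant p(y).
```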
Our end goal is the conditional density function over θ given the observed data, which we denote p(θ|y). We call this the posterior distribution, and it informs us which parameters are likely given the observed data.

We, the modelers, specify the likelihood function (as a function of y and θ) and the prior (which we specify completely), using our knowledge of the system at hand. We then use these quantities, together with the data, to compute the posterior. The likelihood, prior, and posterior are all related via Bayes' rule:

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}, \qquad (1)
$$

where the second step uses the law of total probability. Unfortunately, the integral in the denominator, called the partition function, is often intractable. This is what makes Bayesian analysis difficult, and the remainder of these notes will essentially describe methods for avoiding that integral.

Coin Flip Example Part 1. Suppose we have been given data from a series of m coin flips, and we are not sure whether the coin is fair. We might assume that the data were generated by a sequence of independent draws from a Bernoulli distribution, parameterized by θ, the probability of flipping Heads.
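For a scalar θ such as the Bernoulli parameter, one simple way to sidestep the integral in Eq. (1) is to approximate the posterior on a discrete grid. Here the likelihood is p(y|θ) = ∏_i θ^{y_i}(1−θ)^{1−y_i}. The sketch below uses made-up data and a uniform prior; both are our choices for illustration, not part of the original example.

```python
import numpy as np

# A minimal sketch, assuming a scalar theta: approximate p(theta | y)
# on a grid, replacing the integral in the denominator of Eq. (1)
# with numerical quadrature. Data and grid size are hypothetical.

y = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # 8 flips, 1 = Heads (made-up data)
thetas = np.linspace(0.0, 1.0, 1001)    # grid over the Bernoulli parameter

# Unnormalized posterior p(y | theta) * p(theta); with a uniform prior
# the prior factor is constant and can be omitted.
unnorm = np.array([np.prod(t ** y * (1.0 - t) ** (1 - y)) for t in thetas])

# Approximate the partition function p(y) as the integral of the numerator.
posterior = unnorm / np.trapz(unnorm, thetas)

# With 6 Heads in 8 flips and a flat prior, the posterior peaks at 0.75.
print("posterior mode:", thetas[np.argmax(posterior)])
```

The grid approximation is only feasible in low dimensions; it is shown here to make Eq. (1) concrete, not as a general-purpose method.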
