CS 464: Introduction to Machine Learning
Bayesian Learning
Slides adapted from Sections 6.1, 6.2, 6.3, and 6.9 of Machine Learning by Tom M. Mitchell

Bayesian Learning
• Bayes theorem
• MAP and ML hypotheses
• MAP learners
• Naive Bayes learner

Roles for Bayesian Methods
Provides practical learning algorithms:
• Naive Bayes learning
• Bayesian belief network learning (covered later)
• Combines prior knowledge (prior probabilities) with observed data
• Requires prior probabilities
Provides a useful conceptual framework:
• Provides a “gold standard” for evaluating other learning algorithms
• Provides insight into Occam's razor

Bayes Theorem
In machine learning, we try to determine the best hypothesis from some hypothesis space H, given the observed training data D. In Bayesian learning, the best hypothesis means the most probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H. Bayes theorem provides a way to calculate the probability of a hypothesis from its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.

Bayes Theorem
P(h | D) = P(D | h) P(h) / P(D)
• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h | D) = probability of h given D (the posterior probability)
• P(D | h) = probability of D given h (the likelihood)

Bayes Theorem - Example
Sample space for events A and B (seven equally likely outcomes):
A holds:  T  T  F  F  T  F  T
B holds:  T  F  T  F  T  F  F
P(A) = 4/7    P(B) = 3/7    P(B|A) = 2/4    P(A|B) = 2/3
Is Bayes theorem correct?
P(B|A) = P(A|B) P(B) / P(A) = (2/3 * 3/7) / (4/7) = 2/4 ⇒ CORRECT
P(A|B) = P(B|A) P(A) / P(B) = (2/4 * 4/7) / (3/7) = 2/3 ⇒ CORRECT
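The arithmetic in the example can also be checked mechanically. The following is a minimal Python sketch. It assumes the seven outcomes are equally likely and encodes them as (A holds, B holds) pairs in the order of the table. The names `outcomes` and `prob` are illustrative and do not come from the slides. Exact fractions avoid floating-point rounding in the comparison.

```python
from fractions import Fraction

# The seven equally likely outcomes from the sample space, as (A holds, B holds).
outcomes = [
    (True, True), (True, False), (False, True), (False, False),
    (True, True), (False, False), (True, False),
]

def prob(event):
    """Fraction of outcomes for which event(a, b) is true."""
    hits = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(hits, len(outcomes))

p_a = prob(lambda a, b: a)           # P(A) = 4/7
p_b = prob(lambda a, b: b)           # P(B) = 3/7
p_ab = prob(lambda a, b: a and b)    # P(A and B) = 2/7

# Conditional probabilities from the definition P(X|Y) = P(X and Y) / P(Y).
p_b_given_a = p_ab / p_a             # 2/4, stored in lowest terms as 1/2
p_a_given_b = p_ab / p_b             # 2/3

# Bayes theorem should recover each conditional from the other one.
assert p_b_given_a == p_a_given_b * p_b / p_a
assert p_a_given_b == p_b_given_a * p_a / p_b

print(p_a, p_b, p_b_given_a, p_a_given_b)  # 4/7 3/7 1/2 2/3
```

Both assertions pass, matching the two "CORRECT" lines of the example.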
Choosing Hypotheses
Natural choice is the most probable hypothesis
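Choosing the most probable hypothesis means taking the argmax of P(h | D) over H. Because P(D) does not depend on h, Bayes theorem lets us compare P(D | h) P(h) instead. The following is a minimal Python sketch over a tiny, hypothetical hypothesis space. The hypotheses h1, h2, h3 and all of their numbers are made up for illustration, as are the helper names `map_hypothesis` and `posteriors`. Computing P(D) as a sum over hypotheses assumes they are mutually exclusive and cover all of H.

```python
from fractions import Fraction

# Hypothetical hypothesis space: prior P(h) and likelihood P(D | h) for one
# observed dataset D. These numbers are illustrative only.
priors = {"h1": Fraction(1, 2), "h2": Fraction(3, 10), "h3": Fraction(1, 5)}
likelihoods = {"h1": Fraction(1, 10), "h2": Fraction(1, 2), "h3": Fraction(2, 5)}

def map_hypothesis(priors, likelihoods):
    """Return the hypothesis maximizing P(D|h) P(h).

    P(D) is the same for every h, so dropping it does not change the argmax.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

def posteriors(priors, likelihoods):
    """Full posterior P(h|D) via Bayes theorem, with P(D) = sum over h of P(D|h) P(h).

    Assumes the hypotheses are mutually exclusive and exhaustive.
    """
    p_d = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / p_d for h in priors}

print(map_hypothesis(priors, likelihoods))  # h2
print(posteriors(priors, likelihoods))
```

In this example h1 has the largest prior but is not the most probable hypothesis after seeing D. Its low likelihood (1/10) outweighs its prior, so h2 wins with posterior 15/28.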
