CS 464: Introduction to Machine Learning
Bayesian Learning
Slides adapted from Sections 6.1, 6.2, 6.3, and 6.9 of
Machine Learning by Tom M. Mitchell
http://www2.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlbook.html
Bayesian Learning
• Bayes Theorem
• MAP, ML hypotheses
• MAP learners
• Naive Bayes learner
Roles for Bayesian Methods
Provides practical learning algorithms:
• Naive Bayes learning
• Bayesian belief network learning (will be
covered later)
• Combine prior knowledge (prior probabilities)
with observed data
• Requires prior probabilities
Provides useful conceptual framework
• Provides “gold standard” for evaluating other
learning algorithms
• Provides insight into Occam's razor
Bayes Theorem
• In machine learning, we try to determine the best hypothesis
from some hypothesis space H, given the observed training data D.
• In Bayesian learning, the best hypothesis means the most probable
hypothesis, given the data D plus any initial knowledge about the
prior probabilities of the various hypotheses in H.
• Bayes theorem provides a way to calculate the probability of a
hypothesis based on its prior probability, the probabilities of
observing various data given the hypothesis, and the observed data
itself.
Bayes Theorem
P(h|D) = P(D|h) P(h) / P(D)
• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h|D) = probability of h given D
• P(D|h) = probability of D given h
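The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `posterior`, the hypothesis labels, and the numbers are all hypothetical. It assumes the hypotheses are mutually exclusive and exhaustive, so that P(D) can be obtained by summing P(D|h) P(h) over all h.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(h|D) for every hypothesis h using Bayes theorem.

    priors[h]      = P(h)
    likelihoods[h] = P(D|h)
    Assumption: the hypotheses are mutually exclusive and exhaustive,
    so P(D) = sum over h of P(D|h) P(h).
    """
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    if evidence == 0:
        raise ValueError("P(D) is zero; the posterior is undefined")
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


# Hypothetical numbers, chosen only for illustration.
priors = {"h1": Fraction(3, 10), "h2": Fraction(7, 10)}
likelihoods = {"h1": Fraction(1, 2), "h2": Fraction(1, 10)}

post = posterior(priors, likelihoods)
for h, p in post.items():
    print(h, p)
```

Exact fractions are used so that the posteriors sum to exactly 1, which makes the result easy to check by hand.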
Bayes Theorem Example
Sample space for events A and B:
A holds   T  T  F  F  T  F  T
B holds   T  F  T  F  T  F  F
P(A) = 4/7   P(B) = 3/7   P(B|A) = 2/4   P(A|B) = 2/3
Is Bayes Theorem correct?
P(B|A) = P(A|B) P(B) / P(A) = (2/3 * 3/7) / (4/7) = 2/4   CORRECT
P(A|B) = P(B|A) P(A) / P(B) = (2/4 * 4/7) / (3/7) = 2/3   CORRECT
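The same check can be done by counting outcomes directly. The sketch below is an illustration added here, with variable names chosen for this example; it assumes the seven outcomes in the sample space are equally likely, which is what the stated values P(A) = 4/7 and P(B) = 3/7 imply.

```python
from fractions import Fraction

# Truth values of A and B for the seven outcomes in the example.
A = [True, True, False, False, True, False, True]
B = [True, False, True, False, True, False, False]
n = len(A)

# Assumption: all outcomes are equally likely, so probabilities are counts / n.
p_a = Fraction(sum(A), n)
p_b = Fraction(sum(B), n)

# Number of outcomes where both A and B hold.
both = sum(1 for a, b in zip(A, B) if a and b)

# Conditional probabilities computed directly from the counts.
p_b_given_a = Fraction(both, sum(A))
p_a_given_b = Fraction(both, sum(B))

# Bayes theorem must reproduce each conditional from the other.
assert p_b_given_a == p_a_given_b * p_b / p_a
assert p_a_given_b == p_b_given_a * p_a / p_b
print(p_a, p_b, p_b_given_a, p_a_given_b)
```

Note that `Fraction` reduces 2/4 to 1/2, so the printed value of P(B|A) is the reduced form of the 2/4 shown in the example.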
Choosing Hypotheses
Natural choice is most probable hypothesis
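Choosing the most probable hypothesis can be sketched as an argmax over the hypothesis space. This is a minimal illustration under stated assumptions: the function name `map_hypothesis` and the numbers are hypothetical. Because P(D) is the same for every hypothesis, it can be dropped when comparing posteriors, so the code compares P(D|h) P(h) only.

```python
def map_hypothesis(priors, likelihoods):
    """Return the hypothesis h that maximizes P(D|h) P(h).

    priors[h]      = P(h)
    likelihoods[h] = P(D|h)
    P(D) is omitted because it does not depend on h.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


# Hypothetical numbers, chosen only for illustration.
priors = {"h1": 0.3, "h2": 0.7}
likelihoods = {"h1": 0.5, "h2": 0.1}

h_map = map_hypothesis(priors, likelihoods)
print(h_map)
```

Here h2 has the larger prior, but h1 explains the data much better, so h1 has the larger product and is selected.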