1 Machine Learning
CS6375, Fall 2010
Bayesian Learning
Reading: Sections 13.1-13.6, 20.1-20.2, R&N; Sections 6.1-6.3, 6.7, 6.9, Mitchell

2 Uncertainty
• Most real-world problems deal with uncertain information
  – Diagnosis: likely disease given observed symptoms
  – Equipment repair: likely component failure given sensor reading
  – Help desk: likely operation based on past operations
  – Cannot be represented by deterministic rules such as Headache => Fever
• Correct framework for representing uncertainty: probability

3 Probability
• P(A) = probability of event A = fraction of all possible worlds in which A is true.

4 Probability

5 Probability
• Immediately derived properties
More general version of the Theorem of Total Probability:
IF we know that exactly one of B1, B2, ..., Bn is true (i.e. P(B1 or B2 or ... or Bn) = 1, and for all i, j unequal, P(Bi and Bj) = 0)
THEN we know: P(A) = P(A, B1) + P(A, B2) + ... + P(A, Bn)

6 Probability
• A random variable is a variable X that can take values x1, ..., xn with a probability P(X = xi) attached to each i = 1, ..., n.

7 Example
My mood can take one of two values: Happy, Sad.
The weather can take one of three values: Rainy, Sunny, Cloudy.
Given
P(Mood=Happy ^ Weather=Rainy) = 0.2
P(Mood=Happy ^ Weather=Sunny) = 0.1
P(Mood=Happy ^ Weather=Cloudy) = 0.4
Can I compute P(Mood=Happy)?
Can I compute P(Mood=Sad)?
Can I compute P(Weather=Rainy)?

8 What's so great about the axioms of probability?
The axioms of probability mean you may not represent the following knowledge:
P(A) = 0.4
P(B) = 0.3
P(A ^ B) = 0.0
P(A ∨ B) = 0.8
Would you ever want to do that? Difficult philosophical question. Pragmatic answer: if you disobey the laws of probability in your knowledge representation, you can make suboptimal decisions.

9 Conditional Probability
• P(A | B) = fraction of those worlds in which B is true for which A is also true.
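The three questions in the mood and weather example can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original slides; the variable names (`joint_happy`, `rainy_bounds`) are made up for this example. It applies the Theorem of Total Probability to the three given joint probabilities and shows why P(Weather=Rainy) cannot be pinned down from them.

```python
from fractions import Fraction

# Joint probabilities given in the example; only the Mood=Happy entries are known.
joint_happy = {
    "Rainy": Fraction(2, 10),
    "Sunny": Fraction(1, 10),
    "Cloudy": Fraction(4, 10),
}

# Exactly one weather value is true, so total probability applies:
# P(Happy) = P(Happy, Rainy) + P(Happy, Sunny) + P(Happy, Cloudy)
p_happy = sum(joint_happy.values())

# Mood has only two values, so P(Sad) = 1 - P(Happy).
p_sad = 1 - p_happy

# P(Rainy) = P(Happy, Rainy) + P(Sad, Rainy), but P(Sad, Rainy) is not given.
# All we know is 0 <= P(Sad, Rainy) <= P(Sad), which yields bounds only.
rainy_bounds = (joint_happy["Rainy"], joint_happy["Rainy"] + p_sad)

print(p_happy)       # 7/10
print(p_sad)         # 3/10
print(rainy_bounds)  # (Fraction(1, 5), Fraction(1, 2))
```

So P(Mood=Happy) and P(Mood=Sad) are determined by the given numbers, while P(Weather=Rainy) is only known to lie between 0.2 and 0.5.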
10 Conditional Probability Example
• H = Headache, P(H) = 1/2
• F = Flu, P(F) = 1/8
P(H | F) = 1/2

11 Conditional Probability Example
• H = Headache, P(H) = 1/2
• F = Flu, P(F) = 1/8
P(H | F) = (Area of "H and F" region) / (Area of F region)
P(H | F) = P(H, F) / P(F)
P(H | F) = 1/2

12 Conditional Probability
• Definition: P(A | B) = P(A, B) / P(B)
• Chain rule: P(A, B) = P(A | B) P(B)
Can you prove that P(A, B) <= P(A) for any events A and B?

13 Conditional Probability
• Other useful relations:

14 Probabilistic Inference
• What is the probability that F is true given H is true?
Given
• P(H) = 1/2
• P(F) = 1/8
• P(H | F) = 0.5

15 Probabilistic Inference
• Correct reasoning:
• We know P(H), P(F), P(H | F) and the two chain rules: P(H, F) = P(H | F) P(F) and P(H, F) = P(F | H) P(H)
• Substituting the values: P(F | H) = P(H | F) P(F) / P(H) = (1/2 × 1/8) / (1/2) = 1/8

16 Bayes Rule

17 Bayes Rule

18 Bayes Rule
• What if we do not know P(A)???
• Use the relation:
• More general Bayes rule:

19 Bayes Rule
• Same rule for a non-binary random variable, except we need to sum over all the possible events

20 Generalizing Bayes Rule
If we know that exactly one of A1, A2, ..., An is true, then:
P(B) = P(B | A1) P(A1) + P(B | A...
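The headache and flu inference from slides 14-15 can be reproduced numerically. The following is a minimal sketch, assuming only the three values given on the slides; the helper names `bayes_posterior` and `total_probability` are hypothetical, and `p_h_given_not_f` is a value derived here for illustration, not one stated in the slides.

```python
from fractions import Fraction

def bayes_posterior(likelihood, prior, evidence):
    """Return P(A | B) = P(B | A) P(A) / P(B)."""
    return likelihood * prior / evidence

def total_probability(likelihoods, priors):
    """Return P(B) = sum_i P(B | A_i) P(A_i) for mutually exclusive, exhaustive A_i."""
    return sum(l * p for l, p in zip(likelihoods, priors))

# Values given on the slides.
p_h = Fraction(1, 2)          # P(Headache)
p_f = Fraction(1, 8)          # P(Flu)
p_h_given_f = Fraction(1, 2)  # P(Headache | Flu)

# Bayes rule: P(F | H) = P(H | F) P(F) / P(H)
p_f_given_h = bayes_posterior(p_h_given_f, p_f, p_h)
print(p_f_given_h)  # 1/8

# Assumption for illustration: derive P(H | not F) from the given numbers,
# then confirm that total probability over {F, not F} recovers P(H).
p_h_given_not_f = (p_h - p_h_given_f * p_f) / (1 - p_f)
evidence = total_probability([p_h_given_f, p_h_given_not_f], [p_f, 1 - p_f])
print(evidence)  # 1/2
```

In this example the posterior P(F | H) equals the prior P(F) = 1/8, because P(H | F) happens to equal P(H): with these particular numbers, observing a headache does not change the probability of flu.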
 Fall '10
 VicentNg
 Conditional Probability, Probability theory, joint distribution, argmax h∈H
