SP11 cs188 lecture 17 -- bayes nets V 6pp

Announcements

- Section: We'll be using some software to play with Bayes nets, so bring your laptop! Download the necessary files (links also in the handout):
  http://www-inst.eecs.berkeley.edu/~cs188/sp11/bayes/bayes.jar
  http://www-inst.eecs.berkeley.edu/~cs188/sp11/bayes/network.xml
- Assignments: P4 and the contest go out Monday.

Lecture 17: Bayes Nets V (3/30/2011)
Pieter Abbeel -- UC Berkeley. Presenter: Arjun Singh.
Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore.

Outline

- Bayes net refresher: representation
- Exact inference: enumeration, variable elimination
- Approximate inference through sampling

Bayes Net Semantics

- A set of nodes, one per variable X
- A directed, acyclic graph
- A conditional distribution for each node: a collection of distributions over X, one for each combination of the parents' values
- CPT: conditional probability table
- A Bayes net = topology (graph) + local conditional probabilities

Probabilities in BNs

- Bayes nets implicitly encode joint distributions as a product of local conditional distributions.
- To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
  P(x1, ..., xn) = prod_i P(xi | parents(Xi))
- For all joint distributions, we have (chain rule):
  P(x1, ..., xn) = prod_i P(xi | x1, ..., x(i-1))
- This lets us reconstruct any entry of the full joint.
- Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.

Inference by Enumeration

- Given unlimited time, inference in BNs is easy.
- Recipe: state the marginal probabilities you need, figure out ALL the atomic probabilities you need, then calculate and combine them.
- Building the full joint table takes time and space exponential in the number of variables.

General Variable Elimination

- Query: P(Q | E1 = e1, ..., Ek = ek)
- Start with the initial factors: the local CPTs (but instantiated by evidence).
- While there are still hidden variables (not Q or evidence): pick a hidden variable H, join all factors mentioning H, then eliminate (sum out) H.
- Join all remaining factors and normalize.
- Complexity is exponential in the number of variables appearing in the factors. It depends on the elimination ordering, but even the best ordering is often impractical.
- The worst case is bad: we can encode 3-SAT with a Bayes net (NP-complete).

Approximate Inference

- Simulation has a name: sampling (e.g. predicting the weather, basketball games, ...).
- Basic idea: draw N samples from a sampling distribution S, compute an approximate posterior probability, and show this converges to the true probability P.
- Why sample? Learning: you can get samples from a distribution you don't know. Inference: getting a sample is often faster than computing the right answer (e.g. with variable elimination).

Sampling

- How do you sample? The simplest way is to use a random number generator to get a continuous value uniformly distributed between 0 and 1 (e.g. random() in Python).
- Assign each value in the domain of your random variable a sub-interval of [0, 1] with a size equal to its probability. The sub-intervals cannot overlap.

Sampling Example

- Distribution over W: P(sun) = 0.6, P(rain) = 0.1, P(fog) = 0.3, P(meteor) = 0.0.
- Let u be a uniform random value in [0, 1]:
  if 0.0 <= u < 0.6, w = sun
  if 0.6 <= u < 0.7, w = rain
  if 0.7 <= u < 1.0, w = fog
- E.g. if random() returns u = 0.83, then our sample is w = fog.

Prior Sampling

- Running example: the Cloudy/Sprinkler/Rain/WetGrass network, with CPTs:
  P(C): +c 0.5, -c 0.5
  P(S | C): given +c: +s 0.1, -s 0.9; given -c: +s 0.5, -s 0.5
  P(R | C): given +c: +r 0.8, -r 0.2; given -c: +r 0.2, -r 0.8
  P(W | S, R): given +s,+r: +w 0.99, -w 0.01; given +s,-r: +w 0.90, -w 0.10; given -s,+r: +w 0.90, -w 0.10; given -s,-r: +w 0.01, -w 0.99
- Sample each variable in topological order from its CPT, given its already-sampled parents. Example samples: (+c, -s, +r, +w), (-c, +s, -r, +w), ...
- This process generates samples with probability
  S_PS(x1, ..., xn) = prod_i P(xi | Parents(Xi)),
  i.e. the BN's joint probability.
- Let N_PS(x1, ..., xn) be the number of samples of an event. Then the empirical frequency N_PS(x1, ..., xn) / N converges to P(x1, ..., xn) as N grows.
- I.e., the sampling procedure is consistent.

Example

- We'll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
- If we want to know P(W): we have counts <+w: 4, -w: 1>; normalize to get P(W) = <+w: 0.8, -w: 0.2>.
- This will get closer to the true distribution with more samples.
- Can estimate anything else, too. What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
- Fast: can use fewer samples if there's less time (what's the drawback?).

Rejection Sampling

- Let's say we want P(C): there's no point keeping all samples around; just tally counts of C as we go.
- Let's say we want P(C | +s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = +s.
- This is called rejection sampling. It is also consistent for conditional probabilities (i.e., correct in the limit).

Likelihood Weighting

- Problem with rejection sampling: if the evidence is unlikely, you reject a lot of samples, and you don't exploit your evidence as you sample.
- Consider P(B | +a): samples like (-b, -a) are rejected over and over; only the rare samples with +a survive.
- Idea: fix the evidence variables and sample the rest.
- Problem: the sample distribution is not consistent!
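As a concrete (unofficial) sketch of the sampling machinery described above -- drawing a value with a uniform random number, prior sampling through the Cloudy/Sprinkler/Rain/WetGrass network, and rejection sampling -- the following Python encodes each variable as True (+) or False (-); the function names and encoding are my own, not from the lecture:

```python
import random

# CPTs from the Cloudy/Sprinkler/Rain/WetGrass network in the slides.
# Each entry gives P(var = True) for a parent assignment.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)

def bernoulli(p):
    # Sub-interval trick: [0, p) maps to True, [p, 1) maps to False.
    return random.random() < p

def prior_sample():
    # Sample each variable in topological order given its sampled parents.
    c = bernoulli(P_C)
    s = bernoulli(P_S[c])
    r = bernoulli(P_R[c])
    w = bernoulli(P_W[(s, r)])
    return c, s, r, w

def rejection_sample_P_C_given_s(n):
    # Estimate P(+c | +s): tally C, rejecting samples where S != +s.
    counts = {True: 0, False: 0}
    for _ in range(n):
        c, s, r, w = prior_sample()
        if s:                      # reject samples that miss the evidence
            counts[c] += 1
    kept = counts[True] + counts[False]
    return counts[True] / kept if kept else None
```

For this network the exact answer is P(+c | +s) = 0.05 / 0.30 = 1/6, so the estimate should approach that as n grows. Note how many draws are discarded: only about 30% of samples have S = +s, which is exactly the waste likelihood weighting addresses.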
- Solution: weight each sample by the probability of the evidence given its parents.
- E.g. in the Cloudy/Sprinkler/Rain/WetGrass network, with the evidence fixed, sample the remaining variables from their CPTs to get a sample like (+c, +s, +r, +w), and attach a weight to it.
- Sampling distribution, if z is sampled and e is fixed evidence:
  S_WS(z, e) = prod_i P(zi | Parents(Zi))
- Now, samples have weights:
  w(z, e) = prod_j P(ej | Parents(Ej))
- Together, the weighted sampling distribution is consistent:
  S_WS(z, e) * w(z, e) = P(z, e)
- Likelihood weighting is good: we take the evidence into account as we generate the sample. E.g. here, W's value will get picked based on the evidence values of S, R. More of our samples will reflect the state of the world suggested by the evidence.
- Likelihood weighting doesn't solve all our problems: evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence).
- We would like to consider evidence when we sample every variable.

Gibbs Sampling

- Idea: instead of sampling from scratch, create samples that are each like the last one.
- Procedure: resample one variable at a time, conditioned on all the rest, but keep the evidence fixed.
- Properties: now the samples are not independent (in fact they're nearly identical), but sample averages are still consistent estimators!
- Say we want to sample P(S | R = +r).
- Step 1: Initialize. Set the evidence (R = +r), and set all other variables (S, C, W) to random values (e.g. by prior sampling or just uniform sampling); say S = -s, W = +w, C = -c. Our initial sample is then (R = +r, S = -s, W = +w, C = -c).
- Steps 2+: Repeat the following for some number of iterations: choose a non-evidence variable (S, W, or C in this case), and sample this variable conditioned on nothing else changing. The first time through, if we pick S, we sample from P(S | R = +r, W = +w, C = -c). The new sample can only differ in a single variable.
- What's the point: both upstream and downstream variables condition on evidence.
- How is this better than sampling from the full joint? In a Bayes net, sampling a variable given all the other variables (e.g. P(R | S, C, W)) is usually much easier than sampling from the full joint distribution.
- It only requires a join on the variable to be sampled (in this case, a join on R). The resulting factor only depends on the variable's parents, its children, and its children's parents (this set is often referred to as its Markov blanket).

Gibbs Sampling Example

- Want to sample from P(R | +s, -c, -w), shorthand for P(R | S = +s, C = -c, W = -w):

  P(R | +s, -c, -w)
    = P(R, +s, -c, -w) / P(+s, -c, -w)
    = P(R, +s, -c, -w) / sum_r P(R = r, +s, -c, -w)
    = P(-c) P(+s | -c) P(R | -c) P(-w | +s, R)
      / sum_r P(-c) P(+s | -c) P(R = r | -c) P(-w | +s, R = r)
    = P(R | -c) P(-w | +s, R) / sum_r P(R = r | -c) P(-w | +s, R = r)

- Many things cancel out -- it's just a join on R!

Further Reading*

- Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods.
- Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings).
- You may read about Monte Carlo methods -- they're just sampling.
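As an unofficial sketch of likelihood weighting on the same Cloudy/Sprinkler/Rain/WetGrass network, the code below estimates P(C | +s, +w): the evidence variables S and W are fixed rather than sampled, and each sample carries the weight prod_j P(ej | Parents(Ej)). The True/False encoding and function names are mine:

```python
import random

# CPTs from the slides (each entry gives P(var = True)).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)

def weighted_sample():
    # One likelihood-weighted sample for evidence S = +s, W = +w:
    # sample non-evidence variables in topological order, fix evidence
    # variables, and multiply in P(evidence value | parents).
    weight = 1.0
    c = random.random() < P_C        # non-evidence: sampled from its CPT
    s = True                         # evidence, fixed ...
    weight *= P_S[c]                 # ... weighted by P(+s | c)
    r = random.random() < P_R[c]     # non-evidence: sampled from its CPT
    w = True                         # evidence, fixed ...
    weight *= P_W[(s, r)]            # ... weighted by P(+w | s, r)
    return c, weight

def lw_estimate_P_C(n):
    # Estimate P(+c | +s, +w) as a weight-normalized tally of C.
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c, weight = weighted_sample()
        totals[c] += weight
    return totals[True] / (totals[True] + totals[False])
```

Unlike rejection sampling, no draw is ever discarded; every sample contributes, just with more or less weight. The exact posterior here works out to 0.0486 / 0.2781, roughly 0.175, which the estimate approaches for large n. Note the limitation flagged above: C is still sampled from its prior, unaffected by the downstream evidence.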
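The Gibbs sampling procedure described above can also be sketched in a few lines. This is a minimal, unofficial implementation for the query P(S | R = +r) on the same network: the resampling step computes the unnormalized probability of both values of the chosen variable from the joint, so that everything not involving that variable cancels, exactly as in the slides' join-on-R derivation. All names are my own:

```python
import random

# CPTs from the slides (each entry gives P(var = True)).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(+s | C)
P_R = {True: 0.8, False: 0.2}                       # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(+w | S, R)

def pr(p, value):
    # P(var = value), given P(var = True) = p.
    return p if value else 1.0 - p

def joint(c, s, r, w):
    # Full joint: product of the local conditionals.
    return pr(P_C, c) * pr(P_S[c], s) * pr(P_R[c], r) * pr(P_W[(s, r)], w)

def resample(var, state):
    # Sample `var` conditioned on everything else staying fixed.
    # Factors not mentioning `var` cancel in the ratio, so this is
    # effectively conditioning on the variable's Markov blanket.
    hi = joint(**{**state, var: True})
    lo = joint(**{**state, var: False})
    state[var] = random.random() < hi / (hi + lo)

def gibbs_P_S_given_r(n_iters):
    # Step 1: fix the evidence (R = +r), randomize the rest.
    state = {"c": random.random() < 0.5, "s": random.random() < 0.5,
             "r": True, "w": random.random() < 0.5}
    count = 0
    # Steps 2+: resample one non-evidence variable per iteration.
    for _ in range(n_iters):
        resample(random.choice(["c", "s", "w"]), state)
        count += state["s"]
    return count / n_iters
```

Consecutive states differ in at most one variable, so the samples are highly correlated; still, the running average converges to the true posterior (here P(+s | +r) = 0.09 / 0.5 = 0.18). A production sampler would typically also discard an initial burn-in period, which this sketch omits.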

This note was uploaded on 08/26/2011 for the course CS 188 taught by Professor Staff during the Spring '08 term at Berkeley.
