CS 188: Artificial Intelligence, Spring 2011
Lecture 17: Bayes Nets V (3/30/2011)
Pieter Abbeel, UC Berkeley
Presenter: Arjun Singh
Many slides over this course adapted from Dan Klein, Stuart Russell, and Andrew Moore.

Announcements

Section: we'll be using some software to play with Bayes nets, so bring your laptop! Download the necessary files (links also in the handout):
  http://www-inst.eecs.berkeley.edu/~cs188/sp11/bayes/bayes.jar
  http://www-inst.eecs.berkeley.edu/~cs188/sp11/bayes/network.xml
Assignments: P4 and the contest go out Monday.
Outline

- Bayes net refresher: representation
- Exact inference: enumeration, variable elimination
- Approximate inference through sampling

Bayes Net Semantics

A Bayes net consists of:
- A set of nodes, one per variable X
- A directed, acyclic graph
- A conditional distribution for each node: a collection of distributions over X, one for each combination of the parents' values (the CPT, or conditional probability table)

Probabilities in BNs

A Bayes net = topology (graph) + local conditional probabilities.

Bayes nets implicitly encode joint distributions as a product of local conditional distributions. To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

  P(x_1, ..., x_n) = \prod_i P(x_i | parents(X_i))

This lets us reconstruct any entry of the full joint. Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.

Inference by Enumeration

Given unlimited time, inference in BNs is easy. For all joint distributions, we have (chain rule):

  P(x_1, ..., x_n) = \prod_i P(x_i | x_1, ..., x_{i-1})

Recipe:
- State the marginal probabilities you need
- Figure out ALL the atomic probabilities you need
- Calculate and combine them

Building the full joint table takes time and space exponential in the number of variables.
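As a concrete illustration of this recipe (not from the slides), here is a minimal Python sketch of inference by enumeration. The two-variable network, its CPT values (which happen to match the Cloudy/Rain fragment of the network used later in these notes), the dictionary encoding, and the function names are all assumptions made for this sketch.

  import itertools

  # Variables in topological order; each CPT maps a tuple of parent values
  # to a {value: probability} table.
  variables = ["C", "R"]                      # Cloudy and Rain
  parents = {"C": [], "R": ["C"]}
  cpts = {
      "C": {(): {"+c": 0.5, "-c": 0.5}},
      "R": {("+c",): {"+r": 0.8, "-r": 0.2},
            ("-c",): {"+r": 0.2, "-r": 0.8}},
  }

  def joint_probability(assignment):
      """P(full assignment) = product of the relevant local conditionals."""
      p = 1.0
      for var in variables:
          parent_values = tuple(assignment[parent] for parent in parents[var])
          p *= cpts[var][parent_values][assignment[var]]
      return p

  def enumerate_query(query_var, evidence):
      """P(query_var | evidence): sum the atomic probabilities, then normalize."""
      domains = {v: list(next(iter(cpts[v].values())).keys()) for v in variables}
      totals = {value: 0.0 for value in domains[query_var]}
      for values in itertools.product(*(domains[v] for v in variables)):
          assignment = dict(zip(variables, values))
          if all(assignment[var] == val for var, val in evidence.items()):
              totals[assignment[query_var]] += joint_probability(assignment)
      z = sum(totals.values())
      return {value: total / z for value, total in totals.items()}

  print(enumerate_query("C", {"R": "+r"}))    # P(C | +r) = {'+c': 0.8, '-c': 0.2}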
General Variable Elimination

Query: P(Q | e_1, ..., e_k)

- Start with the initial factors: the local CPTs (but instantiated by the evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Eliminate (sum out) H
- Join all remaining factors and normalize

Complexity is exponential in the number of variables appearing in the factors; it can depend on the ordering, but even the best ordering is often impractical. The worst case is bad: we can encode 3-SAT with a Bayes net (NP-complete).

Approximate Inference

Simulation has a name: sampling (e.g. predicting the weather, basketball games, ...).

Basic idea:
- Draw N samples from a sampling distribution S
- Compute an approximate posterior probability
- Show this converges to the true probability P

Why sample?
- Learning: get samples from a distribution you don't know
- Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
Sampling

How do you sample? The simplest way is to use a random number generator to get a continuous value uniformly distributed between 0 and 1 (e.g. random() in Python). Assign each value in the domain of your random variable a subinterval of [0, 1] with a size equal to its probability; the subintervals cannot overlap.

Sampling Example

Each value in the domain of W has a subinterval of [0, 1] with a size equal to its probability:

  W       P(W)
  sun     0.6
  rain    0.1
  fog     0.3
  meteor  0.0

Let u be a uniform random value in [0, 1]:
- if 0.0 ≤ u < 0.6, w = sun
- if 0.6 ≤ u < 0.7, w = rain
- if 0.7 ≤ u < 1.0, w = fog

E.g. if random() returns u = 0.83, then our sample is w = fog.
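A minimal Python sketch of this subinterval idea, using the W distribution from the example above (the helper name sample_discrete is just for this sketch, not part of the slides):

  import random

  def sample_discrete(distribution):
      """Sample the value whose subinterval of [0, 1) contains random().

      distribution: list of (value, probability) pairs summing to 1.
      """
      u = random.random()          # uniform in [0, 1)
      cumulative = 0.0
      for value, prob in distribution:
          cumulative += prob       # right endpoint of this value's subinterval
          if u < cumulative:
              return value
      return distribution[-1][0]   # guard against floating-point round-off

  weather = [("sun", 0.6), ("rain", 0.1), ("fog", 0.3), ("meteor", 0.0)]
  print(sample_discrete(weather))  # e.g. u = 0.83 falls in [0.7, 1.0), so "fog"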
Prior Sampling

(Network: Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass.) CPTs:

  P(C):        +c 0.5    -c 0.5
  P(S | C):    +c:  +s 0.1    -s 0.9
               -c:  +s 0.5    -s 0.5
  P(R | C):    +c:  +r 0.8    -r 0.2
               -c:  +r 0.2    -r 0.8
  P(W | S,R):  +s,+r:  +w 0.99   -w 0.01
               +s,-r:  +w 0.90   -w 0.10
               -s,+r:  +w 0.90   -w 0.10
               -s,-r:  +w 0.01   -w 0.99

Samples: (+c, -s, +r, +w), (-c, +s, -r, +w), ...

This process generates samples with probability

  S_PS(x_1, ..., x_n) = \prod_i P(x_i | parents(X_i))

...i.e. the BN's joint probability. Let N_PS(x_1, ..., x_n) be the number of samples of an event. Then

  lim_{N -> infinity} N_PS(x_1, ..., x_n) / N = P(x_1, ..., x_n)

I.e., the sampling procedure is consistent.
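A rough Python sketch of prior sampling on this network: sample each variable in topological order from its CPT, given the values already drawn for its parents. The dictionary encoding of the CPTs and the helper names are choices made for this sketch, not the course's code; later sketches in these notes reuse them.

  import random

  def sample_from(dist):
      """Sample from a {value: probability} dict using the subinterval trick."""
      u, cumulative = random.random(), 0.0
      for value, prob in dist.items():
          cumulative += prob
          if u < cumulative:
              return value
      return value                 # guard against floating-point round-off

  # CPTs from the Prior Sampling slide, keyed by the parent values.
  P_C = {"+c": 0.5, "-c": 0.5}
  P_S = {"+c": {"+s": 0.1, "-s": 0.9}, "-c": {"+s": 0.5, "-s": 0.5}}
  P_R = {"+c": {"+r": 0.8, "-r": 0.2}, "-c": {"+r": 0.2, "-r": 0.8}}
  P_W = {("+s", "+r"): {"+w": 0.99, "-w": 0.01},
         ("+s", "-r"): {"+w": 0.90, "-w": 0.10},
         ("-s", "+r"): {"+w": 0.90, "-w": 0.10},
         ("-s", "-r"): {"+w": 0.01, "-w": 0.99}}

  def prior_sample():
      """Sample every variable in topological order: C, then S and R, then W."""
      c = sample_from(P_C)
      s = sample_from(P_S[c])
      r = sample_from(P_R[c])
      w = sample_from(P_W[(s, r)])
      return c, s, r, w

  print([prior_sample() for _ in range(5)])   # e.g. ('+c', '-s', '+r', '+w'), ...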
Example

We'll get a bunch of samples from the BN:
  (+c, -s, +r, +w)
  (+c, +s, +r, +w)
  (-c, +s, +r, -w)
  (+c, -s, +r, +w)
  (-c, -s, -r, +w)

If we want to know P(W):
- We have counts <+w: 4, -w: 1>
- Normalize to get P(W) = <+w: 0.8, -w: 0.2>
- This will get closer to the true distribution with more samples
- We can estimate anything else, too: what about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
- Fast: can use fewer samples if there is less time (what's the drawback?)

Rejection Sampling

Let's say we want P(C): there's no point keeping all the samples around; just tally counts of C as we go.

Let's say we want P(C | +s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = +s. This is called rejection sampling. It is also consistent for conditional probabilities (i.e., correct in the limit).
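A hedged sketch of rejection sampling for the P(C | +s) example above, reusing prior_sample() from the prior-sampling sketch: tally C, but throw away every sample that does not have S = +s. The function name and sample count are just for illustration.

  from collections import Counter

  def rejection_sampling_C_given_s(n_samples=10000):
      """Estimate P(C | +s): tally C, rejecting samples without S = +s."""
      counts = Counter()
      for _ in range(n_samples):
          c, s, r, w = prior_sample()     # from the prior-sampling sketch above
          if s != "+s":
              continue                    # reject: evidence not matched
          counts[c] += 1                  # tally the C outcome
      total = sum(counts.values())
      return {value: n / total for value, n in counts.items()}

  print(rejection_sampling_C_given_s())   # approaches the true P(C | +s)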
Likelihood Weighting

Problem with rejection sampling:
- If the evidence is unlikely, you reject a lot of samples
- You don't exploit your evidence as you sample

Consider P(B | +a) in a Burglary/Alarm network: most prior samples have -a and get rejected, e.g.
  (-b, -a), (-b, -a), (-b, -a), (-b, -a), (+b, +a)

Idea: fix the evidence variables and sample the rest, e.g.
  (-b, +a), (-b, +a), (-b, +a), (-b, +a), (+b, +a)

Problem: the sample distribution is not consistent! Solution: weight each sample by the probability of the evidence given its parents. This is likelihood weighting.

(Same Cloudy/Sprinkler/Rain/WetGrass network and CPTs as on the Prior Sampling slide.)

Samples: (+c, +s, +r, +w), ...
Sampling distribution if z is sampled and e is fixed evidence:

  S_WS(z, e) = \prod_i P(z_i | parents(Z_i))

Now the samples have weights:

  w(z, e) = \prod_j P(e_j | parents(E_j))

Together, the weighted sampling distribution is consistent:

  S_WS(z, e) * w(z, e) = \prod_i P(z_i | parents(Z_i)) * \prod_j P(e_j | parents(E_j)) = P(z, e)

Likelihood weighting is good:
- We have taken the evidence into account as we generate the sample
- E.g. here, W's value will get picked based on the evidence values of S, R
- More of our samples will reflect the state of the world suggested by the evidence

But likelihood weighting doesn't solve all our problems:
- Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
- We would like to consider the evidence when we sample every variable
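A rough Python sketch of likelihood weighting on the same network, with S = +s and R = +r as evidence (so W is picked based on the evidence values of S and R, while C is still sampled from its prior, as described above). It reuses sample_from and the CPT dictionaries from the prior-sampling sketch; the query P(C | +s, +r) and the function names are illustrative assumptions.

  from collections import defaultdict

  def weighted_sample():
      """One likelihood-weighted sample with evidence S = +s, R = +r."""
      weight = 1.0
      c = sample_from(P_C)            # C is upstream of the evidence: sampled from its CPT
      s = "+s"
      weight *= P_S[c][s]             # evidence S: fixed, weighted by P(+s | c)
      r = "+r"
      weight *= P_R[c][r]             # evidence R: fixed, weighted by P(+r | c)
      w = sample_from(P_W[(s, r)])    # W is picked based on the evidence values of S, R
      return (c, s, r, w), weight

  def likelihood_weighting(n_samples=10000):
      """Estimate P(C | +s, +r) from weighted tallies of C."""
      totals = defaultdict(float)
      for _ in range(n_samples):
          (c, _, _, _), weight = weighted_sample()
          totals[c] += weight
      z = sum(totals.values())
      return {value: total / z for value, total in totals.items()}

  print(likelihood_weighting())       # approaches the true P(C | +s, +r)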
Gibbs Sampling

Idea: instead of sampling from scratch, create samples that are each like the last one.

Procedure: resample one variable at a time, conditioned on all the rest, but keep the evidence fixed.

Properties: the samples are not independent (in fact they're nearly identical), but sample averages are still consistent estimators! What's the point: both upstream and downstream variables condition on the evidence.

Say we want to sample P(S | R = +r).

Step 1: Initialize
- Set the evidence (R = +r)
- Set all other variables (S, C, W) to random values (e.g. by prior sampling or just uniform sampling); say S = -s, W = +w, C = -c
- Our initial sample is then (R = +r, S = -s, W = +w, C = -c)

Steps 2+: Repeat the following for some number of iterations
- Choose a non-evidence variable (S, W, or C in this case)
- Sample this variable conditioned on nothing else changing; the first time through, if we pick S, we sample from P(S | R = +r, W = +w, C = -c)
- The new sample can only be different in a single variable

How is this better than sampling from the full joint? In a Bayes net, sampling a variable given all the other variables (e.g. P(R | S, C, W)) is usually much easier than sampling from the full joint distribution:
- It only requires a join on the variable to be sampled (in this case, a join on R)
- The resulting factor only depends on the variable's parents, its children, and its children's parents (this is often referred to as its Markov blanket)
Gibbs Sampling Example

We want to sample from P(R | +s, -c, -w), shorthand for P(R | S = +s, C = -c, W = -w):

  P(R | +s, -c, -w)
    = P(R, +s, -c, -w) / P(+s, -c, -w)
    = P(R, +s, -c, -w) / [ \sum_r P(R = r, +s, -c, -w) ]
    = P(-c) P(+s | -c) P(R | -c) P(-w | +s, R) / [ \sum_r P(-c) P(+s | -c) P(R = r | -c) P(-w | +s, R = r) ]
    = P(R | -c) P(-w | +s, R) / [ \sum_r P(R = r | -c) P(-w | +s, R = r) ]

Many things cancel out: computing this only requires a join on R!
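To connect this back to the Gibbs procedure, here is a hedged Python sketch of the resampling steps on this network, reusing sample_from and the CPT dictionaries from the prior-sampling sketch. resample_R implements exactly the join-on-R expression derived above; the other resampling functions follow the same Markov-blanket pattern, and the driver estimates P(S | R = +r) from the earlier example. The function names, the sample count, and the fixed sweep order (rather than picking one non-evidence variable at random, as on the slides) are choices made for this sketch.

  from collections import Counter

  def resample_R(s, c, w):
      """Resample R from P(R | s, c, w): only the factors mentioning R survive."""
      # Not used in the driver below, where R is the evidence; shown to mirror the derivation.
      unnormalized = {r: P_R[c][r] * P_W[(s, r)][w] for r in ("+r", "-r")}
      z = sum(unnormalized.values())
      return sample_from({r: p / z for r, p in unnormalized.items()})

  def resample_S(c, r, w):
      """Markov blanket of S: its parent C, its child W, and W's other parent R."""
      unnormalized = {s: P_S[c][s] * P_W[(s, r)][w] for s in ("+s", "-s")}
      z = sum(unnormalized.values())
      return sample_from({s: p / z for s, p in unnormalized.items()})

  def resample_C(s, r):
      """Markov blanket of C: its children S and R (C has no parents)."""
      unnormalized = {c: P_C[c] * P_S[c][s] * P_R[c][r] for c in ("+c", "-c")}
      z = sum(unnormalized.values())
      return sample_from({c: p / z for c, p in unnormalized.items()})

  def gibbs_estimate(n_iterations=10000):
      """Estimate P(S | R = +r): keep R fixed, resample C, S, W one at a time."""
      r = "+r"                              # the evidence stays fixed throughout
      s, c, w = "-s", "-c", "+w"            # arbitrary initial values
      counts = Counter()
      for _ in range(n_iterations):
          c = resample_C(s, r)
          s = resample_S(c, r, w)
          w = sample_from(P_W[(s, r)])      # W's Markov blanket is just its parents S, R
          counts[s] += 1                    # tally S after each sweep
      total = sum(counts.values())
      return {value: n / total for value, n in counts.items()}

  print(gibbs_estimate())                   # approaches the true P(S | R = +r)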
Further Reading*

Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods. Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings). You may read about Monte Carlo methods; they're just sampling.