CS 440: Introduction to Artificial Intelligence
Lecture date: 12 February 2008
Instructor: Eyal Amir
Guest Lecturer: Mark Richards

1 Variable Elimination in Bayesian Networks

Review: Inference

Given a joint probability distribution over a set of variables $X = \{X_1, X_2, \ldots, X_n\}$, we can make inferences of the form $P(Y \mid Z)$, where $Y \subseteq X$ is the set of query variables and $Z \subseteq X$ is the set of evidence variables. The other variables $H$ (those not mentioned in the query) are called hidden variables. To be clear, $X = Y \cup Z \cup H$.

The most naive and expensive way to do inference is to use the full joint probability distribution and sum out the hidden variables. By the product rule, $P(Y \mid Z)\,P(Z) = P(Y, Z)$. (Note that this is true for any distribution. It has nothing to do with independence.) So to answer the query $P(Y \mid Z)$, we can compute $\frac{P(Y, Z)}{P(Z)}$. Note that $P(Y, Z) = \sum_{H} P(Y, Z, H)$. From the view of the full joint probability table, we are summing the probabilities of all the entries in the table that match the values of the query variables and evidence variables (this includes entries for all combinations of values of the hidden variables) and dividing by the sum of all the entries that match the values of the evidence variables. (Note that the entries in the first sum are a subset of the entries in the latter sum.)

This method is conceptually straightforward but has many problems. First, you have to compute and store the values in the full joint distribution. The number of entries is $\prod_i \mathrm{Options}(X_i)$, where $\mathrm{Options}(X_i)$ is the number of different values that variable $X_i$ may take. So for $n$ Boolean variables, the full joint has $2^n$ entries. For large networks, it is unreasonable to store or compute this many values. Furthermore, if a query requires summing out most of the variables, the time complexity is $O(2^n)$ as well.

Normalizing

Suppose we have variables $A$ and $B$. We want to compute $P(A \mid b)$.
We use $b$ to denote $B = \text{true}$. The result of our query $P(A, b)$ is a two-element table that specifies $P(a, b)$ and $P(\neg a, b)$. Since $P(A \mid b)\,P(b) = P(A, b)$, we can compute $P(A \mid b) = \frac{P(A, b)}{P(b)}$. But we also know that $P(b) = P(a, b) + P(\neg a, b)$. Since we need both $P(a, b)$ and $P(\neg a, b)$ to answer the query, this may actually be the best way to compute $P(b)$. The way this is expressed in the book is that $P(A \mid b) = \alpha\,P(A, b)$, where $\alpha$ is a normalizing constant that makes the values in the table $P(A \mid b)$ sum to 1. (In this case, $\alpha$ is $\frac{1}{P(b)}$, but the idea is that it doesn't matter what it is; we just know that we need to normalize to make sure that the values in $P(A \mid b)$ sum to 1....
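
The enumeration-and-normalization procedure described in these notes can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the joint table, its probabilities, the third variable C (playing the role of a hidden variable), and the helper name `query` are all made up for the example. It sums the joint entries that match the evidence $b$, grouped by the value of $A$, and then rescales by $\alpha$ so the result sums to 1.

```python
# Hypothetical full joint distribution over three Boolean variables (A, B, C).
# The probabilities are invented for illustration and sum to 1.
VARIABLES = ("A", "B", "C")
joint = {
    (True, True, True): 0.10,
    (True, True, False): 0.20,
    (True, False, True): 0.05,
    (True, False, False): 0.15,
    (False, True, True): 0.05,
    (False, True, False): 0.05,
    (False, False, True): 0.10,
    (False, False, False): 0.30,
}


def query(joint, query_var, evidence):
    """Return P(query_var | evidence) as a dict {value: probability}.

    `evidence` maps variable names to observed values. Every variable that
    is neither the query variable nor in the evidence is hidden and is
    summed out implicitly by the loop over all joint entries.
    """
    q_idx = VARIABLES.index(query_var)
    unnormalized = {True: 0.0, False: 0.0}
    for assignment, prob in joint.items():
        # Keep only the entries that agree with the evidence.
        matches = all(
            assignment[VARIABLES.index(name)] == value
            for name, value in evidence.items()
        )
        if matches:
            unnormalized[assignment[q_idx]] += prob
    # alpha = 1 / P(evidence); obtained here from the two sums already computed.
    alpha = 1.0 / sum(unnormalized.values())
    return {value: alpha * p for value, p in unnormalized.items()}


posterior = query(joint, "A", {"B": True})
print(posterior)  # P(a | b) is about 0.75 and P(not a | b) about 0.25
```

Note that the loop visits every entry of the joint table, so for $n$ Boolean variables the running time is proportional to $2^n$, which is exactly the cost the notes warn about.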
 Spring '08
 EyalAmir
 Artificial Intelligence
