CS 228, Winter 2006 Final Handout #16

1. [8 points] Influence Diagrams

Consider an influence diagram containing a single utility factor, in which all variables other than $D$ (a decision node) and its parents have been eliminated using standard VE. Show that the decision rule $\delta_D$ that maximizes

$$\sum_{D, \mathrm{Pa}_D} \delta_D(D, \mathrm{Pa}_D)\, \mu_D(D, \mathrm{Pa}_D)$$

is defined as

$$\delta_D(d, w) = \begin{cases} 1, & d = \arg\max_{d' \in \mathrm{Val}(D)} \mu_D(d', w) \\ 0, & \text{otherwise} \end{cases}$$

for $w \in \mathrm{Val}(\mathrm{Pa}_D)$.

2. [18 points] Causality

(a) [5 points] For probabilistic queries, we have that

$$\min_x P(y \mid x) \le P(y) \le \max_x P(y \mid x).$$

Show that the same property does not hold for intervention queries. Specifically, provide an example where it is not the case that

$$\min_x P(y \mid do(x)) \le P(y) \le \max_x P(y \mid do(x)).$$

(b) [6 points] As for probabilistic independence, we can define a notion of causal independence: $(X \perp_C Y \mid Z)$ if, for any values $x, x' \in \mathrm{Val}(X)$, we have that $P(Y \mid do(Z), do(x)) = P(Y \mid do(Z), do(x'))$. (Note that, unlike probabilistic independence $(X \perp Y \mid Z)$, causal independence is not symmetric over $X$, $Y$.) Is causal independence equivalent to the statement "for any value $x \in \mathrm{Val}(X)$, we have that $P(Y \mid do(Z), do(x)) = P(Y \mid do(Z))$"? (Hint: Use your result from (a).)

(c) [7 points] Prove that $(X \perp_C Y \mid Z, W)$ and $(W \perp_C Y \mid X, Z)$ imply $(X, W \perp_C Y \mid Z)$. Intuitively, this property states that if changing $X$ cannot affect $P(Y)$ when $W$ is fixed, and changing $W$ cannot affect $P(Y)$ when $X$ is fixed, then changing $X$ and $W$ together cannot affect $P(Y)$.

3. [10 points] Learning in DBNs

(a) [5 points] Suppose that we have fully observed sequences (from time $0$ to $T$) of the variables in a DBN. What is the marginal likelihood $P(D \mid G)$ of the DBN using $\mathrm{Dirichlet}(1, \ldots, 1)$ priors on the parameters?

(b) [5 points] Now suppose that we want to learn the optimal DBN structure (a $\langle B_0, B_\rightarrow \rangle$ pair) for the data from part (a), given some initial structure. Can we do this by running the usual greedy structure search algorithm (p. 399) on the BN produced by unrolling our DBN for $T$ time steps? If so, why? If not, what changes must be made?

4. [21 points] Inference

Consider a chain probabilistic network $X_1 \rightarrow X_2 \rightarrow \cdots \rightarrow X_n$ and a corresponding clique tree of the form $C_1 - \cdots - C_{n-1}$, where $\mathrm{Scope}[C_i] = \{X_i, X_{i+1}\}$. Each variable...
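For Problem 1, the decision rule it defines can be computed directly from a tabular utility factor. The sketch below assumes $\mu_D$ is stored as a Python dict keyed by $(d, w)$ pairs; the function names and the small table in the usage example are hypothetical illustrations, not part of the exam. It builds $\delta_D$ by putting all mass on a maximizing $d$ for each parent assignment $w$, and, as a numerical sanity check (not a proof), compares its value against every deterministic decision rule.

```python
from itertools import product


def optimal_decision_rule(mu, d_vals, w_vals):
    """Hypothetical helper: build delta[(d, w)] in {0.0, 1.0} from mu[(d, w)].

    For each parent assignment w, all mass goes to one d maximizing mu(d, w).
    Ties are broken by taking the first maximizer in d_vals.
    """
    delta = {}
    for w in w_vals:
        best = max(d_vals, key=lambda d: mu[(d, w)])
        for d in d_vals:
            delta[(d, w)] = 1.0 if d == best else 0.0
    return delta


def expected_utility(delta, mu):
    """Sum over (D, Pa_D) of delta(d, w) * mu(d, w)."""
    return sum(delta[key] * mu[key] for key in mu)


def best_deterministic_eu(mu, d_vals, w_vals):
    """Brute force: try every rule that picks exactly one d per w."""
    best = float("-inf")
    for choice in product(d_vals, repeat=len(w_vals)):
        delta = {
            (d, w): (1.0 if d == chosen else 0.0)
            for w, chosen in zip(w_vals, choice)
            for d in d_vals
        }
        best = max(best, expected_utility(delta, mu))
    return best


if __name__ == "__main__":
    # Hypothetical utility factor over D in {d0, d1} and Pa_D in {w0, w1}.
    d_vals = ["d0", "d1"]
    w_vals = ["w0", "w1"]
    mu = {("d0", "w0"): 2.0, ("d1", "w0"): 5.0,
          ("d0", "w1"): 4.0, ("d1", "w1"): 1.0}
    delta = optimal_decision_rule(mu, d_vals, w_vals)
    print(expected_utility(delta, mu))                # 9.0
    print(best_deterministic_eu(mu, d_vals, w_vals))  # 9.0
```

On this table the rule picks $d_1$ when $w = w_0$ and $d_0$ when $w = w_1$, reaching $5 + 4 = 9$, which matches the best of the four deterministic rules.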