

10708 Graphical Models: Homework 2 Solutions

1 I-equivalence

1.1 We want to show that two graphs G1 and G2 are I-equivalent if (1) they have the same trails, and (2) a trail is active in G1 iff it is active in G2. It is easy to see that two graphs have the same set of trails iff they have the same skeleton. Also, by definition, two graphs are I-equivalent iff I(G1) = I(G2). Thus, we assume that G1 and G2 have the same skeleton and the same set of active trails, and, for the sake of contradiction, that they are not I-equivalent. This means they make different independence assertions. This could happen if G1 and G2 had different skeletons, but that violates our first assumption. Given the same skeleton, the only remaining way for them to make different independence assertions is for two variables u and v to be d-separated in one graph (say, G1) but dependent in the other (G2) under the same evidence set E. But then there is no active trail between u and v in G1 while there is at least one in G2, violating our second assumption. Therefore, by contradiction, G1 and G2 must be I-equivalent.

1.2 In 1.1, we showed that if G1 and G2 have the same skeleton and the same active trails, then they are I-equivalent. We now want to show that if G1 and G2 have the same skeleton and the same v-structures, then they are I-equivalent. Assume G1 and G2 have the same skeleton and v-structures, and assume (u ⇝ v) is an active trail in G1 given some evidence set E. For (u ⇝ v) to be active, E cannot contain any variable x_i that appears on the trail as x_{i−1} → x_i → x_{i+1}, x_{i−1} ← x_i ← x_{i+1}, or x_{i−1} ← x_i → x_{i+1}. Likewise, for any v-structure x_{i−1} → x_i ← x_{i+1} on the trail with center node x_i, either x_i or one of its descendants must be in E. Since G2 has the same skeleton and v-structures, the trail (u ⇝ v) in G2 contains exactly the same v-structures it did in G1, and these v-structures have the same sets of descendants. Thus, given the same evidence set, the trail is active in G1 iff it is active in G2. By 1.1, this completes the proof: G1 and G2 must be I-equivalent.

2 Decomposable Scores

2.1 Let G be a network structure, and let score be a decomposable score.

2.1.1 Assume o is "Add X → Y", and X → Y ∉ G.

Δ_G(o) = score(o(G) : D) − score(G : D)
       = Σ_i FamScore(Z_i | Pa^{o(G)}_{Z_i} : D) − Σ_i FamScore(Z_i | Pa^G_{Z_i} : D)

We can pull out all of the family scores except the ones for node Y:

= Σ_{Z ≠ Y} [FamScore(Z | Pa^{o(G)}_Z : D) − FamScore(Z | Pa^G_Z : D)] + FamScore(Y | Pa^{o(G)}_Y : D) − FamScore(Y | Pa^G_Y : D)

The terms in the summation vanish because only Y's parents changed; all other family scores are identical. This leaves

= FamScore(Y | Pa^{o(G)}_Y : D) − FamScore(Y | Pa^G_Y : D)

Rewriting these family scores in terms of what changed:

= FamScore(Y, Pa^G_Y ∪ {X} : D) − FamScore(Y, Pa^G_Y : D)

Thus we conclude that if o is "Add X → Y" and X → Y ∉ G, then

Δ_G(o) = FamScore(Y, Pa^G_Y ∪ {X} : D) − FamScore(Y, Pa^G_Y : D).

2.1.2 Assume o is "Delete X → Y", and X → Y ∈ G. By the reasoning from 2.1.1:

Δ_G(o) = FamScore(Y | Pa^{o(G)}_Y : D) − FamScore(Y | Pa^G_Y : D)

Rewriting these family scores in terms of what changed:

= FamScore(Y, Pa^G_Y − {X} : D) − FamScore(Y, Pa^G_Y : D)

Thus we conclude that if o is "Delete X → Y" and X → Y ∈ G, then

Δ_G(o) = FamScore(Y, Pa^G_Y − {X} : D) − FamScore(Y, Pa^G_Y : D).

2.1.3 Assume o is "Reverse X → Y", and X → Y ∈ G.

Δ_G(o) = score(o(G) : D) − score(G : D)
       = Σ_i FamScore(Z_i | Pa^{o(G)}_{Z_i} : D) − Σ_i FamScore(Z_i | Pa^G_{Z_i} : D)

We can pull out all of the family scores except the ones for nodes X and Y:

= Σ_{Z ∈ G − {X, Y}} [FamScore(Z | Pa^{o(G)}_Z : D) − FamScore(Z | Pa^G_Z : D)]
  + FamScore(Y | Pa^{o(G)}_Y : D) − FamScore(Y | Pa^G_Y : D)
  + FamScore(X | Pa^{o(G)}_X : D) − FamScore(X | Pa^G_X : D)

= FamScore(Y | Pa^{o(G)}_Y : D) − FamScore(Y | Pa^G_Y : D) + FamScore(X | Pa^{o(G)}_X : D) − FamScore(X | Pa^G_X : D)

Rewriting these family scores in terms of what changed:

= FamScore(X, Pa^G_X ∪ {Y} : D) + FamScore(Y, Pa^G_Y − {X} : D) − FamScore(X, Pa^G_X : D) − FamScore(Y, Pa^G_Y : D)

Thus we conclude that if o is "Reverse X → Y" and X → Y ∈ G, then

Δ_G(o) = FamScore(X, Pa^G_X ∪ {Y} : D) + FamScore(Y, Pa^G_Y − {X} : D) − FamScore(X, Pa^G_X : D) − FamScore(Y, Pa^G_Y : D).

2.2 Let G and G′ be two network structures, and let score be a decomposable score.

2.2.1 Assume o is "Add X → Y" and Pa^G_Y = Pa^{G′}_Y. Using Proposition 15.4.5:

Δ_G(o) = FamScore(Y, Pa^G_Y ∪ {X} : D) − FamScore(Y, Pa^G_Y : D)
       = FamScore(Y, Pa^{G′}_Y ∪ {X} : D) − FamScore(Y, Pa^{G′}_Y : D)
       = Δ_{G′}(o)

Assume o is "Delete X → Y" and Pa^G_Y = Pa^{G′}_Y. Using Proposition 15.4.5:

Δ_G(o) = FamScore(Y, Pa^G_Y − {X} : D) − FamScore(Y, Pa^G_Y : D)
       = FamScore(Y, Pa^{G′}_Y − {X} : D) − FamScore(Y, Pa^{G′}_Y : D)
       = Δ_{G′}(o)

We conclude that if o is either "Add X → Y" or "Delete X → Y" and Pa^G_Y = Pa^{G′}_Y, then Δ_G(o) = Δ_{G′}(o).

2.2.2 Assume o is "Reverse X → Y", Pa^G_Y = Pa^{G′}_Y, and Pa^G_X = Pa^{G′}_X.

Δ_G(o) = FamScore(X, Pa^G_X ∪ {Y} : D) + FamScore(Y, Pa^G_Y − {X} : D) − FamScore(X, Pa^G_X : D) − FamScore(Y, Pa^G_Y : D)
       = FamScore(X, Pa^{G′}_X ∪ {Y} : D) + FamScore(Y, Pa^{G′}_Y − {X} : D) − FamScore(X, Pa^{G′}_X : D) − FamScore(Y, Pa^{G′}_Y : D)
       = Δ_{G′}(o)

We conclude that if o is "Reverse X → Y", Pa^G_Y = Pa^{G′}_Y, and Pa^G_X = Pa^{G′}_X, then Δ_G(o) = Δ_{G′}(o).
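To make the locality of these updates concrete, here is a minimal Python sketch (an illustration added here, not part of the original write-up) of how a structure-search implementation would score operators by rescoring only the affected families. The function fam_score(var, parents, data) is a hypothetical stand-in for any decomposable family score, such as the per-family BIC term, and parents maps each node to its current parent set.

# Sketch: operator deltas under a decomposable score. Only the families
# whose parent sets change are rescored; all other terms cancel, exactly
# as in the derivations above. fam_score(var, parents, data) is a
# hypothetical stand-in for a decomposable family score.

def delta_add(x, y, parents, data, fam_score):
    """Delta of "Add X -> Y": rescore Y's family only (2.1.1)."""
    pa_y = parents[y]
    return fam_score(y, pa_y | {x}, data) - fam_score(y, pa_y, data)

def delta_delete(x, y, parents, data, fam_score):
    """Delta of "Delete X -> Y": rescore Y's family only (2.1.2)."""
    pa_y = parents[y]
    return fam_score(y, pa_y - {x}, data) - fam_score(y, pa_y, data)

def delta_reverse(x, y, parents, data, fam_score):
    """Delta of "Reverse X -> Y": rescore X's and Y's families (2.1.3)."""
    pa_x, pa_y = parents[x], parents[y]
    return (fam_score(x, pa_x | {y}, data) + fam_score(y, pa_y - {x}, data)
            - fam_score(x, pa_x, data) - fam_score(y, pa_y, data))

Because each delta touches at most two families, the result of 2.2 means such cached deltas remain valid across search moves that leave the relevant parent sets unchanged.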
3 Learning Edge Directions

1. I'll use a dotted edge, written "⋯" here, to denote a deleted edge. The possible BNs on the skeleton X1 − X2 − X3, with their family scores, are:

(a) X1 → X2 → X3: FS(X1 | ∅) + FS(X2 | X1) + FS(X3 | X2)
(b) X1 → X2 ← X3: FS(X1 | ∅) + FS(X2 | X1, X3) + FS(X3 | ∅)
(c) X1 → X2 ⋯ X3: FS(X1 | ∅) + FS(X2 | X1) + FS(X3 | ∅)
(d) X1 ← X2 → X3: FS(X1 | X2) + FS(X2 | ∅) + FS(X3 | X2)
(e) X1 ← X2 ← X3: FS(X1 | X2) + FS(X2 | X3) + FS(X3 | ∅)
(f) X1 ← X2 ⋯ X3: FS(X1 | X2) + FS(X2 | ∅) + FS(X3 | ∅)
(g) X1 ⋯ X2 → X3: FS(X1 | ∅) + FS(X2 | ∅) + FS(X3 | X2)
(h) X1 ⋯ X2 ← X3: FS(X1 | ∅) + FS(X2 | X3) + FS(X3 | ∅)
(i) X1 ⋯ X2 ⋯ X3: FS(X1 | ∅) + FS(X2 | ∅) + FS(X3 | ∅)

2. Now if we have the skeleton X1 − X2 − X3 − X4, the decision about the edge X1 − X2 does not affect the family score of X3, because the family score of X3 depends only on the decisions for the edges between X3 and its neighbors X2 and X4.

3. There is a linear-time dynamic programming algorithm for finding the optimal BN from a chain skeleton X1 − X2 − ⋯ − Xn. To find this algorithm, we recast the question of finding the optimal BN recursively. Let k be a number such that 1 ≤ k < n. Given the direction of the edge between Xk and Xk+1, we would like to know the highest-scoring structure for X1 − X2 − ⋯ − Xk and its score. The algorithm builds a 3 × n table T that holds these scores (if Xk → Xk+1, we denote the score of the optimal structure up to k by the table element T^k_→; similarly T^k_← and T^k_⋯ for the other two decisions). As the following pseudocode shows, column T^k can be built using only column T^{k−1} and the local scores enumerated in part 1. (I will also use a table of pointers B for a backtracking step; each B^k_d holds one of three possible values: →, ←, or ⋯.) A runnable Python sketch of the same procedure is given after the pseudocode.

 1: {Initialization}
 2: T^1_→ ← FS(X1 | ∅)
 3: T^1_← ← FS(X1 | X2)
 4: T^1_⋯ ← FS(X1 | ∅)
 5: {Dynamic programming}
 6: for k = 2 to n do
 7:   T^k_→ ← max{FS(Xk | Xk−1) + T^{k−1}_→, FS(Xk | ∅) + T^{k−1}_←, FS(Xk | ∅) + T^{k−1}_⋯};  B^k_→ ← the arg max over {→, ←, ⋯} of the same three terms
 8:   T^k_← ← max{FS(Xk | Xk−1, Xk+1) + T^{k−1}_→, FS(Xk | Xk+1) + T^{k−1}_←, FS(Xk | Xk+1) + T^{k−1}_⋯};  B^k_← ← the corresponding arg max
 9:   T^k_⋯ ← max{FS(Xk | Xk−1) + T^{k−1}_→, FS(Xk | ∅) + T^{k−1}_←, FS(Xk | ∅) + T^{k−1}_⋯};  B^k_⋯ ← the corresponding arg max
10: end for
11: {Backtracking}
12: Q_n ← arg max_d T^n_d
13: for k = n − 1 down to 1 do
14:   Q_k ← B^{k+1}_{Q_{k+1}}
15: end for
16: return Q

(When k = n there is no X_{n+1}, so only the updates that do not mention X_{k+1} apply, i.e., lines 7 and 9, which coincide; Q_n then merely starts the backtracking, and Q_1, ..., Q_{n−1} give the decisions for the n − 1 edges.)
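Below is a minimal sketch of this dynamic program in Python, added here for illustration. Indices are 0-based, and fs(i, parents) is a hypothetical caller-supplied decomposable family score (e.g., a per-family BIC term computed from data); parents is a tuple of variable indices.

# Sketch of the chain-skeleton DP above, 0-based: variables X_0..X_{n-1},
# edge k is between X_k and X_{k+1}. fs(i, parents) is a hypothetical
# caller-supplied decomposable family score.
R, L, D = 0, 1, 2  # edge decisions: ->, <-, deleted

def best_chain_bn(n, fs):
    """Return (best score, decisions), decisions[k] in {R, L, D} for edge k."""
    # T[d]: best score of X_0..X_k given decision d on edge k (column k).
    T = [fs(0, ()), fs(0, (1,)), fs(0, ())]  # X_0's parent is X_1 iff edge 0 is <-
    B = [None]  # B[k][d]: decision on edge k-1 achieving T[d] in column k
    for k in range(1, n - 1):
        col, ptr = [], []
        for right in (R, L, D):
            rp = (k + 1,) if right == L else ()   # X_{k+1} is a parent iff <-
            cands = [fs(k, (k - 1,) + rp) + T[R],  # edge k-1 was ->
                     fs(k, rp) + T[L],             # edge k-1 was <-
                     fs(k, rp) + T[D]]             # edge k-1 was deleted
            best = max(range(3), key=lambda d: cands[d])
            col.append(cands[best])
            ptr.append(best)
        T, B = col, B + [ptr]
    # Last variable X_{n-1} has no outgoing edge, so only edge n-2 matters.
    cands = [fs(n - 1, (n - 2,)) + T[R], fs(n - 1, ()) + T[L], fs(n - 1, ()) + T[D]]
    last = max(range(3), key=lambda d: cands[d])
    decisions = [last]
    for k in range(n - 2, 0, -1):  # follow back-pointers for edges n-3 .. 0
        decisions.append(B[k][decisions[-1]])
    decisions.reverse()
    return cands[last], decisions

Each column costs a constant number of family-score evaluations, so the whole search is linear in n, as claimed.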
[... several pages here are garbled beyond recovery in this extraction; the surviving text resumes mid-argument ...]

... factor (every variable has a corresponding CPT P(X | Pa(X)), possibly with Pa(X) = ∅, and from the construction in Problem 7.1, inclusion of this CPT means X ∈ A), which is not possible under the procedural definition of variable elimination. Therefore, every intermediate factor of variable elimination corresponds to a conditional probability in some network.

7 Tree-Augmented Naive Bayes

The structure of the TAN network learned using all the data is shown in Figure 1.

[Figure 1: The tree-augmented naive Bayes network for breast.csv, over attributes labeled 1-10; the attributes are described in the data tarball. The network drawing did not survive extraction.]

[Figure 2: Classification error vs. size of training data (100-500 samples), averaged over 50 iterations, for TAN and naive Bayes; errors range roughly from 0.032 to 0.046. On this data set, naive Bayes does better than tree-augmented naive Bayes. The plot did not survive extraction.]
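The solution pages explaining how the TAN structure in Figure 1 was learned did not survive extraction. For reference, the standard TAN construction (Friedman, Geiger, and Goldszmidt) builds a maximum-weight spanning tree over pairwise class-conditional mutual information and directs it away from a root attribute. The sketch below illustrates that standard recipe under my own assumptions about data layout (discrete features as numpy integer arrays); it is not recovered from the original solution.

import itertools
import numpy as np

def cond_mutual_info(xi, xj, c):
    """Empirical I(Xi; Xj | C) from three aligned 1-D integer arrays."""
    mi = 0.0
    for ci in np.unique(c):
        m = c == ci
        p_c = m.mean()
        xi_c, xj_c = xi[m], xj[m]
        for a in np.unique(xi_c):
            for b in np.unique(xj_c):
                p_ab = np.mean((xi_c == a) & (xj_c == b))
                if p_ab > 0:
                    p_a = np.mean(xi_c == a)
                    p_b = np.mean(xj_c == b)
                    mi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def tan_tree(X, c):
    """Maximum spanning tree (Prim's) over pairwise I(Xi; Xj | C).

    X is an (n_samples, d) integer array of attributes, c the class column.
    Returns directed edges (parent, child), rooted at attribute 0; in the
    full TAN model every attribute also has the class as a parent.
    """
    d = X.shape[1]
    w = {(i, j): cond_mutual_info(X[:, i], X[:, j], c)
         for i, j in itertools.combinations(range(d), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        i, j = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: w[(min(e), max(e))])
        edges.append((i, j))  # direct the chosen edge away from the root
        in_tree.add(j)
    return edges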

