10-708 Graphical Models: Homework 2 Solutions

1 I-equivalence
1.1
We want to show that two graphs G1 and G2 are I-equivalent if (1) they have the same trails, and (2) a trail is active in G1 iff it is active in G2. Two graphs have the same set of trails iff they have the same skeleton, and by definition G1 and G2 are I-equivalent iff I(G1) = I(G2). So assume G1 and G2 have the same skeleton and the same set of active trails, and, for the sake of contradiction, that they are not I-equivalent, i.e., that they make different independence assertions. This could happen if G1 and G2 had different skeletons, but that violates our first assumption. Given identical skeletons, the only way the independence assertions can differ is if two variables u and v are d-separated in one graph (say G1) but not in the other (G2) under the same evidence set E. But then there is no active trail between u and v in G1 while there is at least one in G2, violating our second assumption. Therefore, by contradiction, G1 and G2 must be I-equivalent.

1.2
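Both 1.1 and this part reduce I-equivalence to reasoning about active trails. As a concrete illustration, here is a minimal brute-force d-separation checker (a sketch added for illustration, not part of the original solution; all function names are mine). It confirms that X → Y → Z and X ← Y ← Z (same skeleton, no v-structures) make identical independence assertions, while X → Y ← Z does not:

```python
def descendants(dag, x):
    """All descendants of x (excluding x) in a DAG given as {node: set_of_parents}."""
    children = {n: set() for n in dag}
    for n, parents in dag.items():
        for p in parents:
            children[p].add(n)
    seen, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def trail_active(dag, trail, evidence):
    """Is the undirected trail (a node sequence) active given the evidence set?"""
    if trail[0] in evidence or trail[-1] in evidence:
        return False
    for i in range(1, len(trail) - 1):
        a, m, b = trail[i - 1], trail[i], trail[i + 1]
        if a in dag[m] and b in dag[m]:       # collider a -> m <- b
            # a v-structure is active only if m or a descendant is observed
            if m not in evidence and not (descendants(dag, m) & evidence):
                return False
        elif m in evidence:                    # chain or fork blocked by evidence
            return False
    return True

def trails(dag, u, v):
    """All simple undirected trails from u to v."""
    nbrs = {n: set(dag[n]) for n in dag}
    for n, parents in dag.items():
        for p in parents:
            nbrs[p].add(n)
    out, stack = [], [[u]]
    while stack:
        path = stack.pop()
        for n in nbrs[path[-1]]:
            if n == v:
                out.append(path + [n])
            elif n not in path:
                stack.append(path + [n])
    return out

def d_separated(dag, u, v, evidence):
    return not any(trail_active(dag, t, evidence) for t in trails(dag, u, v))

G1 = {'X': set(), 'Y': {'X'}, 'Z': {'Y'}}        # X -> Y -> Z
G2 = {'X': {'Y'}, 'Y': {'Z'}, 'Z': set()}        # X <- Y <- Z
G3 = {'X': set(), 'Y': {'X', 'Z'}, 'Z': set()}   # X -> Y <- Z (v-structure)

for E in (set(), {'Y'}):
    assert d_separated(G1, 'X', 'Z', E) == d_separated(G2, 'X', 'Z', E)
    assert d_separated(G1, 'X', 'Z', E) != d_separated(G3, 'X', 'Z', E)
```

For larger graphs one would use the Bayes-ball algorithm instead of enumerating trails, but brute force suffices to check the claims on small examples.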
In 1.1 we showed that if G1 and G2 have the same skeleton and the same active trails, then they are I-equivalent. We now want to show that if G1 and G2 have the same skeleton and the same v-structures, then they are I-equivalent. Assume G1 and G2 have the same skeleton and v-structures, and let (u ⇝ v) be an active trail in G1 given some evidence set E. For (u ⇝ v) to be active, E cannot contain any node x_i that appears on the trail in one of the non-collider configurations x_{i−1} → x_i → x_{i+1}, x_{i−1} ← x_i ← x_{i+1}, or x_{i−1} ← x_i → x_{i+1}. Likewise, for every v-structure x_{i−1} → x_i ← x_{i+1} on the trail, either the center node x_i or one of its descendants must be in E. Since G2 has the same v-structures and skeleton, the trail (u ⇝ v) in G2 contains exactly the same v-structures as in G1, and these v-structures have the same sets of descendants. Thus, under the same evidence set, the trail is active in G1 iff it is active in G2. At this point, 1.1 completes the proof: G1 and G2 must be I-equivalent.

2 Decomposable Scores
2.1
Let G be a network structure and let score be a decomposable score.

2.1.1

Assume o is "Add X → Y", with X → Y ∉ G. Then

Δ_G(o) = score(o(G) : D) − score(G : D)
       = Σ_i FamScore(Z_i | Pa_{Z_i}^{o(G)} : D) − Σ_i FamScore(Z_i | Pa_{Z_i}^{G} : D)

All of the family scores cancel except the ones for node Y, because only Y's parent set changed; the other family scores are identical in G and o(G):

       = FamScore(Y | Pa_Y^{o(G)} : D) − FamScore(Y | Pa_Y^{G} : D)

Rewriting these family scores in terms of what changed:

       = FamScore(Y | Pa_Y^{G} ∪ {X} : D) − FamScore(Y | Pa_Y^{G} : D)

Thus we conclude that if o is "Add X → Y" and X → Y ∉ G, then

Δ_G(o) = FamScore(Y | Pa_Y^{G} ∪ {X} : D) − FamScore(Y | Pa_Y^{G} : D)

2.1.2

Assume o is "Delete X → Y", with X → Y ∈ G. By the reasoning of 2.1.1:

Δ_G(o) = FamScore(Y | Pa_Y^{o(G)} : D) − FamScore(Y | Pa_Y^{G} : D)

Rewriting these family scores in terms of what changed:

       = FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(Y | Pa_Y^{G} : D)

Thus we conclude that if o is "Delete X → Y" and X → Y ∈ G, then

Δ_G(o) = FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(Y | Pa_Y^{G} : D)

2.1.3

Assume o is "Reverse X → Y", with X → Y ∈ G. Then

Δ_G(o) = score(o(G) : D) − score(G : D)
       = Σ_i FamScore(Z_i | Pa_{Z_i}^{o(G)} : D) − Σ_i FamScore(Z_i | Pa_{Z_i}^{G} : D)

All of the family scores cancel except the ones for nodes X and Y, whose parent sets are the only ones that changed:

       = FamScore(Y | Pa_Y^{o(G)} : D) − FamScore(Y | Pa_Y^{G} : D) + FamScore(X | Pa_X^{o(G)} : D) − FamScore(X | Pa_X^{G} : D)

Rewriting these family scores in terms of what changed (the reversal removes X from Y's parents and adds Y to X's parents):

       = FamScore(X | Pa_X^{G} ∪ {Y} : D) + FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(X | Pa_X^{G} : D) − FamScore(Y | Pa_Y^{G} : D)

Thus we conclude that if o is "Reverse X → Y" and X → Y ∈ G, then

Δ_G(o) = FamScore(X | Pa_X^{G} ∪ {Y} : D) + FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(X | Pa_X^{G} : D) − FamScore(Y | Pa_Y^{G} : D)

2.2
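The local deltas derived in 2.1, and the equalities between graphs shown in this part, both rest only on decomposability and can be sanity-checked numerically. Below is a minimal sketch; the maximum-likelihood log-likelihood family score and the toy data are illustrative assumptions, not from the assignment:

```python
import math
from collections import Counter

def fam_score(data, x, parents):
    """ML log-likelihood family score of variable x with the given parent set,
    estimated from data (a list of dict rows)."""
    parents = tuple(sorted(parents))
    joint = Counter((tuple(r[p] for p in parents), r[x]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(c * math.log(c / marg[u]) for (u, _), c in joint.items())

def score(data, dag):
    """A decomposable score is by definition a sum of family scores."""
    return sum(fam_score(data, x, pa) for x, pa in dag.items())

# toy binary data over X and Y (hypothetical)
data = [{'X': x, 'Y': (x + y) % 2} for x in (0, 1) for y in (0, 0, 1)]

G  = {'X': set(), 'Y': set()}
oG = {'X': set(), 'Y': {'X'}}          # o = "Add X -> Y"

full_delta  = score(data, oG) - score(data, G)
local_delta = fam_score(data, 'Y', {'X'}) - fam_score(data, 'Y', set())
assert abs(full_delta - local_delta) < 1e-12
```

Rescoring the whole network and computing the local family-score difference give the same delta, which is exactly why greedy structure search only needs to rescore the families an operator touches.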
Let G and G′ be two network structures, and let score be a decomposable score.

2.2.1

Assume o is "Add X → Y" and Pa_Y^{G} = Pa_Y^{G′}. Using Proposition 15.4.5:

Δ_G(o) = FamScore(Y | Pa_Y^{G} ∪ {X} : D) − FamScore(Y | Pa_Y^{G} : D)
       = FamScore(Y | Pa_Y^{G′} ∪ {X} : D) − FamScore(Y | Pa_Y^{G′} : D)
       = Δ_{G′}(o)

Assume o is "Delete X → Y" and Pa_Y^{G} = Pa_Y^{G′}. Using Proposition 15.4.5:

Δ_G(o) = FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(Y | Pa_Y^{G} : D)
       = FamScore(Y | Pa_Y^{G′} \ {X} : D) − FamScore(Y | Pa_Y^{G′} : D)
       = Δ_{G′}(o)

We conclude that if o is either "Add X → Y" or "Delete X → Y" and Pa_Y^{G} = Pa_Y^{G′}, then Δ_G(o) = Δ_{G′}(o).

2.2.2

Assume o is "Reverse X → Y", Pa_Y^{G} = Pa_Y^{G′}, and Pa_X^{G} = Pa_X^{G′}. Then

Δ_G(o) = FamScore(X | Pa_X^{G} ∪ {Y} : D) + FamScore(Y | Pa_Y^{G} \ {X} : D) − FamScore(X | Pa_X^{G} : D) − FamScore(Y | Pa_Y^{G} : D)
       = FamScore(X | Pa_X^{G′} ∪ {Y} : D) + FamScore(Y | Pa_Y^{G′} \ {X} : D) − FamScore(X | Pa_X^{G′} : D) − FamScore(Y | Pa_Y^{G′} : D)
       = Δ_{G′}(o)

We conclude that if o is "Reverse X → Y", Pa_Y^{G} = Pa_Y^{G′}, and Pa_X^{G} = Pa_X^{G′}, then Δ_G(o) = Δ_{G′}(o).

3 Learning Edge Directions
1. I'll use a dotted line ("··") to denote a deleted edge. The possible BNs on the skeleton X1 − X2 − X3, with their family scores, are:

(a) X1 → X2 → X3 : FS(X1) + FS(X2 | X1) + FS(X3 | X2)
(b) X1 → X2 ← X3 : FS(X1) + FS(X2 | X1, X3) + FS(X3)
(c) X1 → X2 ·· X3 : FS(X1) + FS(X2 | X1) + FS(X3)
(d) X1 ← X2 → X3 : FS(X1 | X2) + FS(X2) + FS(X3 | X2)
(e) X1 ← X2 ← X3 : FS(X1 | X2) + FS(X2 | X3) + FS(X3)
(f) X1 ← X2 ·· X3 : FS(X1 | X2) + FS(X2) + FS(X3)
(g) X1 ·· X2 → X3 : FS(X1) + FS(X2) + FS(X3 | X2)
(h) X1 ·· X2 ← X3 : FS(X1) + FS(X2 | X3) + FS(X3)
(i) X1 ·· X2 ·· X3 : FS(X1) + FS(X2) + FS(X3)

2. Now, if we have the skeleton X1 − X2 − X3 − X4, the decision about the edge X1 − X2 does not affect the family score of X3, because the family score of X3 depends only on the decisions for the edges between X3 and its neighbors X2 and X4.

3. There is a linear-time dynamic-programming algorithm for finding the optimal BN on a chain skeleton X1 − X2 − ⋯ − Xn. Deriving it involves recasting the search for the optimal BN recursively. Let k be a number with 1 ≤ k < n. Given the direction of the edge between X_k and X_{k+1}, we want the highest-scoring structure over X1 − X2 − ⋯ − X_k and its score. The algorithm builds a 3 × n table T holding these scores: if X_k → X_{k+1}, we denote the score of the optimal structure up to k by the entry T_k^→, and analogously T_k^← and T_k^·· for a reversed or deleted edge. As the following pseudocode shows, column T_k can be built using only column T_{k−1} and the local family scores enumerated in part 1. (I also use a table of backpointers B for a backtracking step; each B_k^d holds one of the three values →, ←, or ··.)
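This recurrence is straightforward to implement and to check by brute force. The sketch below uses a maximum-likelihood log-likelihood family score as an illustrative stand-in (any decomposable FS would do) and verifies the dynamic program against enumeration of all 3^(n−1) edge-direction assignments; all names and the random data are mine:

```python
import math
import itertools
import random
from collections import Counter

def fam_score(data, x, parents):
    """ML log-likelihood family score of variable x given a parent set."""
    parents = tuple(sorted(parents))
    joint = Counter((tuple(r[p] for p in parents), r[x]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(c * math.log(c / marg[u]) for (u, _), c in joint.items())

def local(data, k, left, right):
    """Family score of X_k on the chain, given the directions of its left edge
    (X_{k-1}, X_k) and right edge (X_k, X_{k+1}): '>' points right, '<' points
    left, '.' means the edge is deleted, None means no such edge exists."""
    pa = set()
    if left == '>':          # X_{k-1} -> X_k
        pa.add(k - 1)
    if right == '<':         # X_k <- X_{k+1}
        pa.add(k + 1)
    return fam_score(data, k, pa)

def best_chain(data, n):
    """O(n) DP: T[d] is the best score of X_0..X_k given edge (X_k, X_{k+1}) = d."""
    dirs = ('>', '<', '.')
    T = {d: local(data, 0, None, d) for d in dirs}
    for k in range(1, n - 1):
        T = {d: max(T[p] + local(data, k, p, d) for p in dirs) for d in dirs}
    return max(T[p] + local(data, n - 1, p, None) for p in dirs)

def brute_chain(data, n):
    """Score every one of the 3^(n-1) direction assignments directly."""
    def total(assign):
        return sum(local(data, k,
                         assign[k - 1] if k > 0 else None,
                         assign[k] if k < n - 1 else None)
                   for k in range(n))
    return max(total(a) for a in itertools.product('><.', repeat=n - 1))

random.seed(0)
data = [{i: random.randint(0, 1) for i in range(4)} for _ in range(60)]
assert abs(best_chain(data, 4) - brute_chain(data, 4)) < 1e-9
```

For brevity this returns only the optimal score; keeping backpointers, as in the pseudocode, would recover the edge directions themselves.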
{Initialization}
T_1^→ ← FS(X1)          (X1 → X2, so X1 has no parent)
T_1^← ← FS(X1 | X2)     (X1 ← X2)
T_1^·· ← FS(X1)          (edge deleted)

{Dynamic programming}
for k = 2 to n do
    T_k^→ ← max{ FS(Xk | Xk−1) + T_{k−1}^→,  FS(Xk) + T_{k−1}^←,  FS(Xk) + T_{k−1}^·· }
    B_k^→ ← the maximizing direction (→, ←, or ··) in the line above
    T_k^← ← max{ FS(Xk | Xk−1, Xk+1) + T_{k−1}^→,  FS(Xk | Xk+1) + T_{k−1}^←,  FS(Xk | Xk+1) + T_{k−1}^·· }
    B_k^← ← the maximizing direction in the line above
    T_k^·· ← max{ FS(Xk | Xk−1) + T_{k−1}^→,  FS(Xk) + T_{k−1}^←,  FS(Xk) + T_{k−1}^·· }
    B_k^·· ← the maximizing direction in the line above
end for
(For k = n there is no edge to X_{n+1}, so only the T_n^·· case applies.)

{Backtracking}
Q_n ← argmax_d T_n^d
for k = n − 1 down to 1 do
    Q_k ← B_{k+1}^{Q_{k+1}}
end for
return Q

[...] factor (every variable has a corresponding CPT P(X | Pa(X)), possibly with Pa(X) = ∅, and by the construction in Problem 7.1, inclusion of this CPT means X ∈ A), which is not possible from the procedural definition of variable elimination. Therefore, every intermediate factor of variable elimination corresponds to a conditional probability in some network.

7 Tree-Augmented Naive Bayes
The structure of the TAN network learned from all the data is shown in Figure 1.

Figure 1: The tree-augmented naive Bayes network for breast.csv (a tree over attribute nodes 1 through 10). The attributes are described in the data tarball.

Figure 2: Comparison of classification error vs. number of training samples (100 to 500), averaged over 50 iterations, for TAN and naive Bayes. On this data set, naive Bayes does better than tree-augmented naive Bayes.
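For reference, the tree in a TAN model is typically learned Chow-Liu style: weight every attribute pair by its conditional mutual information given the class, take a maximum-weight spanning tree, and direct the edges away from an arbitrary root. A minimal sketch (the function names and toy data are mine, not the assignment's code):

```python
import math
from collections import Counter
from itertools import combinations

def cond_mi(rows, i, j, c):
    """Empirical conditional mutual information I(Xi; Xj | C) from count tables."""
    n = len(rows)
    nijc = Counter((r[i], r[j], r[c]) for r in rows)
    nic = Counter((r[i], r[c]) for r in rows)
    njc = Counter((r[j], r[c]) for r in rows)
    nc = Counter(r[c] for r in rows)
    return sum(cnt / n * math.log(cnt * nc[cv] / (nic[iv, cv] * njc[jv, cv]))
               for (iv, jv, cv), cnt in nijc.items())

def tan_tree(rows, attrs, cls):
    """Prim's algorithm for a maximum-weight spanning tree over the attributes,
    weighted by conditional mutual information given the class; edges are
    returned directed away from the root attrs[0], as TAN requires."""
    weight = {frozenset(e): cond_mi(rows, e[0], e[1], cls)
              for e in combinations(attrs, 2)}
    in_tree, edges = [attrs[0]], []
    while len(in_tree) < len(attrs):
        u, v = max(((u, v) for u in in_tree for v in attrs if v not in in_tree),
                   key=lambda e: weight[frozenset(e)])
        edges.append((u, v))
        in_tree.append(v)
    return edges

# toy data: X2 copies X1, X3 is independent noise (hypothetical)
rows = [{'C': c, 'X1': a, 'X2': a, 'X3': b}
        for c in (0, 1) for a in (0, 1) for b in (0, 1)]
edges = tan_tree(rows, ['X1', 'X2', 'X3'], 'C')
assert edges[0] == ('X1', 'X2')   # the strongly coupled pair is linked first
```

In the full TAN classifier every attribute additionally has the class node as a parent; the tree edges above supply at most one extra attribute parent per node.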
This note was uploaded on 05/25/2008 for the course 10-708 (Machine Learning) taught by Professor Carlos Guestrin during the Fall '07 term at Carnegie Mellon.
