CS228 Problem Set #2

CS 228, Winter 2009
Problem Set #2 Solutions

For each problem, a number of error codes describing common mistakes made by students are listed below. If you feel that your homework has been wrongly graded, please come see us. All error codes in this handout are tentative and will be revised as we grade the homeworks.

1. [5 points] MAP Dirichlet

Suppose that a prior on a parameter vector is $p(\theta) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$. What is the MAP value of the parameters, that is, $\arg\max_\theta p(\theta \mid D)$? Assume $M_1, M_2, \ldots, M_k$ are the sufficient statistics from the data set $D$.

Answer: The posterior is also a Dirichlet, with parameters $\alpha_j + M_j$, so the MAP value is its mode:

$$\hat{\theta}_j = \frac{M_j + \alpha_j - 1}{M + \alpha - k}, \quad j = 1, \ldots, k, \qquad \text{where } \alpha = \sum_{i=1}^{k} \alpha_i \text{ and } M = \sum_{i=1}^{k} M_i.$$

Error codes:
(1.1) [3 points] Misunderstood the question as the MLE problem.
(1.2) [1 point] Did not recognize that the posterior is also a Dirichlet.
(1.3) [1 point] Minor error.

2. [12 points] Search in Structure Learning

[Figure 1: Partial search tree example for orderings over variables $X_1, X_2, X_3, X_4$. Successors to $\prec = (1, 2, 3, 4)$ and $\prec' = (2, 1, 3, 4)$ shown. The root $(1, 2, 3, 4)$ has successors $(2, 1, 3, 4)$, $(1, 3, 2, 4)$, and $(1, 2, 4, 3)$; the node $(2, 1, 3, 4)$ has successors $(2, 3, 1, 4)$, $(1, 2, 3, 4)$, and $(2, 1, 4, 3)$.]

Consider learning the structure of a Bayesian network for some given ordering, $\prec$, of the variables $X_1, \ldots, X_n$. This can be done efficiently as described in Section 14.3.2 of the course reader. Now assume that we want to perform search over the space of orderings, i.e., we are searching for the network (with bounded indegree $k$) that has the highest score. We do this by defining the score of an ordering as the score of the (bounded indegree) network with the maximum score consistent with that ordering, and then searching for the ordering with the highest score. We bound the indegree so that we have a smaller and smoother search space.

We will define our search operator, $o$, to be "Swap $X_i$ and $X_{i+1}$" for some $i = 1, \ldots, n - 1$.
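The swap operator defined above can be illustrated with a minimal sketch. It assumes an ordering is represented as a tuple of variable indices and that the operator swaps the variables at adjacent positions of the ordering, which matches the successors listed in Figure 1. The function name `successors` is hypothetical and not part of the handout.

```python
from typing import List, Tuple

Ordering = Tuple[int, ...]


def successors(order: Ordering) -> List[Ordering]:
    """Return the n - 1 orderings reachable by one adjacent swap."""
    result = []
    for i in range(len(order) - 1):
        swapped = list(order)
        # Swap the variables at positions i and i + 1 (0-based).
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        result.append(tuple(swapped))
    return result


print(successors((1, 2, 3, 4)))
# [(2, 1, 3, 4), (1, 3, 2, 4), (1, 2, 4, 3)]
```

Each ordering over $n$ variables has exactly $n - 1$ successors under this operator, so the branching factor of the search is linear in the number of variables.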
Starting from some given ordering, $\prec$, we evaluate the BIC score of all successor orderings, $\prec'$, where a successor ordering is found by applying $o$ to $\prec$ (see Figure 1). We now choose a particular successor, $\prec'$. Provide an algorithm for computing, as efficiently as possible, the BIC score for the successors of the new ordering, $\prec'$, given that we have already computed the scores for the successors of $\prec$.

Answer:

• Notation: Let $h$ be the variable that was swapped for $\prec'$, let $i$ be the variable to be swapped for candidate $\prec''$, and let $\mathrm{Succ}_j(\prec)$ be the $j$th candidate successor of $\prec$.
• The BIC score is decomposable so we only need to consider the family scores.
• We cache all family scores of $\prec$ as well as all family scores of all candidates for $\prec'$ ....
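The answer above is truncated in this preview, so the following is not the handout's algorithm. It is only a rough sketch of the decomposability and caching idea stated in the bullets. It relies on one observation: an adjacent swap changes the set of predecessors for the two swapped variables only, so only their two best-family scores can change. All names here (`make_best_family`, `order_score`, `swap_delta`, `toy_family_score`) are hypothetical, and the toy score is an integer-valued stand-in for a real BIC family score.

```python
from itertools import combinations


def make_best_family(fam_score, k):
    """Build a cached lookup of the best family score with at most k parents."""
    cache = {}

    def best_family(var, preds):
        key = (var, preds)
        if key not in cache:
            # Maximize over all parent sets of size <= k drawn from the predecessors.
            cache[key] = max(
                fam_score(var, frozenset(parents))
                for size in range(min(k, len(preds)) + 1)
                for parents in combinations(sorted(preds), size)
            )
        return cache[key]

    return best_family, cache


def order_score(order, best_family):
    """Score of an ordering: sum of best family scores given each variable's predecessors."""
    return sum(best_family(v, frozenset(order[:pos])) for pos, v in enumerate(order))


def swap_delta(order, i, best_family):
    """Change in score when the variables at positions i and i + 1 (0-based) are swapped."""
    a, b = order[i], order[i + 1]
    before = frozenset(order[:i])
    old = best_family(a, before) + best_family(b, before | {a})
    new = best_family(b, before) + best_family(a, before | {b})
    return new - old


def toy_family_score(var, parents):
    # Hypothetical stand-in for a BIC family score (assumption, integer-valued).
    return -(var - sum(parents)) ** 2 - len(parents)


best_family, cache = make_best_family(toy_family_score, k=2)
order = (1, 2, 3, 4)
base = order_score(order, best_family)
for i in range(len(order) - 1):
    # Each successor's score needs only two family lookups beyond the base score.
    print(i, base + swap_delta(order, i, best_family))
```

Because the cache is keyed on a variable and its predecessor set, family scores computed while evaluating the successors of $\prec$ are reused automatically when the successors of $\prec'$ are evaluated.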
This document was uploaded on 03/03/2011.
 Winter '09
