CS 228, Winter 2009
Problem Set #2 Solutions

For each problem, a number of error codes describing common mistakes made by students are listed below. If you feel that your homework has been graded incorrectly, please come see us. All error codes in this handout are tentative; they will be revised as we grade the homeworks.

1. [5 points] MAP Dirichlet

Suppose that the prior on a parameter vector is $p(\theta) \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$. What is the MAP value of the parameters, that is, $\arg\max_\theta p(\theta \mid D)$? Assume $M_1, M_2, \ldots, M_k$ are the sufficient statistics from the data set $D$.

Answer: Because the Dirichlet prior is conjugate to the multinomial likelihood, the posterior is $\mathrm{Dirichlet}(\alpha_1 + M_1, \ldots, \alpha_k + M_k)$, and the MAP estimate is its mode (assuming every $\alpha_j + M_j > 1$):

$$\hat{\theta}_j = \frac{M_j + \alpha_j - 1}{M + \alpha - k}, \qquad j = 1, \ldots, k,$$

where $\alpha = \sum_{i=1}^{k} \alpha_i$ and $M = \sum_{i=1}^{k} M_i$.

Error codes:
(1.1) [3 points] Misunderstood the question as the MLE problem.
(1.2) [1 point] Did not recognize that the posterior is also a Dirichlet.
(1.3) [1 point] Minor error.

2. [12 points] Search in Structure Learning

[Figure 1: Partial search tree example for orderings over variables $X_1, X_2, X_3, X_4$. Successors to $\prec = (1, 2, 3, 4)$ and $\prec' = (2, 1, 3, 4)$ shown: (1, 2, 3, 4) → (2, 1, 3, 4), (1, 3, 2, 4), (1, 2, 4, 3); (2, 1, 3, 4) → (1, 2, 3, 4), (2, 3, 1, 4), (2, 1, 4, 3).]

Consider learning the structure of a Bayesian network for some given ordering $\prec$ of the variables $X_1, \ldots, X_n$. This can be done efficiently as described in Section 14.3.2 of the course reader. Now assume that we want to perform search over the space of orderings, i.e., we are searching for the network (with in-degree bounded by $k$) that has the highest score. We do this by defining the score of an ordering as the score of the highest-scoring bounded in-degree network consistent with that ordering, and then searching for the ordering with the highest score. We bound the in-degree so that we have a smaller and smoother search space.

We define our search operator, $o$, to be "Swap $X_i$ and $X_{i+1}$" for some $i = 1, \ldots, n-1$.
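A minimal Python sketch of the operator $o$, assuming orderings are stored as tuples of variable indices and reading "swap $X_i$ and $X_{i+1}$" as swapping the entries at positions $i$ and $i+1$ of the ordering, which is what Figure 1 shows. The function name `successors` is hypothetical. It reproduces both levels of the partial search tree: every ordering has $n-1$ successors, and undoing a swap returns to the parent, so $\prec$ itself is one of the successors of $\prec'$.

```python
from typing import List, Tuple

Ordering = Tuple[int, ...]  # assumption: an ordering is a tuple of variable indices


def successors(order: Ordering) -> List[Ordering]:
    """Apply o = 'swap the entries at positions i and i+1' for every i (hypothetical helper)."""
    result = []
    for i in range(len(order) - 1):  # 0-based i here is position i+1 in the text
        swapped = list(order)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        result.append(tuple(swapped))
    return result


print(successors((1, 2, 3, 4)))  # → [(2, 1, 3, 4), (1, 3, 2, 4), (1, 2, 4, 3)]
print(successors((2, 1, 3, 4)))  # → [(1, 2, 3, 4), (2, 3, 1, 4), (2, 1, 4, 3)]
```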
Starting from some given ordering $\prec$, we evaluate the BIC score of all successor orderings $\prec'$, where a successor ordering is obtained by applying $o$ to $\prec$ (see Figure 1). We now choose a particular successor, $\prec'$. Provide an algorithm for computing, as efficiently as possible, the BIC scores of the successors of the new ordering $\prec'$, given that we have already computed the scores of the successors of $\prec$.

Answer:
• Notation: Let $h$ be the variable that was swapped to obtain $\prec'$, let $i$ be the variable to be swapped for a candidate $\prec''$, and let $\mathrm{Succ}_j(\prec)$ be the $j$th candidate successor of $\prec$.
• The BIC score is decomposable, so we only need to consider the family scores.
• We cache all family scores of $\prec$ as well as all family scores of all candidates for $\prec'$ ....
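The decomposability argument can be illustrated with a small sketch. Under a fixed ordering, each variable independently picks its best parent set (of size at most $k$) among its predecessors, so the ordering's score is a sum of best-family scores that depend only on (variable, predecessor set). An adjacent swap changes the predecessor sets of only the two swapped variables, so with a cache keyed that way, scoring any successor needs at most two new best-family searches. This is a sketch under stated assumptions, not the reference solution: the names `family_bic` and `OrderingScorer` are hypothetical, data is assumed to be a list of tuples of discrete values, cardinalities are inferred from the observed values, and parent sets are searched exhaustively.

```python
import itertools
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Data = List[Tuple[int, ...]]  # assumption: one row per sample, one discrete value per variable


def family_bic(data: Data, x: int, parents: Tuple[int, ...]) -> float:
    """BIC family score of variable x given `parents` (hypothetical helper)."""
    m = len(data)
    joint = Counter((row[x], tuple(row[p] for p in parents)) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood: sum over (x, u) of M[x, u] * log(M[x, u] / M[u]).
    loglik = sum(c * math.log(c / marg[u]) for (_, u), c in joint.items())
    # Assumption: cardinalities are inferred from the values observed in the data.
    card_x = len({row[x] for row in data})
    card_pa = math.prod(len({row[p] for row in data}) for p in parents)
    n_params = (card_x - 1) * card_pa
    return loglik - 0.5 * math.log(m) * n_params


class OrderingScorer:
    """Scores orderings under in-degree bound k, caching best-family scores.

    Cache key: (variable, frozenset of its predecessors in the ordering).
    """

    def __init__(self, data: Data, k: int):
        self.data = data
        self.k = k
        self.cache: Dict[Tuple[int, FrozenSet[int]], float] = {}
        self.evaluations = 0  # cache misses, i.e. best-family searches performed

    def best_family(self, x: int, preds: FrozenSet[int]) -> float:
        key = (x, preds)
        if key not in self.cache:
            self.evaluations += 1
            # Exhaustive search over parent sets of size <= k drawn from preds.
            self.cache[key] = max(
                family_bic(self.data, x, pa)
                for r in range(min(self.k, len(preds)) + 1)
                for pa in itertools.combinations(sorted(preds), r)
            )
        return self.cache[key]

    def score(self, order: Sequence[int]) -> float:
        # Decomposability: the ordering's score is a sum of per-variable best-family scores.
        return sum(
            self.best_family(x, frozenset(order[:pos]))
            for pos, x in enumerate(order)
        )


data = [(0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 1), (0, 1, 0)]
scorer = OrderingScorer(data, k=1)
scorer.score((0, 1, 2))  # 3 misses: nothing cached yet
before = scorer.evaluations
scorer.score((1, 0, 2))  # swap first two positions; variable 2 keeps predecessors {0, 1}
print(scorer.evaluations - before)  # → 2
```

Caching on (variable, predecessor set) rather than on whole orderings is what lets family scores computed while scoring the successors of $\prec$ be reused when scoring the successors of $\prec'$.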
This document was uploaded on 03/03/2011.
- Winter '09