

Thus we have

$$\{\Sigma \mid y_1,\ldots,y_n,\theta\} \sim \text{inverse-Wishart}(\nu_0 + n, [S_0 + S_\theta]^{-1}). \qquad (7.9)$$

Hopefully this result seems somewhat intuitive: We can think of $\nu_0 + n$ as the "posterior sample size," being the sum of the "prior sample size" $\nu_0$ and the data sample size $n$. Similarly, $S_0 + S_\theta$ can be thought of as the "prior" residual sum of squares plus the residual sum of squares from the data. Additionally, the conditional expectation of the population covariance matrix is

$$
\mathrm{E}[\Sigma \mid y_1,\ldots,y_n,\theta] = \frac{1}{\nu_0 + n - p - 1}\,(S_0 + S_\theta)
= \frac{\nu_0 - p - 1}{\nu_0 + n - p - 1}\,\frac{1}{\nu_0 - p - 1}S_0 + \frac{n}{\nu_0 + n - p - 1}\,\frac{1}{n}S_\theta,
$$

and so the conditional expectation can be seen as a weighted average of the prior expectation and the unbiased estimator. Because it can be shown that $\frac{1}{n}S_\theta$ converges to the true population covariance matrix, the posterior expectation of $\Sigma$ is a consistent estimator of the population covariance, even if the true population distribution is not multivariate normal.
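As a quick numerical check of this weighted-average identity, the Python sketch below computes the conditional expectation directly and as a weighted combination of the prior expectation $S_0/(\nu_0 - p - 1)$ and the unbiased estimate $S_\theta/n$. The values of $\nu_0$, $S_0$, $\theta$ and the simulated data are hypothetical illustrations, not numbers taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values for illustration: p = 2 variables, n = 30 observations,
# prior "sample size" nu0 and prior residual sum of squares S0.
p, n, nu0 = 2, 30, 4
S0 = np.array([[4.0, 1.0],
               [1.0, 3.0]])
theta = np.array([0.0, 0.0])
Y = rng.multivariate_normal(theta, S0, size=n)   # simulated n x p data matrix

# Residual sum of squares about theta: S_theta = sum_i (y_i - theta)(y_i - theta)^T.
R = Y - theta
S_theta = R.T @ R

# Conditional expectation of Sigma given the data and theta (Equation 7.9 implies
# an inverse-Wishart with nu0 + n degrees of freedom and scale S0 + S_theta).
E_direct = (S0 + S_theta) / (nu0 + n - p - 1)

# The same quantity as a weighted average of the prior expectation S0/(nu0 - p - 1)
# and the unbiased estimate S_theta/n.
w_prior = (nu0 - p - 1) / (nu0 + n - p - 1)
w_data = n / (nu0 + n - p - 1)
E_weighted = w_prior * (S0 / (nu0 - p - 1)) + w_data * (S_theta / n)

print(np.allclose(E_direct, E_weighted))   # True: the two forms agree
```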
7.4 Gibbs sampling of the mean and covariance

In the last two sections we showed that

$$
\begin{aligned}
\{\theta \mid y_1,\ldots,y_n,\Sigma\} &\sim \text{multivariate normal}(\mu_n, \Lambda_n) \\
\{\Sigma \mid y_1,\ldots,y_n,\theta\} &\sim \text{inverse-Wishart}(\nu_n, S_n^{-1}),
\end{aligned}
$$

where $\{\Lambda_n, \mu_n\}$ are defined in Equations 7.4 and 7.5, $\nu_n = \nu_0 + n$ and $S_n = S_0 + S_\theta$. These full conditional distributions can be used to construct a Gibbs sampler, providing us with an MCMC approximation to the joint posterior distribution $p(\theta, \Sigma \mid y_1,\ldots,y_n)$. Given a starting value $\Sigma^{(0)}$, the Gibbs sampler generates $\{\theta^{(s+1)}, \Sigma^{(s+1)}\}$ from $\{\theta^{(s)}, \Sigma^{(s)}\}$ via the following two steps (a code sketch of these steps appears at the end of the example below):

1. Sample $\theta^{(s+1)}$ from its full conditional distribution:
   a) compute $\mu_n$ and $\Lambda_n$ from $y_1,\ldots,y_n$ and $\Sigma^{(s)}$;
   b) sample $\theta^{(s+1)} \sim$ multivariate normal$(\mu_n, \Lambda_n)$.
2. Sample $\Sigma^{(s+1)}$ from its full conditional distribution:
   a) compute $S_n$ from $y_1,\ldots,y_n$ and $\theta^{(s+1)}$;
   b) sample $\Sigma^{(s+1)} \sim$ inverse-Wishart$(\nu_0 + n, S_n^{-1})$.

Steps 1.a and 2.a highlight the fact that $\{\mu_n, \Lambda_n\}$ depend on the value of $\Sigma$, and that $S_n$ depends on the value of $\theta$, so these quantities need to be recalculated at every iteration of the sampler.

Example: Reading comprehension

Let's return to the example from the beginning of the chapter, in which each of 22 children was given two reading comprehension exams, one before a certain type of instruction and one after. We'll model these 22 pairs of scores as i.i.d. samples from a multivariate normal distribution. The exam was designed to give average scores of around 50 out of 100, so $\mu_0 = (50, 50)^T$ would be a good choice for our prior expectation. Since the true mean cannot be below 0 or above 100, it is desirable to use a prior variance for $\theta$ that puts little probability outside of this range. We'll take the prior variances on $\theta_1$ and $\theta_2$ to be $\lambda_{0,1}^2 = \lambda_{0,2}^2 = (50/2)^2 = 625$, so that the prior probability $\Pr(\theta_j \notin [0, 100])$ is only 0.05. Finally, since the two exams are measuring similar things, whatever the true values of $\theta_1$ and $\theta_2$ are, it is probable that they are close. We can reflect this with a prior correlation of 0.5, so that $\lambda_{1,2} = 312.5$. As for the prior distribution on $\Sigma$, some of the same logic about the range of exam scores applies. We'll take $S_0$ to be the same as $\Lambda_0$, but only loosely center $\Sigma$ around this value by taking $\nu_0 = p + 2 = 4$.
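To make the two-step sampler concrete, here is a minimal sketch in Python (numpy and scipy). The reading-comprehension data themselves are not reproduced in this excerpt, so the array `Y` below is a simulated placeholder standing in for the 22 x 2 matrix of exam scores; the prior values match those chosen above, and the updates follow the full conditionals referenced in Equations 7.4, 7.5, and 7.9. Note that scipy's `invwishart` is parameterized by the scale matrix $S_n$ itself rather than by its inverse.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

# Placeholder for the 22 x 2 matrix of (pre-instruction, post-instruction) scores;
# replace with the actual reading-comprehension data.
Y = rng.multivariate_normal([47.0, 53.0], [[180.0, 130.0],
                                           [130.0, 190.0]], size=22)
n, p = Y.shape
ybar = Y.mean(axis=0)

# Prior parameters chosen above.
mu0 = np.array([50.0, 50.0])
L0 = np.array([[625.0, 312.5],
               [312.5, 625.0]])
nu0 = p + 2
S0 = L0.copy()

S = 5000                        # number of Gibbs iterations
THETA = np.zeros((S, p))        # stored theta samples
SIGMA = np.zeros((S, p, p))     # stored Sigma samples
Sigma = np.cov(Y.T)             # starting value Sigma^(0)

L0_inv = np.linalg.inv(L0)
for s in range(S):
    # 1. Sample theta from its full conditional:
    #    Lambda_n = (L0^{-1} + n Sigma^{-1})^{-1},
    #    mu_n     = Lambda_n (L0^{-1} mu0 + n Sigma^{-1} ybar).
    Sigma_inv = np.linalg.inv(Sigma)
    Ln = np.linalg.inv(L0_inv + n * Sigma_inv)
    mun = Ln @ (L0_inv @ mu0 + n * Sigma_inv @ ybar)
    theta = rng.multivariate_normal(mun, Ln)

    # 2. Sample Sigma from its full conditional:
    #    S_n = S0 + S_theta,  Sigma ~ inverse-Wishart(nu0 + n, S_n^{-1}).
    R = Y - theta
    Sn = S0 + R.T @ R
    Sigma = invwishart.rvs(df=nu0 + n, scale=Sn, random_state=rng)

    THETA[s] = theta
    SIGMA[s] = Sigma

# Posterior summaries from the MCMC samples.
print("posterior mean of theta:", THETA.mean(axis=0))
print("posterior mean of Sigma:\n", SIGMA.mean(axis=0))
```

The loop mirrors steps 1 and 2 exactly: $\mu_n$ and $\Lambda_n$ are recomputed from the current $\Sigma^{(s)}$, and $S_n$ is recomputed from the newly drawn $\theta^{(s+1)}$, at every iteration.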
