From our observed data we want to obtain $p(\theta, \Sigma, Y_{\text{miss}} \mid Y_{\text{obs}})$, the posterior distribution of unknown and unobserved quantities. A Gibbs sampling scheme for approximating this posterior distribution can be constructed by simply adding one step to the Gibbs sampler presented in the previous section: Given starting values $\{\Sigma^{(0)}, Y_{\text{miss}}^{(0)}\}$, we generate $\{\theta^{(s+1)}, \Sigma^{(s+1)}, Y_{\text{miss}}^{(s+1)}\}$ from $\{\theta^{(s)}, \Sigma^{(s)}, Y_{\text{miss}}^{(s)}\}$ by
1. sampling $\theta^{(s+1)}$ from $p(\theta \mid Y_{\text{obs}}, Y_{\text{miss}}^{(s)}, \Sigma^{(s)})$;
2. sampling $\Sigma^{(s+1)}$ from $p(\Sigma \mid Y_{\text{obs}}, Y_{\text{miss}}^{(s)}, \theta^{(s+1)})$;
3. sampling $Y_{\text{miss}}^{(s+1)}$ from $p(Y_{\text{miss}} \mid Y_{\text{obs}}, \theta^{(s+1)}, \Sigma^{(s+1)})$.
Note that in steps 1 and 2, the fixed value of $Y_{\text{obs}}$ combines with the current value of $Y_{\text{miss}}^{(s)}$ to form a current version of a complete data matrix $Y^{(s)}$ having
no missing values. The $n$ rows of the matrix $Y^{(s)}$ can then be plugged into formulae 7.6 and 7.9 to obtain the full conditional distributions of $\theta$ and $\Sigma$.
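To make the scheme concrete, here is a minimal R sketch of the three-step sampler. It assumes the semiconjugate prior $\theta \sim$ multivariate normal$(\mu_0, \Lambda_0)$ and $\Sigma \sim$ inverse-Wishart$(\nu_0, S_0^{-1})$, under which the full conditionals referenced above as 7.6 and 7.9 take the standard forms coded below; the function and argument names, and the helper `impute.missing` (sketched after Equations 7.10 and 7.11 below), are ours and not from the text.

```r
## a minimal sketch of the three-step sampler, assuming the semiconjugate
## prior theta ~ MVN(mu0, L0) and Sigma ~ inverse-Wishart(nu0, solve(S0));
## Y is the n x p data matrix, with NAs marking the missing entries
gibbs.missing <- function(Y, mu0, L0, nu0, S0, S = 1000) {
  n <- nrow(Y); p <- ncol(Y)
  O <- !is.na(Y)                               # TRUE where Y is observed
  Yfull <- Y
  for (j in 1:p)                               # crude starting imputation:
    Yfull[!O[, j], j] <- mean(Y[, j], na.rm = TRUE)
  Sigma <- cov(Yfull)
  THETA <- matrix(nrow = S, ncol = p)
  for (s in 1:S) {
    ## step 1: draw theta from p(theta | Yobs, Ymiss^(s), Sigma^(s))
    ybar  <- colMeans(Yfull)
    Ln    <- solve(solve(L0) + n * solve(Sigma))
    mun   <- Ln %*% (solve(L0) %*% mu0 + n * solve(Sigma) %*% ybar)
    theta <- as.vector(mun + t(chol(Ln)) %*% rnorm(p))
    ## step 2: draw Sigma from p(Sigma | Yobs, Ymiss^(s), theta^(s+1))
    E <- t(Yfull) - theta                      # columns are y_i - theta
    Sigma <- solve(rWishart(1, nu0 + n, solve(S0 + E %*% t(E)))[, , 1])
    ## step 3: draw Ymiss from p(Ymiss | Yobs, theta^(s+1), Sigma^(s+1)),
    ## row by row; impute.missing is sketched after Equations 7.10-7.11
    Yfull <- impute.missing(Yfull, O, theta, Sigma)
    THETA[s, ] <- theta
  }
  THETA
}
```

For brevity the sketch stores only the $\theta^{(s)}$ draws; in practice one would also keep the $\Sigma^{(s)}$ values and, if the imputations themselves are of interest, the sampled missing entries.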
Step 3 is a bit more complicated:
\begin{align*}
p(Y_{\text{miss}} \mid Y_{\text{obs}}, \theta, \Sigma) &\propto p(Y_{\text{miss}}, Y_{\text{obs}} \mid \theta, \Sigma) \\
&= \prod_{i=1}^{n} p(y_{i,\text{miss}}, y_{i,\text{obs}} \mid \theta, \Sigma) \\
&\propto \prod_{i=1}^{n} p(y_{i,\text{miss}} \mid y_{i,\text{obs}}, \theta, \Sigma),
\end{align*}
so for each $i$ we need to sample the missing elements of the data vector conditional on the observed elements. This is made possible via the following result about multivariate normal distributions: Let $y \sim$ multivariate normal$(\theta, \Sigma)$, let $a$ be a subset of variable indices $\{1, \ldots, p\}$ and let $b$ be the complement of $a$. For example, if $p = 4$ then perhaps $a = \{1, 2\}$ and $b = \{3, 4\}$. If you know about inverses of partitioned matrices you can show that
\[
\{ y_{[b]} \mid y_{[a]}, \theta, \Sigma \} \sim \text{multivariate normal}(\theta_{b|a}, \Sigma_{b|a}),
\]
where
\begin{align}
\theta_{b|a} &= \theta_{[b]} + \Sigma_{[b,a]} \left( \Sigma_{[a,a]} \right)^{-1} \left( y_{[a]} - \theta_{[a]} \right) \tag{7.10} \\
\Sigma_{b|a} &= \Sigma_{[b,b]} - \Sigma_{[b,a]} \left( \Sigma_{[a,a]} \right)^{-1} \Sigma_{[a,b]}. \tag{7.11}
\end{align}
In the above formulae, $\theta_{[b]}$ refers to the elements of $\theta$ corresponding to the indices in $b$, and $\Sigma_{[a,b]}$ refers to the matrix made up of the elements that are in rows $a$ and columns $b$ of $\Sigma$.
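This result gives the step-3 imputation directly. The sketch below (again with our own naming, not the text's) loops over the rows of the current complete data matrix and redraws the missing entries of each row from the conditional distribution defined by Equations 7.10 and 7.11:

```r
## sketch of step 3: for each row, draw y[b] given y[a] using the
## conditional mean (7.10) and conditional variance (7.11)
impute.missing <- function(Yfull, O, theta, Sigma) {
  for (i in 1:nrow(Yfull)) {
    b <- which(!O[i, ])                        # missing indices for row i
    if (length(b) == 0) next                   # nothing to impute
    a <- which(O[i, ])                         # observed indices for row i
    beta <- Sigma[b, a, drop = FALSE] %*% solve(Sigma[a, a, drop = FALSE])
    theta.b.a <- theta[b] + beta %*% (Yfull[i, a] - theta[a])         # (7.10)
    Sigma.b.a <- Sigma[b, b, drop = FALSE] -
                 beta %*% Sigma[a, b, drop = FALSE]                   # (7.11)
    Yfull[i, b] <- theta.b.a + t(chol(Sigma.b.a)) %*% rnorm(length(b))
  }
  Yfull
}
```

Rows with no missing entries are skipped, and the `drop = FALSE` subsetting keeps the submatrices of $\Sigma$ as matrices even when $a$ or $b$ contains a single index.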
Let's try to gain a little bit of intuition about what is going on in Equations 7.10 and 7.11. Suppose $y$ is a sample from our population of four variables glu, bp, skin and bmi. If we have glu and bp data for someone ($a = \{1, 2\}$) but are missing skin and bmi measurements ($b = \{3, 4\}$), then we would be interested in the conditional distribution of these missing measurements $y_{[b]}$ given the observed information $y_{[a]}$. Equation 7.10 says that the conditional means of skin and bmi start off at their unconditional means $\theta_{[b]}$, but are then modified by $(y_{[a]} - \theta_{[a]})$. For example, if a person had higher than average values of glu and bp, then $(y_{[a]} - \theta_{[a]})$ would be a $2 \times 1$ vector of positive numbers. For our data the $2 \times 2$ matrix $\Sigma_{[b,a]}(\Sigma_{[a,a]})^{-1}$ has all positive entries, and so $\theta_{b|a} > \theta_{[b]}$. This makes sense: if all four variables are positively correlated, then when we observe higher than average values of glu and bp we should also expect higher than average values of skin and bmi. Also note that $\Sigma_{b|a}$ is equal to the unconditional variance $\Sigma_{[b,b]}$ but with something subtracted off, suggesting that the conditional variance is less than the unconditional variance. Again, this makes sense: having information about some variables should decrease, or at least not increase, our uncertainty about the others.
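These two effects are easy to verify numerically. The snippet below uses made-up values of $\theta$ and a positively correlated $\Sigma$ (hypothetical numbers chosen for illustration only, not estimates from the data):

```r
## hypothetical theta and a positively correlated Sigma for glu, bp, skin, bmi
theta <- c(120, 70, 30, 32)
sds   <- c(30, 12, 11, 7)
R     <- matrix(0.5, 4, 4); diag(R) <- 1       # all correlations 0.5
Sigma <- R * (sds %o% sds)
a <- c(1, 2); b <- c(3, 4)
ya <- c(150, 80)                               # above-average glu and bp
beta <- Sigma[b, a] %*% solve(Sigma[a, a])
theta.b.a <- theta[b] + beta %*% (ya - theta[a])
Sigma.b.a <- Sigma[b, b] - beta %*% Sigma[a, b]
theta.b.a > theta[b]                           # TRUE TRUE: means shift up
diag(Sigma.b.a) < diag(Sigma[b, b])            # TRUE TRUE: variances shrink
```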