Chapter 12 Multivariate normal distributions
The multivariate normal is the most useful, and most studied, of the standard joint distributions in probability. A huge body of statistical theory depends on the properties of families of random variables whose joint distribution is at least approximately multivariate normal. The bivariate case (two variables) is the easiest to understand, because it requires a minimum of notation; vector notation and matrix algebra become necessities when many random variables are involved.
The general bivariate normal is often used to model pairs of dependent random variables, such as: the height and weight of an individual; or (as an approximation) the score a student gets on a final exam and the total score she gets on the problem sets; or the heights of father and son; and so on. Many fancy statistical procedures implicitly require bivariate (or multivariate, for more than two random variables) normality.
Bivariate normal
The most general bivariate normal can be built from a pair of independent random variables, X and Y, each distributed N(0, 1). For a constant ρ with −1 < ρ < 1, define random variables

    U = X
    V = ρX + √(1 − ρ²) Y

That is,

    (U, V) = (X, Y) A    where    A = [ 1        ρ      ]
                                      [ 0    √(1 − ρ²)  ]

Notice that EU = EV = 0,

    var(V) = ρ² var(X) + (1 − ρ²) var(Y) = 1 = var(U),

and

    cov(U, V) = ρ cov(X, X) + √(1 − ρ²) cov(X, Y) = ρ.

Consequently,

    correlation(U, V) = cov(U, V)/√(var(U) var(V)) = ρ.
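The construction above is easy to check by simulation. The following Python/NumPy sketch (an illustration, not part of the notes) builds (U, V) exactly as in the display and confirms that both coordinates have unit variance and correlation close to ρ:

```python
import numpy as np

# Build U = X and V = rho*X + sqrt(1 - rho^2)*Y from independent N(0,1)
# variables X and Y, as in the construction above.
rng = np.random.default_rng(0)
rho = 0.6
n = 200_000
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

U = X
V = rho * X + np.sqrt(1 - rho**2) * Y

# Both U and V should be (approximately) standard normal, with
# correlation(U, V) close to rho.
print(np.var(U), np.var(V), np.corrcoef(U, V)[0, 1])
```

With 200,000 samples the empirical variances and correlation typically agree with the theoretical values 1, 1, and ρ to two decimal places.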
From Chapter 10, the joint density for (U, V) is

    ψ(u, v) = f((u, v) A⁻¹) / |det A|,

where

    f(x, y) = (1/2π) exp(−(x² + y²)/2)    for all x, y.

The matrix A has determinant √(1 − ρ²) and inverse

    A⁻¹ = (1/√(1 − ρ²)) [ √(1 − ρ²)   −ρ ]
                        [     0        1 ]

Statistics 241: 16 November 1997 © David Pollard

If (x, y) = (u, v) A⁻¹ then

    x² + y² = (u, v) A⁻¹ (A⁻¹)ᵀ (u, v)ᵀ

            = (u, v) [  1   −ρ ] (u, v)ᵀ / (1 − ρ²)
                     [ −ρ    1 ]

            = (u² − 2ρuv + v²) / (1 − ρ²).

Thus U and V have joint density

<12.1>    ψ(u, v) = (1/(2π√(1 − ρ²))) exp( −(u² − 2ρuv + v²) / (2(1 − ρ²)) )    for all u, v.
The joint distribution is sometimes called the standard bivariate normal distribution
with correlation ρ .
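As a numerical sanity check (a sketch, not part of the notes), the density in <12.1> should integrate to 1 over the whole plane. A plain Riemann sum on a wide, fine grid is very accurate here because the Gaussian tails decay so quickly:

```python
import numpy as np

def psi(u, v, rho):
    # standard bivariate normal density with correlation rho, formula <12.1>
    quad = (u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2))
    return np.exp(-quad) / (2 * np.pi * np.sqrt(1 - rho**2))

rho = 0.3
t = np.linspace(-8.0, 8.0, 801)
h = t[1] - t[0]
uu, vv = np.meshgrid(t, t)

# Riemann-sum approximation to the double integral of psi over the plane;
# the mass outside [-8, 8]^2 is negligible.
total = psi(uu, vv, rho).sum() * h * h
print(total)
```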
The symmetry of ψ in u and v implies that V has the same marginal distribution as U; that is, V is also N(0, 1) distributed. The calculation of the marginal densities involves the same integration for both variables.
When ρ equals zero, the joint density factorizes into

    (1/√(2π)) exp(−u²/2) · (1/√(2π)) exp(−v²/2),

which implies independence of U and V. That is, for random variables with a bivariate normal distribution, zero correlation is equivalent to independence. The equivalence for bivariate normals probably accounts for the widespread confusion between the properties of independence and zero correlation. In general, independence implies zero correlation, but not conversely.
Definition. Random variables S and T are said to have a bivariate normal distribution, with parameters ES = µ_S, ET = µ_T, var(S) = σ_S², var(T) = σ_T², and correlation ρ, if the standardized random variables (S − µ_S)/σ_S and (T − µ_T)/σ_T have a standard bivariate normal distribution with correlation ρ.
Problem 11.1 shows how to calculate explicitly the joint density for S and T .
Conditional distributions
The construction of U and V from the independent X and Y makes the calculation of the conditional distribution of V given U = u a triviality: the conditional distribution of ρX + √(1 − ρ²) Y given X = x is the distribution of ρx + √(1 − ρ²) N(0, 1). That is,

<12.2>    V | U = u ∼ N(ρu, 1 − ρ²).

The symmetry of the joint distribution of U and V implies that

    U | V = v ∼ N(ρv, 1 − ρ²),
a fact that you could check by explicit calculation of the ratio of joint to marginal densities:

<12.3>    ψ(u, v) / ∫_{−∞}^{∞} ψ(u, v) du = (1/√(2π(1 − ρ²))) exp( −(u − ρv)² / (2(1 − ρ²)) ).

Example. Let X denote the height (in inches) of a randomly chosen father, and let Y denote the height (in inches) of his son at maturity. Suppose each of X and Y has a N(µ, σ²) distribution with µ = 69 and σ = 2. Suppose also that X and Y have a bivariate normal distribution with correlation ρ = .3.
If Sam has a height of 74 inches, what would one predict about the ultimate height of
his son Elmer? In standardized units,
    U = (X − µ)/σ = Sam's standardized height, which happens to equal 2.5
    V = (Y − µ)/σ = Elmer's standardized ultimate height.

By assumption, before the value of U was known, the pair (U, V) had a standard bivariate normal distribution with correlation ρ. From the analog of formula <12.2>,

    V | U = 2.5 ∼ N(2.5ρ, 1 − ρ²).

In the original units,

    Elmer's height | Sam's height = 74 inches ∼ N(µ + 2.5ρσ, (1 − ρ²)σ²) = N(70.5, 3.64).
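The arithmetic above, together with a simulation check of the conditional-distribution formula <12.2>, can be sketched in Python/NumPy. The numbers mirror the example; the simulation itself is an illustration, not part of the original text:

```python
import numpy as np

# The Elmer prediction: given Sam's height, Elmer's height is normal with
# mean mu + rho*u0*sigma and variance (1 - rho^2)*sigma^2.
mu, sigma, rho = 69.0, 2.0, 0.3
sam = 74.0

u0 = (sam - mu) / sigma                  # Sam's standardized height: 2.5
cond_mean = mu + rho * u0 * sigma        # 69 + 0.3 * 2.5 * 2 = 70.5
cond_var = (1 - rho**2) * sigma**2       # 0.91 * 4 = 3.64
print(u0, cond_mean, cond_var)

# Simulate standardized (U, V) pairs, keep those with U near u0, and compare
# the conditional mean and variance of V with N(rho*u0, 1 - rho^2).
rng = np.random.default_rng(1)
n = 1_000_000
X = rng.standard_normal(n)
Y = rng.standard_normal(n)
U = X
V = rho * X + np.sqrt(1 - rho**2) * Y

near = np.abs(U - u0) < 0.05
print(V[near].mean(), V[near].var())     # should be close to 0.75 and 0.91
```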
Notice that Elmer's expected height (given that Sam is 74 inches) is less than his father's height. This fact is an example of a general phenomenon called "regression towards the mean". The term regression, as a synonym for conditional expectation, has become commonplace in Statistics.

Multivariate densities
Random variables X₁, X₂, ..., Xₙ are said to have a jointly continuous distribution with joint density function f(x₁, x₂, ..., xₙ) if

    P{(X₁, X₂, ..., Xₙ) ∈ A} = ∫···∫ {(x₁, x₂, ..., xₙ) ∈ A} f(x₁, x₂, ..., xₙ) dx₁ dx₂ ... dxₙ

for each subset A of Rⁿ. The density f must be nonnegative and integrate to 1 over Rⁿ.

It is convenient to write X for the random vector (X₁, ..., Xₙ), and x for the generic point (x₁, ..., xₙ) in Rⁿ. Then the defining property for the joint density becomes

<12.4>    P{X ∈ A} = ∫ {x ∈ A} f(x) dx    for A ⊆ Rⁿ,

where ∫ ... dx should be understood as an n-fold integral.

Example. If the random variables X₁, ..., Xₙ are independent, the joint density function is equal to the product of the marginal densities for each Xᵢ, and conversely. The proof is similar to the proof for the bivariate case.

For example, if the {Xᵢ} are independent and each Xᵢ has a N(0, 1) distribution, the joint density is

    f(x₁, ..., xₙ) = (1/(2π)^(n/2)) exp(−Σ_{i≤n} xᵢ²/2)    for all x₁, ..., xₙ
                   = (1/(2π)^(n/2)) exp(−‖x‖²/2)           for all x.

The distribution is denoted by N(0, Iₙ). It is sometimes called the "spherical normal distribution", because of the spherical symmetry of the density.
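A small check (a sketch, not in the notes): the N(0, Iₙ) density is the product of n standard normal densities, and it depends on x only through the length ‖x‖, which is the spherical symmetry just mentioned.

```python
import numpy as np

def spherical_density(x):
    # N(0, I_n) density at the point x, using only ||x||^2 = x . x
    n = len(x)
    return np.exp(-x @ x / 2) / (2 * np.pi) ** (n / 2)

x = np.array([0.3, -1.1, 0.7])
product_form = np.prod(np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))

# Any vector of the same length gets the same density value.
y = np.array([np.linalg.norm(x), 0.0, 0.0])
print(spherical_density(x), product_form, spherical_density(y))
```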
The methods for finding joint densities for random variables defined as functions of other random variables with jointly continuous distributions (as explained over the last two Chapters) extend to multivariate distributions. There is a problem with the drawing of n-dimensional pictures, to keep track of the transformations, and one must remember to say "n-dimensional volume" instead of area, but otherwise calculations are not much more complicated than in two dimensions.
Rotation of coordinate axes
The spherical symmetry of the density f(·) is responsible for an important property of multivariate normals. Let Z = (Z₁, ..., Zₙ) have a N(0, Iₙ) distribution, let q₁, ..., qₙ be a new orthonormal basis for Rⁿ, and let

    Z = W₁q₁ + ... + Wₙqₙ

be the representation for Z in the new basis.
<12.5> Theorem. The W₁, ..., Wₙ are also independent N(0, 1) distributed random variables.

In two dimensions, the assertion follows from the transformation formulae of Chapter 10. If the axes are rotated through an angle θ, then

    W₁ = Z₁ cos(θ) + Z₂ sin(θ)
    W₂ = −Z₁ sin(θ) + Z₂ cos(θ)

That is,

    (W₁, W₂) = (Z₁, Z₂) A_θ    where    A_θ = [ cos(θ)   −sin(θ) ]
                                              [ sin(θ)    cos(θ) ]

The matrix A_θ has determinant 1 and inverse A_{−θ}. It is an orthogonal matrix; it preserves lengths. The joint density of (W₁, W₂) is

    (1/2π) exp(−‖(w₁, w₂) A_θ⁻¹‖²/2) = (1/2π) exp(−(w₁² + w₂²)/2).
[Figure: a small ball B in Z-coordinates corresponds to a ball B* of the same radius in W-coordinates.]

A more intuitive explanation is based on the approximation
    P{Z ∈ B} ≈ f(z)(volume of B)

for a small ball B centered at z. The transformation from Z to W corresponds to a rotation, so

    P{Z ∈ B} = P{W ∈ B*},

where B* is a ball of the same radius, but centered at the point w = (w₁, ..., wₙ) for which w₁q₁ + ... + wₙqₙ = z. The last equality implies ‖w‖ = ‖z‖, from which we get

    P{W ∈ B*} ≈ (2π)^(−n/2) exp(−½‖w‖²)(volume of B*).

That is, W has the asserted spherical normal density.
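Theorem <12.5> in two dimensions is easy to see by simulation. The sketch below (an illustration, not from the notes) rotates independent N(0, 1) coordinates through an angle θ and checks that the rotated coordinates again look like independent N(0, 1) variables:

```python
import numpy as np

# Rotation matrix A_theta, acting on row vectors (Z1, Z2) as in the text.
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rng = np.random.default_rng(2)
Z = rng.standard_normal((200_000, 2))   # rows are independent (Z1, Z2) pairs
W = Z @ A                               # (W1, W2) = (Z1, Z2) A_theta

# W1, W2 should again behave like independent N(0,1): unit variances and
# (approximately) zero correlation.
print(W.var(axis=0), np.corrcoef(W[:, 0], W[:, 1])[0, 1])
```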
<12.6> Definition. Let Z = (Z₁, Z₂, ..., Zₙ) have a spherical normal distribution N(0, Iₙ). The chi-square distribution, χₙ², is defined as the distribution of ‖Z‖² = Z₁² + ... + Zₙ².
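A quick simulation sketch of Definition <12.6> (not part of the notes): for Z ∼ N(0, Iₙ), the squared length ‖Z‖² has mean n and variance 2n, the first two moments of the χₙ² distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 400_000
Z = rng.standard_normal((reps, n))      # each row is a sample of Z ~ N(0, I_n)
chisq = (Z**2).sum(axis=1)              # squared length of each sample

# Empirical mean and variance should be near n and 2n.
print(chisq.mean(), chisq.var())
```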
To prove results about the spherical normal it is often merely a matter of transforming to an appropriate orthonormal basis.

<12.7> Exercise. Suppose Z₁, Z₂, ..., Zₙ are independent, each distributed N(0, 1). Define

    Z̄ = (Z₁ + ... + Zₙ)/n    and    T = Σ_{i≤n} (Zᵢ − Z̄)².

Show that Z̄ has a N(0, 1/n) distribution independently of T, which has a χ²_{n−1} distribution.
Solution: Choose the new orthonormal basis with q₁ = (1, 1, ..., 1)/√n. Choose q₂, ..., qₙ however you like, provided they are orthogonal unit vectors, all orthogonal to q₁. In the new coordinate system,

    Z = W₁q₁ + ... + Wₙqₙ.

We could calculate each Wᵢ by dotting the sum on the right-hand side with qᵢ: only Wᵢ would survive. In particular,

    W₁ = Z · q₁ = (Z₁ + ... + Zₙ)/√n = √n Z̄.

From Theorem <12.5> we know that W₁ has a N(0, 1) distribution. It follows that Z̄ has a N(0, 1/n) distribution.
The random variable T equals the squared length of the vector

    (Z₁ − Z̄, ..., Zₙ − Z̄) = Z − Z̄(√n q₁) = Z − W₁q₁ = W₂q₂ + ... + Wₙqₙ.

That is,

    T = ‖W₂q₂ + ... + Wₙqₙ‖² = W₂² + ... + Wₙ²,

a sum of squares of n − 1 independent N(0, 1) random variables, which has a χ²_{n−1} distribution.
Finally, notice that Z̄ is a function of W₁, whereas T is a function of the independent random variables W₂, ..., Wₙ. The independence of Z̄ and T follows.

Exercise for the reader: Suppose X₁, ..., Xₙ are independent, each distributed N(µ, σ²). Apply the results from the last Exercise, with Zᵢ = (Xᵢ − µ)/σ, to deduce that X̄ is distributed N(µ, σ²/n) independently of

    Σ_{i≤n} (Xᵢ − X̄)²/σ²,

which has a χ²_{n−1} distribution.
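The final exercise can be checked by simulation. The Python/NumPy sketch below (an illustration, not from the notes) draws samples of X₁, ..., Xₙ ∼ N(µ, σ²) and verifies that the sample mean has variance σ²/n, that Σ(Xᵢ − X̄)²/σ² has mean n − 1 (the χ²_{n−1} mean), and that the two quantities are uncorrelated, consistent with their independence:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 69.0, 2.0, 8, 200_000
X = rng.normal(mu, sigma, size=(reps, n))   # each row is one sample X_1..X_n

xbar = X.mean(axis=1)                       # sample means, ~ N(mu, sigma^2/n)
T = ((X - xbar[:, None])**2).sum(axis=1) / sigma**2   # ~ chi-square_{n-1}

# Check: var(xbar) = sigma^2/n, E[T] = n - 1, and xbar, T uncorrelated.
print(xbar.var(), T.mean(), np.corrcoef(xbar, T)[0, 1])
```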
This note was uploaded on 11/17/2011 for the course STOR 664 taught by Professor Staff during the Fall '11 term at UNC.