Multinormal

Chapter 12  Multivariate normal distributions
Statistics 241: 16 November 1997, © David Pollard

The multivariate normal is the most useful, and most studied, of the standard joint distributions in probability. A huge body of statistical theory depends on the properties of families of random variables whose joint distribution is at least approximately multivariate normal. The bivariate case (two variables) is the easiest to understand, because it requires a minimum of notation; vector notation and matrix algebra become necessities when many random variables are involved.

The general bivariate normal is often used to model pairs of dependent random variables, such as: the height and weight of an individual; or (as an approximation) the score a student gets on a final exam and the total score she gets on the problem sets; or the heights of father and son; and so on. Many fancy statistical procedures implicitly require bivariate (or multivariate, for more than two random variables) normality.

Bivariate normal

The most general bivariate normal can be built from a pair of independent random variables, X and Y, each distributed N(0, 1). For a constant ρ with −1 < ρ < 1, define random variables

    U = X    and    V = ρX + √(1 − ρ²) Y.

That is, (U, V) = (X, Y) A, where A is the 2 × 2 matrix

    A = [ 1, ρ ; 0, √(1 − ρ²) ]    (rows separated by semicolons).

Notice that EU = EV = 0,

    var(V) = ρ² var(X) + (1 − ρ²) var(Y) = 1 = var(U),

and

    cov(U, V) = ρ cov(X, X) + √(1 − ρ²) cov(X, Y) = ρ.

Consequently,

    correlation(U, V) = cov(U, V) / √(var(U) var(V)) = ρ.

From Chapter 10, the joint density for (U, V) is

    (1 / |det A|) f((u, v) A⁻¹),    where f(x, y) = (1/2π) exp(−(x² + y²)/2)    for all x, y.

The matrix A has determinant √(1 − ρ²) and inverse

    A⁻¹ = [ 1, −ρ/√(1 − ρ²) ; 0, 1/√(1 − ρ²) ].

If (x, y) = (u, v) A⁻¹ then

    x² + y² = (u, v) A⁻¹ (A⁻¹)' (u, v)'
            = (u, v) [ 1, −ρ ; −ρ, 1 ] (u, v)' / (1 − ρ²)
            = (u² − 2ρuv + v²) / (1 − ρ²).

Thus U and V have joint density

<12.1>    ψ(u, v) = (1 / (2π √(1 − ρ²))) exp( −(u² − 2ρuv + v²) / (2(1 − ρ²)) )    for all u, v.

The joint distribution is sometimes called the standard bivariate normal distribution with correlation ρ. The symmetry of ψ in u and v implies that V has the same marginal distribution as U, that is, V is also N(0, 1) distributed. The calculation of the marginal densities involves the same integration for both variables.

When ρ equals zero, the joint density factorizes into

    (1/√(2π)) exp(−u²/2) · (1/√(2π)) exp(−v²/2),

which implies independence of U and V. That is, for random variables with a bivariate normal distribution, zero correlation is equivalent to independence. The equivalence for bivariate normals probably accounts for the widespread confusion between the properties of independence and zero correlation. In general, independence implies zero correlation, but not conversely.

Definition. Random variables S and T are said to have a bivariate normal distribution, with parameters ES = µ_S, ET = µ_T, var(S) = σ_S², var(T) = σ_T², and correlation ρ, if the standardized random variables (S − µ_S)/σ_S and (T − µ_T)/σ_T have a standard bivariate normal distribution with correlation ρ. Problem 11.1 shows how to calculate explicitly the joint density for S and T.
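(The following numerical illustration is an addition to these notes, not part of the original. It is a minimal Python/NumPy sketch of the construction just described; the particular value ρ = 0.6, the random seed, and the sample size are arbitrary choices for illustration.)

    # Build the standard bivariate normal with correlation rho from a pair of
    # independent N(0,1) variables, exactly as in the construction above,
    # and check the variances and the correlation empirically.
    import numpy as np

    rng = np.random.default_rng(0)
    rho = 0.6                      # any value with -1 < rho < 1
    n = 1_000_000

    X = rng.standard_normal(n)     # X ~ N(0,1)
    Y = rng.standard_normal(n)     # Y ~ N(0,1), independent of X

    U = X
    V = rho * X + np.sqrt(1 - rho**2) * Y

    print(U.var(), V.var())        # both should be close to 1
    print(np.corrcoef(U, V)[0, 1]) # should be close to rho

The printed variances should be close to 1 and the printed correlation close to 0.6, matching the moment calculations above.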
Conditional distributions

The construction of U and V from the independent X and Y makes the calculation of the conditional distribution of V given U = u a triviality: ρX + √(1 − ρ²) Y | X = x has the distribution of ρx + √(1 − ρ²) N(0, 1). That is,

<12.2>    V | U = u ∼ N(ρu, 1 − ρ²).

The symmetry of the joint distribution of U and V implies that U | V = v ∼ N(ρv, 1 − ρ²), a fact that you could check by explicit calculation of the ratio of joint to marginal densities:

<12.3>    ψ(u, v) / ∫ ψ(u, v) du = (1/√(2π(1 − ρ²))) exp( −(u − ρv)² / (2(1 − ρ²)) ),

the integral running over the whole real line.

Example. Let X denote the height (in inches) of a randomly chosen father, and let Y denote the height (in inches) of his son at maturity. Suppose each of X and Y has a N(µ, σ²) distribution with µ = 69 and σ = 2. Suppose also that X and Y have a bivariate normal distribution with correlation ρ = 0.3. If Sam has a height of 74 inches, what would one predict about the ultimate height of his son Elmer?

In standardized units,

    U = (X − µ)/σ = Sam's standardized height, which happens to equal 2.5,
    V = (Y − µ)/σ = Elmer's standardized ultimate height.

By assumption, before the value of U was known, the pair (U, V) had a standard bivariate normal distribution with correlation ρ. From the analog of formula <12.2>,

    V | U = 2.5 ∼ N(2.5ρ, 1 − ρ²).

In the original units,

    Elmer's height | Sam's height = 74 inches ∼ N(µ + 2.5ρσ, (1 − ρ²)σ²) = N(70.5, 3.64).

Notice that Elmer's expected height (given that Sam is 74 inches) is less than his father's height. This fact is an example of a general phenomenon called "regression towards the mean". The term regression, as a synonym for conditional expectation, has become commonplace in Statistics.
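(Again an addition, not part of the notes: a short Python check of the Sam and Elmer calculation. The closed-form numbers come from the conditional distribution <12.2>; the simulation conditions only approximately, by keeping the sons of simulated fathers within 0.1 inch of 74, a window width chosen arbitrarily.)

    # Conditional prediction of the son's height given the father's height,
    # using V | U = u ~ N(rho*u, 1 - rho^2) translated back to inches.
    import numpy as np

    mu, sigma, rho = 69.0, 2.0, 0.3
    sam = 74.0

    u = (sam - mu) / sigma                    # standardized height, 2.5
    cond_mean = mu + rho * u * sigma          # 69 + 0.3*2.5*2 = 70.5
    cond_var = (1 - rho**2) * sigma**2        # 0.91*4 = 3.64
    print(cond_mean, cond_var)

    # Simulation check: generate (father, son) pairs with the given bivariate
    # normal distribution, then look only at sons whose fathers are close to 74.
    rng = np.random.default_rng(1)
    n = 2_000_000
    X = rng.standard_normal(n)
    Y = rng.standard_normal(n)
    father = mu + sigma * X
    son = mu + sigma * (rho * X + np.sqrt(1 - rho**2) * Y)
    near = np.abs(father - sam) < 0.1
    print(son[near].mean(), son[near].var())  # roughly 70.5 and 3.64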
Multivariate densities

Random variables X1, X2, ..., Xn are said to have a jointly continuous distribution with joint density function f(x1, x2, ..., xn) if

    P{(X1, X2, ..., Xn) ∈ A} = ∫ ... ∫ {(x1, x2, ..., xn) ∈ A} f(x1, x2, ..., xn) dx1 dx2 ... dxn

for each subset A of R^n. The density f must be nonnegative and integrate to 1 over R^n.

It is convenient to write X for the random vector (X1, ..., Xn), and x for the generic point (x1, ..., xn) in R^n. Then the defining property for the joint density becomes

<12.4>    P{X ∈ A} = ∫ {x ∈ A} f(x) dx    for A ⊆ R^n,

where ∫ ... dx should be understood as an n-fold integral.

Example. If the random variables X1, ..., Xn are independent, the joint density function is equal to the product of the marginal densities for each Xi, and conversely. The proof is similar to the proof for the bivariate case. For example, if the {Xi} are independent and each Xi has a N(0, 1) distribution, the joint density is

    f(x1, ..., xn) = (2π)^{−n/2} exp( −Σ_{i≤n} xi²/2 )    for all x1, ..., xn
                   = (2π)^{−n/2} exp( −|x|²/2 )    for all x.

The distribution is denoted by N(0, I_n). It is sometimes called the "spherical normal distribution", because of the spherical symmetry of the density.

The methods for finding joint densities for random variables defined as functions of other random variables with jointly continuous distributions, as explained over the last two Chapters, extend to multivariate distributions. There is a problem with the drawing of n-dimensional pictures, to keep track of the transformations, and one must remember to say "n-dimensional volume" instead of area, but otherwise calculations are not much more complicated than in two dimensions.

Rotation of coordinate axes

The spherical symmetry of the density f(·) is responsible for an important property of multivariate normals. Suppose Z = (Z1, ..., Zn) has the spherical normal distribution N(0, I_n). Let q1, ..., qn be a new orthonormal basis for R^n, and let

    Z = W1 q1 + ... + Wn qn

be the representation for Z in the new basis.

<12.5> Theorem. The W1, ..., Wn are also independent N(0, 1) distributed random variables.

In two dimensions, the assertion follows from the transformation formulae of Chapter 10. If the axes are rotated through an angle θ, then

    W1 = Z1 cos(θ) + Z2 sin(θ)
    W2 = −Z1 sin(θ) + Z2 cos(θ).

That is, (W1, W2) = (Z1, Z2) A_θ, where A_θ = [ cos(θ), −sin(θ) ; sin(θ), cos(θ) ]. The matrix A_θ has determinant 1 and inverse A_{−θ}. It is an orthogonal matrix; it preserves lengths. The joint density of (W1, W2) is

    (1/2π) exp( −|(w1, w2) A_θ⁻¹|² / 2 ) = (1/2π) exp( −(w1² + w2²)/2 ).

[Figure: a small ball B in the Z-coordinates corresponds to a ball B* of the same radius in the W-coordinates.]

A more intuitive explanation is based on the approximation

    P{Z ∈ B} ≈ f(z) (volume of B)

for a small ball B centered at z. The transformation from Z to W corresponds to a rotation, so P{Z ∈ B} = P{W ∈ B*}, where B* is a ball of the same radius, but centered at the point w = (w1, ..., wn) for which w1 q1 + ... + wn qn = z. The last equality implies |w| = |z|, from which we get

    P{W ∈ B*} ≈ (2π)^{−n/2} exp( −|w|²/2 ) (volume of B*).

That is, W has the asserted spherical normal density.

<12.6> Definition. Let Z = (Z1, Z2, ..., Zn) have a spherical normal distribution N(0, I_n). The chi-square distribution, χ²_n, is defined as the distribution of |Z|² = Z1² + ... + Zn².

To prove results about the spherical normal it is often merely a matter of transforming to an appropriate orthonormal basis.

<12.7> Exercise. Suppose Z1, Z2, ..., Zn are independent, each distributed N(0, 1). Define

    Z̄ = (Z1 + ... + Zn)/n    and    T = Σ_{i≤n} (Zi − Z̄)².

Show that Z̄ has a N(0, 1/n) distribution independently of T, which has a χ²_{n−1} distribution.

Solution: Choose the new orthonormal basis with q1 = (1, 1, ..., 1)'/√n. Choose q2, ..., qn however you like, provided they are orthogonal unit vectors, all orthogonal to q1. In the new coordinate system,

    Z = W1 q1 + ... + Wn qn.

We could calculate each Wi by dotting the sum on the right-hand side with qi: only Wi would survive. In particular,

    W1 = Z · q1 = (Z1 + ... + Zn)/√n = √n Z̄.

From Theorem <12.5> we know that W1 has a N(0, 1) distribution. It follows that Z̄ has a N(0, 1/n) distribution.

The random variable T equals the squared length of the vector

    (Z1 − Z̄, ..., Zn − Z̄) = Z − Z̄ (√n q1) = Z − W1 q1 = W2 q2 + ... + Wn qn.

That is,

    T = |W2 q2 + ... + Wn qn|² = W2² + ... + Wn²,

a sum of squares of n − 1 independent N(0, 1) random variables, which has a χ²_{n−1} distribution.

Finally, notice that Z̄ is a function of W1, whereas T is a function of the independent random variables W2, ..., Wn. The independence of Z̄ and T follows.

Exercise for the reader: Suppose X1, ..., Xn are independent, each distributed N(µ, σ²). Apply the results from the last Exercise, with Zi = (Xi − µ)/σ, to deduce that X̄ is distributed N(µ, σ²/n) independently of Σ_{i≤n} (Xi − X̄)²/σ², which has a χ²_{n−1} distribution.
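(A final added sketch, not from the original notes: a simulation of the reader's exercise. The choices µ = 69, σ = 2 merely echo the father/son example, and n = 5 and the number of replications are arbitrary. The sample mean should behave like N(µ, σ²/n), the scaled sum of squares like a χ²_{n−1}, and the two should show essentially zero correlation, consistent with independence.)

    # Simulate many samples X1, ..., Xn ~ N(mu, sigma^2) and check the
    # distributions of the sample mean and of sum((Xi - Xbar)^2)/sigma^2.
    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma = 69.0, 2.0
    n, reps = 5, 200_000

    X = mu + sigma * rng.standard_normal((reps, n))   # each row: X1, ..., Xn
    Xbar = X.mean(axis=1)
    S = ((X - Xbar[:, None])**2).sum(axis=1) / sigma**2

    print(Xbar.mean(), Xbar.var())      # near mu = 69 and sigma^2/n = 0.8
    print(S.mean(), S.var())            # chi^2_{n-1}: mean n-1 = 4, variance 2(n-1) = 8
    print(np.corrcoef(Xbar, S)[0, 1])   # near 0, consistent with independence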