Econ226_IIAC

# Econ226_IIAC - II. Vector autoregressions

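II. Vector autoregressions

A. Introduction

1. VARs as forecasting models

Forecasting: Let $y_{1t}$ be the first element of a vector $y_t$.

$$\hat y_{1t|t-1} = f_t(y_{t-1}, y_{t-2}, \ldots, y_1)$$

Convenient assumptions: 1) time invariant 2) linear 3) finite parameter

$$\hat y_{1t|t-1} = c + \phi_1' y_{t-1} + \phi_2' y_{t-2} + \cdots + \phi_p' y_{t-p}$$

Useful result: Define the "true values" of the parameters $c, \phi_1, \phi_2, \ldots, \phi_p$ to be the values that minimize

$$\operatorname{plim}_{T\to\infty}\; T^{-1}\sum_{t=1}^{T}\bigl(y_{1t} - \hat y_{1t|t-1}\bigr)^2 .$$

Then under fairly general conditions, the true values are estimated consistently by an OLS regression of $y_{1t}$ on a constant and $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$.

Estimate a separate OLS regression for each element of $y_t$ $\Rightarrow$ VAR.

One-step-ahead forecast:

$$\hat y_{t|t-1} = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p}$$

2. Gaussian VARs as data-generating process

Consider the following process whereby the vector $y_t$ might have been generated:

$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N(0, \Omega)$$

Log likelihood:

$$\log f(y_1, y_2, \ldots, y_T \mid y_0, y_{-1}, \ldots, y_{-p+1}; \theta) = -(Tn/2)\log(2\pi) - (T/2)\log|\Omega| - (1/2)\sum_{t=1}^{T}\varepsilon_t'\Omega^{-1}\varepsilon_t$$

$$\varepsilon_t = y_t - c - \Phi_1 y_{t-1} - \Phi_2 y_{t-2} - \cdots - \Phi_p y_{t-p}$$

$\theta$ = vector containing the elements of $c, \Phi_1, \Phi_2, \ldots, \Phi_p, \Omega$.

Proposition:

(1) The values of $c, \Phi_1, \Phi_2, \ldots, \Phi_p$ that maximize the likelihood function are identical to the results from OLS estimation, equation by equation.

(2) The MLE of $\Omega$ is given by

$$\hat\Omega = T^{-1}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon_t', \qquad \hat\varepsilon_t = y_t - \hat c - \hat\Phi_1 y_{t-1} - \hat\Phi_2 y_{t-2} - \cdots - \hat\Phi_p y_{t-p}.$$

The Gaussian VAR is a nice example of where quasi-maximum likelihood estimation can be justified.

3. VARs as ad hoc dynamic structural models

Example:

- $f_t$ = federal funds rate
- $y_t$ = output growth
- $\pi_t$ = inflation
- $m_t$ = money growth rate

Represent Fed behavior by

$$f_t = \alpha_0 + \alpha_1 y_t + \alpha_2 \pi_t + \alpha_3 f_{t-1} + \alpha_4 y_{t-1} + \alpha_5 \pi_{t-1} + \alpha_6 m_{t-1} + v_t$$

The Fed responds to current output and inflation but not to current money growth.

Collect the variables in the vector $\mathbf{y}_t = (f_t, y_t, \pi_t, m_t)'$:

$$B_0\mathbf{y}_t = k + B_1\mathbf{y}_{t-1} + \mathbf{v}_t$$

where the first row of $B_0$ is $(1, -\alpha_1, -\alpha_2, 0)$, the first row of $B_1$ is $(\alpha_3, \alpha_4, \alpha_5, \alpha_6)$, and the first element of $k$ is $\alpha_0$.

If $v_t \sim \text{i.i.d. } N(0, D)$, the log likelihood (writing $y_t$ for the full vector and allowing $p$ lags) is

$$\log f(y_1, y_2, \ldots, y_T \mid y_0, y_{-1}, \ldots, y_{-p+1}; \theta) = -(Tn/2)\log(2\pi) + T\log|B_0| - (T/2)\log|D| - (1/2)\sum_{t=1}^{T} v_t' D^{-1} v_t$$

$$v_t = B_0 y_t - k - B_1 y_{t-1} - B_2 y_{t-2} - \cdots - B_p y_{t-p}, \qquad D = E(v_t v_t')$$

$\theta$ = vector containing the elements of $k, B_0, B_1, \ldots, B_p, D$.

If the model is just-identified, the MLEs $\hat k, \hat B_0, \hat B_1, \ldots, \hat B_p, \hat D$ are transformations of the VAR MLEs $\hat c, \hat\Phi_1, \hat\Phi_2, \ldots, \hat\Phi_p, \hat\Omega$.

B. Normal-Wishart priors for VARs

$$\underset{(n\times 1)}{y_t} = \underset{(n\times k)}{\Pi'}\;\underset{(k\times 1)}{x_t} + \underset{(n\times 1)}{\varepsilon_t}, \qquad k = np + 1$$

$$x_t = (1, y_{t-1}', y_{t-2}', \ldots, y_{t-p}')', \qquad \varepsilon_t \mid y_{t-1}, y_{t-2}, \ldots, y_{-p+1} \sim N(0, \Omega)$$

Likelihood function: with $Y$ the $(T\times n)$ matrix whose rows are $y_1', \ldots, y_T'$,

$$p(Y \mid \Pi, \Omega) = (2\pi)^{-Tn/2}\,|\Omega^{-1}|^{T/2}\exp\Bigl[-\tfrac12\sum_{t=1}^{T}(y_t - \Pi'x_t)'\Omega^{-1}(y_t - \Pi'x_t)\Bigr]$$

Prior for $\Omega$:

Univariate regression: $\sigma^2 = E(\varepsilon_t^2)$. Let $Z_i \sim N(0, \lambda^{-1})$ and $W = Z_1^2 + Z_2^2 + \cdots + Z_N^2$. Then $W$ has a gamma distribution with parameters $N, \lambda$, and the prior is $\sigma^{-2} \sim \Gamma(N, \lambda)$.

Vector autoregression: $\Omega = E(\varepsilon_t\varepsilon_t')$. Let $z_i \sim N(0, \Lambda^{-1})$ be $(n\times 1)$ and $W = z_1z_1' + z_2z_2' + \cdots + z_Nz_N'$. Then $W$ has a Wishart distribution with parameters $N, \Lambda$, and the prior is $\Omega^{-1} \sim W(N, \Lambda)$.

For $W \sim W(N, \Lambda)$,

$$p(w) = c\,|\Lambda|^{N/2}\,|w|^{(N-n-1)/2}\exp\bigl[-\tfrac12\operatorname{tr}(w\Lambda)\bigr], \qquad c = \Bigl[2^{Nn/2}\,\pi^{n(n-1)/4}\prod_{j=1}^{n}\Gamma\bigl(\tfrac{N+1-j}{2}\bigr)\Bigr]^{-1}$$

so the prior takes the form

$$p(\Omega^{-1}) \propto |\Omega^{-1}|^{(N-n-1)/2}\exp\bigl[-\tfrac12\operatorname{tr}(\Lambda\Omega^{-1})\bigr].$$

Prior for $\Pi$: let $\pi = \operatorname{vec}(\Pi)$, an $(nk\times 1)$ vector whose first $k$ components are the coefficients that explain $y_{1t}$.

Univariate regression: $\beta \mid \sigma^{-2} \sim N(m, \sigma^2 M)$

Vector autoregression: $\pi \mid \Omega^{-1} \sim N(m, \Omega\otimes M)$, with $m$ $(nk\times 1)$, $\Omega$ $(n\times n)$, and $M$ $(k\times k)$:

$$p(\pi \mid \Omega^{-1}) \propto |\Omega^{-1}|^{k/2}\exp\bigl[-\tfrac12(\pi - m)'(\Omega^{-1}\otimes M^{-1})(\pi - m)\bigr]$$

Posterior:

$$p(\pi, \Omega^{-1} \mid Y) \propto p(\Omega^{-1})\,p(\pi \mid \Omega^{-1})\,p(Y \mid \pi, \Omega)$$

$$\propto |\Omega^{-1}|^{(N-n-1)/2}\exp\bigl[-\tfrac12\operatorname{tr}(\Lambda\Omega^{-1})\bigr]\times|\Omega^{-1}|^{k/2}\exp\bigl[-\tfrac12(\pi - m)'(\Omega^{-1}\otimes M^{-1})(\pi - m)\bigr]\times|\Omega^{-1}|^{T/2}\exp\Bigl[-\tfrac12\sum_{t=1}^{T}(y_t - \Pi'x_t)'\Omega^{-1}(y_t - \Pi'x_t)\Bigr]$$

After a lot of algebra, this can be rewritten as

$$p(\pi, \Omega^{-1} \mid Y) \propto |\Omega^{-1}|^{k/2}\exp\bigl[-\tfrac12(\pi - m^*)'(\Omega^{-1}\otimes M^{*-1})(\pi - m^*)\bigr]\times|\Omega^{-1}|^{(T+N-n-1)/2}\exp\bigl[-\tfrac12\operatorname{tr}(\Lambda^*\Omega^{-1})\bigr]$$

i.e.,

$$\pi \mid \Omega^{-1}, Y \sim N(m^*, \Omega\otimes M^*), \qquad \Omega^{-1} \mid Y \sim W(T+N, \Lambda^*).$$

For the coefficients:

$$M^* = \Bigl(M^{-1} + \sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}, \qquad m^* = (I_n\otimes M^*)\Bigl[(I_n\otimes M^{-1})\,m + \operatorname{vec}\Bigl(\sum_{t=1}^{T}x_ty_t'\Bigr)\Bigr]$$

Diffuse prior: $M^{-1} = 0$ $\Rightarrow$

$$\pi \mid \Omega^{-1}, Y \sim N\Bigl(\hat\pi,\; \Omega\otimes\Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}\Bigr)$$

Estimate the $i$th equation of the VAR by OLS:

$$\hat\pi_i = \Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}\sum_{t=1}^{T}x_ty_{it}$$

(corresponds to elements $k(i-1)+1$ through $ki$ of $\hat\pi$). Then $\hat\pi_i$ is the posterior mean of $\pi_i$, and $\omega_{ii}\bigl(\sum_{t=1}^{T}x_tx_t'\bigr)^{-1}$ is its posterior variance (conditional on $\Omega$).

For the variance matrix:

$$\Omega^{-1} \mid Y \sim W(T+N, \Lambda^*), \qquad \Lambda^* = \Lambda + S + Q$$

$$S = \sum_{t=1}^{T}(y_t - \hat\Pi'x_t)(y_t - \hat\Pi'x_t)', \qquad Q = (\hat\Pi - V)'\Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)M^*M^{-1}(\hat\Pi - V)$$

where $V$ is the $(k\times n)$ matrix with $\operatorname{vec}(V) = m$.

Diffuse prior: $N = 0$, $M^{-1} = 0$, $\Lambda = 0$, so that

$$\Omega^{-1} \mid Y \sim W(T, S), \qquad \pi \mid \Omega^{-1}, Y \sim N\Bigl(\hat\pi,\; \Omega\otimes\Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}\Bigr).$$

To generate a draw from the posterior distribution of $(\pi, \Omega^{-1})$ with a diffuse prior:

(1) Estimate the $i$th equation of the VAR by OLS for $i = 1, 2, \ldots, n$:

$$\hat\pi_i = \Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}\sum_{t=1}^{T}x_ty_{it}, \qquad M^* = \Bigl(\sum_{t=1}^{T}x_tx_t'\Bigr)^{-1}$$

(2) Calculate the residual of the $i$th equation and the sum of outer products of residuals:

$$\hat\varepsilon_{it} = y_{it} - \hat\pi_i'x_t, \qquad \underset{(n\times n)}{S} = \sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon_t'$$

(3) Generate an artificial sample of size $T$ of the $(n\times 1)$ vector $z_t$, where $z_t \sim N(0, S^{-1})$, and calculate the $(n\times n)$ matrix $W = \sum_{t=1}^{T}z_tz_t'$.

(4) Set $\Omega = W^{-1}$.

(5) Generate a draw for $\pi$ from a $N(\hat\pi, \Omega\otimes M^*)$ distribution.

The values of $\Omega$ from step (4) and $\pi$ from step (5) represent a single draw from the posterior distribution. To generate a Monte Carlo sample of $D$ draws, repeat steps (3)-(5) $D$ times. To get standard errors on impulse-response functions, for each draw calculate the impulse-response function implied by that value of $\pi$.

Usual procedure: $\Omega$ is fixed at $T^{-1}S$ for all draws. This gives the asymptotic classical distribution, or an approximation to the Bayesian distribution with a diffuse prior.

Example of using a non-diffuse prior (Del Negro and Schorfheide, 2002). They use a log-linearization of a real business cycle model to get theoretical values for the VAR parameters:

$$y_t = \Pi_0'x_t + \varepsilon_t, \qquad E(\varepsilon_t\varepsilon_t') = \Omega_0$$

Suppose we had an artificial "sample" of observations $\{\tilde y_t\}_{t=-p+1}^{\tilde T}$ and based $m$, $M$, $N$, and $\Lambda$ on a diffuse inference from this sample:

$$M = \Bigl(\sum_{t=1}^{\tilde T}\tilde x_t\tilde x_t'\Bigr)^{-1}, \qquad m = \operatorname{vec}\Bigl[M\sum_{t=1}^{\tilde T}\tilde x_t\tilde y_t'\Bigr], \qquad N = \tilde T, \qquad \Lambda = \sum_{t=1}^{\tilde T}(\tilde y_t - \tilde\Pi'\tilde x_t)(\tilde y_t - \tilde\Pi'\tilde x_t)'$$

where $\operatorname{vec}(\tilde\Pi) = m$. We could then use these values of $m$, $M$, $N$, and $\Lambda$ together with the actual observed data $\{y_t\}_{t=-p+1}^{T}$ to form a posterior inference using the earlier formulas, where the choice of $\tilde T$ would reflect how much weight to put on the "real business cycle prior."

Actually, we can use the RBC model's implied values of $\Pi_0$ and $\Omega_0$ not just to simulate a few observations, but also to calculate the moments analytically:

$$E(x_tx_t') = A_0, \qquad E(x_ty_t') = B_0, \qquad E(y_ty_t') = C_0$$

where $A_0$, $B_0$, and $C_0$ are functions of $\Pi_0$ and $\Omega_0$.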
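The Normal-Wishart updating formulas, the artificial-sample prior, and the sampling steps above can be sketched together in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names (`simulate_var1`, `posterior_params`, `draw_posterior`), the VAR(1) coefficients, and the sample sizes are hypothetical and do not come from the notes; the artificial sample is simply simulated from the same VAR(1) rather than from an RBC model.

```python
import numpy as np

def simulate_var1(T_obs, c, Phi, rng):
    """Simulate a VAR(1); return Y (T x n) and X (T x k) with x_t = (1, y_{t-1}')'."""
    y = np.zeros((T_obs + 1, len(c)))
    for t in range(1, T_obs + 1):
        y[t] = c + Phi @ y[t - 1] + rng.standard_normal(len(c))
    return y[1:], np.column_stack([np.ones(T_obs), y[:-1]])

def posterior_params(Y, X, V, M_inv, N, Lam):
    """Normal-Wishart update for vec(Pi) | Omega ~ N(vec(V), Omega kron M),
    Omega^{-1} ~ W(N, Lam). Pass M_inv = 0, N = 0, Lam = 0 for a diffuse prior."""
    XtX = X.T @ X
    Pi_hat = np.linalg.solve(XtX, X.T @ Y)        # OLS, column i = equation i
    resid = Y - X @ Pi_hat
    S = resid.T @ resid                           # sum of outer products of residuals
    M_star = np.linalg.inv(M_inv + XtX)
    Pi_star = M_star @ (M_inv @ V + X.T @ Y)      # posterior mean in matrix form
    Q = (Pi_hat - V).T @ XtX @ M_star @ M_inv @ (Pi_hat - V)
    return Pi_star, M_star, N + Y.shape[0], Lam + S + Q

def draw_posterior(Pi_star, M_star, dof, Lam_star, rng):
    """One draw of (Pi, Omega), following steps (3)-(5) with general parameters."""
    n = Lam_star.shape[0]
    z = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Lam_star), size=dof)
    Omega = np.linalg.inv(z.T @ z)                # Omega = W^{-1}, W = sum z_t z_t'
    Omega = (Omega + Omega.T) / 2                 # remove rounding asymmetry
    pi = rng.multivariate_normal(Pi_star.flatten(order="F"), np.kron(Omega, M_star))
    return pi.reshape(Pi_star.shape, order="F"), Omega

rng = np.random.default_rng(0)
c, Phi = np.array([0.1, 0.0]), np.array([[0.5, 0.1], [0.0, 0.4]])  # hypothetical values
Y, X = simulate_var1(80, c, Phi, rng)            # stands in for the observed data
Y_art, X_art = simulate_var1(40, c, Phi, rng)    # stands in for the artificial sample
n, k = Y.shape[1], X.shape[1]

# Prior (m, M, N, Lambda) from a diffuse inference on the artificial sample
V, M, N, Lam = posterior_params(Y_art, X_art, np.zeros((k, n)),
                                np.zeros((k, k)), 0, np.zeros((n, n)))
# Posterior from combining that prior with the observed data
Pi_star, M_star, dof, Lam_star = posterior_params(Y, X, V, np.linalg.inv(M), N, Lam)
Pi_draw, Omega_draw = draw_posterior(Pi_star, M_star, dof, Lam_star, rng)
```

With a prior built this way, the posterior mean coincides with OLS on the pooled artificial and observed samples, which is one way to read the remark that the size of the artificial sample controls the weight placed on the prior.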
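Replacing the artificial-sample moments with their population counterparts:

$$M = \Bigl(\sum_{t=1}^{\tilde T}\tilde x_t\tilde x_t'\Bigr)^{-1} \approx (\tilde T A_0)^{-1}$$

$$m = \operatorname{vec}\Bigl[M\sum_{t=1}^{\tilde T}\tilde x_t\tilde y_t'\Bigr] \approx \operatorname{vec}(A_0^{-1}B_0)$$

$$N = \tilde T$$

$$\Lambda = \sum_{t=1}^{\tilde T}(\tilde y_t - \tilde\Pi'\tilde x_t)(\tilde y_t - \tilde\Pi'\tilde x_t)' \approx \tilde T\,(C_0 - B_0'A_0^{-1}B_0)$$

Del Negro and Schorfheide find that putting equal weights on prior and data ($\tilde T = T$) results in substantially better forecasts than an unrestricted VAR, and better forecasts than the Minnesota prior for horizons greater than 1 quarter. The RBC model is a good simplification (shrinkage).

Problem with the Normal-Wishart prior:

$$\underset{(nk\times 1)}{\pi} \mid \Omega^{-1} \sim N\bigl(m,\; \underset{(n\times n)}{\Omega}\otimes\underset{(k\times k)}{M}\bigr)$$

Let $y_{1,t-1}$ be the first element of $x_t$ and $y_{2,t-1}$ the second element of $x_t$. Then:

- $\pi_1$ = coefficient relating $y_{1t}$ to $y_{1,t-1}$: confidence is $\omega_{11}m_{11}$
- $\pi_2$ = coefficient relating $y_{1t}$ to $y_{2,t-1}$: confidence is $\omega_{11}m_{22}$
- $\pi_{k+1}$ = coefficient relating $y_{2t}$ to $y_{1,t-1}$: confidence is $\omega_{22}m_{11}$
- $\pi_{k+2}$ = coefficient relating $y_{2t}$ to $y_{2,t-1}$: confidence is $\omega_{22}m_{22}$

Problem: If $\omega_{11}m_{11} \gg \omega_{11}m_{22}$ (pretty confident variable 2 doesn't matter for variable 1), then we must have $\omega_{22}m_{11} \gg \omega_{22}m_{22}$ (think variable 2 doesn't matter for variable 2 either).

Ways to get around this problem:

(1) Assume that $\Omega$ is diagonal. Then single-equation methods are equivalent to full-system inference, and we can use a different $M_i$ for each equation.

(2) Drop natural conjugates and turn to numerical Bayesian methods.

C. Bayesian analysis of structural VARs

Consider the structural VAR:

$$\underset{(n\times n)}{A'}\;\underset{(n\times 1)}{y_t} = \underset{(n\times k)}{B'}\;\underset{(k\times 1)}{x_t} + \underset{(n\times 1)}{v_t}$$

$$x_t = (y_{t-1}', y_{t-2}', \ldots, y_{t-p}', 1)', \qquad k = np + 1, \qquad v_t \mid y_{t-1}, y_{t-2}, \ldots, y_{-p+1} \sim N(0, I_n)$$

Since $v_t = A'y_t - B'x_t$, the conditional density of $y_t$ is the density of $v_t$ times the Jacobian $|\partial v_t/\partial y_t'| = |A|$:

$$p(y_1, \ldots, y_T \mid A, B; y_0, y_{-1}, \ldots, y_{-p+1}) = (2\pi)^{-Tn/2}\,|A|^T\exp\Bigl[-\tfrac12\sum_{t=1}^{T}(A'y_t - B'x_t)'(A'y_t - B'x_t)\Bigr]$$

Take the transpose,

$$y_t'A = x_t'B + v_t',$$

and stack rows on top of each other for $t = 1, 2, \ldots, T$:

$$\underset{(T\times n)}{Y}\;\underset{(n\times n)}{A} = \underset{(T\times k)}{X}\;\underset{(k\times n)}{B} + \underset{(T\times n)}{V}$$

$$\underset{(k\times n)}{B} = \begin{bmatrix} b_1 & \cdots & b_n \end{bmatrix}$$

$b_i$ = $(k\times 1)$ vector of coefficients on the lags in the $i$th structural equation:

$$y_t'a_i = x_t'b_i + v_{it}$$

The vec operator stacks the columns of a $(k\times n)$ matrix on top of each other, from left to right, to form a $(kn\times 1)$ vector:

$$b = \operatorname{vec}(B) = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$$

So the first $k$ elements of the $(nk\times 1)$ vector $b$ correspond to the coefficients on lags for the first equation in the VAR.

$$\underset{(Tn\times 1)}{\operatorname{vec}(XB)} = (I_n\otimes X)\operatorname{vec}(B) = \tilde X b \qquad\text{for } \underset{(Tn\times nk)}{\tilde X} = I_n\otimes X$$

Likewise define

$$\underset{(Tn\times 1)}{\tilde y} = \operatorname{vec}(YA) = (I_n\otimes Y)\,a \qquad\text{for } a = \operatorname{vec}(A).$$

The first $n$ elements of $a$ $(n^2\times 1)$ correspond to the coefficients on contemporaneous variables in the first equation of the VAR.

Taking the vec of the full system $YA = XB + V$,

$$\tilde y = \tilde X b + \tilde v, \qquad \tilde v \sim N(0, I_{Tn}).$$

Note that conditional on $a$, this is a classical regression model with unit variance for the residual.

$$p(Y \mid a, b) = (2\pi)^{-Tn/2}\,|A|^T\exp\bigl[-\tfrac12(\tilde y - \tilde X b)'(\tilde y - \tilde X b)\bigr]$$

Prior: $b \mid a \sim N(m(a), M(a))$,

$$p(b \mid a) = (2\pi)^{-nk/2}\,|M(a)|^{-1/2}\exp\bigl[-\tfrac12(b - m(a))'M(a)^{-1}(b - m(a))\bigr],$$

with $p(a)$ arbitrary.

Posterior:

$$p(b, a \mid Y) = p(b \mid a, Y)\,p(a \mid Y), \qquad b \mid a, Y \sim N(m^*(a), M^*(a))$$

$$M^*(a) = \bigl[M(a)^{-1} + \tilde X'\tilde X\bigr]^{-1} = \bigl[M(a)^{-1} + I_n\otimes X'X\bigr]^{-1}$$

$$m^*(a) = M^*(a)\bigl[M(a)^{-1}m(a) + \tilde X'\tilde y\bigr] = M^*(a)\bigl[M(a)^{-1}m(a) + (I_n\otimes X'Y)\,a\bigr]$$

Still, we need to "invert" the $(nk\times nk)$ matrix $M(a)^{-1} + I_n\otimes X'X$.

Suppose our prior for equation $i$ takes the form

$$b_i \mid a \sim N(m_i(a), M_i(a))$$

for $M_i(a)$ a $(k\times k)$ matrix, with priors independent across equations. Then

$$M(a)^{-1} + I_n\otimes X'X = \begin{bmatrix} M_1(a)^{-1} & & 0 \\ & \ddots & \\ 0 & & M_n(a)^{-1} \end{bmatrix} + \begin{bmatrix} X'X & & 0 \\ & \ddots & \\ 0 & & X'X \end{bmatrix}$$

$$M^*(a) = \begin{bmatrix} M_1^*(a) & & 0 \\ & \ddots & \\ 0 & & M_n^*(a) \end{bmatrix} \qquad\text{for } M_i^*(a) = \bigl[M_i(a)^{-1} + X'X\bigr]^{-1}$$

$$m^*(a) = M^*(a)\bigl[M(a)^{-1}m(a) + (I_n\otimes X'Y)\,a\bigr] = \begin{bmatrix} m_1^*(a) \\ \vdots \\ m_n^*(a) \end{bmatrix} = \begin{bmatrix} M_1^*(a)\,q_1(a) \\ \vdots \\ M_n^*(a)\,q_n(a) \end{bmatrix}, \qquad q_i(a) = M_i(a)^{-1}m_i(a) + X'Ya_i$$

Specification of $p(b \mid a)$: expect each series to behave like a random walk.

Structural equations:

$$\underset{(T\times n)}{Y}\;\underset{(n\times n)}{A} = \underset{(T\times k)}{X}\;\underset{(k\times n)}{B} + \underset{(T\times n)}{V}$$

Reduced form:

$$\underset{(T\times n)}{Y} = \underset{(T\times k)}{X}\;\underset{(k\times n)}{\Pi} + \underset{(T\times n)}{E}, \qquad \Pi = BA^{-1}, \qquad E = VA^{-1}$$

Random walk:

$$\underset{(k\times n)}{\Pi} = \begin{bmatrix} I_n \\ 0 \end{bmatrix} \;\Rightarrow\; \underset{(k\times n)}{B} = \Pi A = \begin{bmatrix} A \\ 0 \end{bmatrix}, \qquad E(B \mid A) = \begin{bmatrix} A \\ 0 \end{bmatrix}$$

$$b_i \mid a \sim N(m_i(a), M_i(a)), \qquad m_i(a) = \begin{bmatrix} a_i \\ 0 \end{bmatrix}$$

Let $m_i(r)$ denote the $r$th element of this vector (the prior expectation of the $r$th element of $b_i$) and $M_i(r)$ its variance:

$$b_i(r) \mid a \sim N(m_i(r), M_i(r))$$

with priors across coefficients taken to be independent.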
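The equation-by-equation form of the posterior for $b$ given $a$ can be checked numerically against the full $(nk \times nk)$ system. The sketch below is illustrative only: the data are random placeholders rather than a real sample, the matrix `A` and the diagonal prior variances in `M_list` are hypothetical, and the prior mean is the random-walk mean $m_i(a) = (a_i', 0')'$ described above.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, p = 60, 2, 2
k = n * p + 1

Y = rng.standard_normal((T, n))                                    # placeholder data
X = np.column_stack([rng.standard_normal((T, n * p)), np.ones(T)]) # lags first, constant last
A = np.array([[1.0, 0.3], [-0.2, 1.0]])                            # hypothetical A; column i is a_i
M_list = [np.diag(rng.uniform(0.1, 1.0, k)) for _ in range(n)]     # hypothetical M_i(a)

# Equation by equation: only k x k matrices are inverted
XtX, XtY = X.T @ X, X.T @ Y
blocks = []
for i in range(n):
    a_i = A[:, i]
    m_i = np.concatenate([a_i, np.zeros(k - n)])        # random-walk prior mean
    M_i_inv = np.linalg.inv(M_list[i])
    M_star_i = np.linalg.inv(M_i_inv + XtX)             # M_i^*(a)
    blocks.append(M_star_i @ (M_i_inv @ m_i + XtY @ a_i))  # M_i^*(a) q_i(a)
m_star_by_equation = np.concatenate(blocks)

# Full system: one nk x nk solve, for comparison
a = A.flatten(order="F")                                # a = vec(A)
X_tilde = np.kron(np.eye(n), X)                         # I_n kron X
y_tilde = np.kron(np.eye(n), Y) @ a                     # vec(YA)
M_full = np.zeros((n * k, n * k))
m_full = np.zeros(n * k)
for i in range(n):
    M_full[i * k:(i + 1) * k, i * k:(i + 1) * k] = M_list[i]
    m_full[i * k:i * k + n] = A[:, i]
M_full_inv = np.linalg.inv(M_full)
m_star_full = np.linalg.solve(M_full_inv + X_tilde.T @ X_tilde,
                              M_full_inv @ m_full + X_tilde.T @ y_tilde)
```

The two vectors agree, which is the practical payoff of assuming prior independence across equations: the cost grows with $n$ inversions of size $k$ rather than one inversion of size $nk$.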
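Multiplying the likelihood by the prior density

$$p(b_i(r) \mid a) = \bigl(2\pi M_i(r)\bigr)^{-1/2}\exp\Bigl[-\frac{(b_i(r) - m_i(r))^2}{2M_i(r)}\Bigr]$$

is numerically identical to acting as if I had observed a variable $q_i(r)$ from the system

$$q_i(r) = b_i(r)/\sqrt{M_i(r)} + v_i(r)$$

where the observed value of $q_i(r)$ is $m_i(r)/\sqrt{M_i(r)}$ and $v_i(r) \sim N(0, 1)$.

Or, since $m_i(r)$ is the $r$th element of $a_i$ if $r \le n$ and is zero otherwise, this is numerically identical to having observed the $(n\times 1)$ vector $y_i(r)$ and the $(k\times 1)$ vector $x_i(r)$ from the following system:

$$y_i(r)'a_i = x_i(r)'b_i + v_i(r)$$

$$y_i(r) = \begin{cases} e_r^{(n)}/\sqrt{M_i(r)} & \text{if } r \le n \\ 0 & \text{otherwise} \end{cases} \qquad x_i(r) = e_r^{(k)}/\sqrt{M_i(r)}$$

for $e_r^{(n)}$ the $r$th column of $I_n$ and $e_r^{(k)}$ the $r$th column of $I_k$.

Stack these "dummy observations" for $r = 1, 2, \ldots, k$ in matrices

$$\underset{(k\times n)}{Y_{id}} = \begin{bmatrix} y_i(1)' \\ \vdots \\ y_i(k)' \end{bmatrix}, \qquad \underset{(k\times k)}{X_{id}} = \begin{bmatrix} x_i(1)' \\ \vdots \\ x_i(k)' \end{bmatrix}$$

The posterior $p(b_i \mid a, Y)$ is then:

$$b_i \mid a, Y \sim N(m_i^*(a), M_i^*(a))$$

$$M_i^*(a) = \bigl[M_i(a)^{-1} + X'X\bigr]^{-1} = \bigl[X_{id}'X_{id} + X'X\bigr]^{-1}$$

$$m_i^*(a) = M_i^*(a)\bigl[M_i(a)^{-1}m_i(a) + X'Ya_i\bigr] = \bigl[X_{id}'X_{id} + X'X\bigr]^{-1}\bigl[X_{id}'Y_{id} + X'Y\bigr]a_i$$

Remaining question: how to choose $M_i(r)$ in $b_i(r) \mid a \sim N(m_i(r), M_i(r))$.

Let $j(r)$ denote which variable the $r$th coefficient refers to ($j \in \{1, 2, \ldots, n\}$) and $\ell(r)$ its lag ($\ell \in \{1, 2, \ldots, p\}$), i.e., $b_i(r)$ is the coefficient relating $y_t'a_i$ to $y_{j,t-\ell}$.

(1) If the units in which $y_{jt}$ is measured are doubled, the value of $b_i(r)$ is cut in half $\Rightarrow$ make $\sqrt{M_i(r)}$ inversely proportional to $\hat\sigma_j$, the standard deviation of a univariate autoregression for $y_{jt}$.

(2) We have more confidence in zero priors for bigger $\ell$ $\Rightarrow$ make $\sqrt{M_i(r)}$ inversely proportional to $\ell^{\lambda_3}$.

$$\sqrt{M_i(r)} = \begin{cases} \dfrac{\lambda_0\lambda_1}{\hat\sigma_{j(r)}\,\ell(r)^{\lambda_3}} & \text{for } r = 1, 2, \ldots, k-1 \\[2ex] \lambda_0\lambda_4 & \text{for } r = k \end{cases}$$

where $\lambda_0$ controls the tightness of the prior for $a$ and $\lambda_1$ controls the tightness of the random walk prior.

Distribution of $a$:

$$p(a, b \mid Y) = p(b \mid a, Y)\,p(a \mid Y), \qquad b \mid a, Y \sim N(m^*(a), M^*(a))$$

$$p(a \mid Y) \propto p(a)\,|A|^T\,\bigl|I_{Tn} + (I_n\otimes X)M(a)(I_n\otimes X')\bigr|^{-1/2}\exp\Bigl\{-\tfrac12\bigl[a'(I_n\otimes Y'Y)a + m(a)'M(a)^{-1}m(a) - m^*(a)'M^*(a)^{-1}m^*(a)\bigr]\Bigr\}$$

What is this distribution $p(a \mid Y)$? Use numerical methods.

Example: Sims-Zha, distribution of $p(a \mid Y)$. Write $q(a)$ for the right-hand side of the expression above, so that $p(a \mid Y) \propto q(a)$.

The importance density $g(a)$ should be similar to $q(a)$ but with fatter tails.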
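(1) Find

$$a_0 = \arg\max_a\, q(a), \qquad H_0 = -\left.\frac{\partial^2\log q(a)}{\partial a\,\partial a'}\right|_{a = a_0}$$

(2) Let $g(a)$ be an $n^2$-dimensional Student $t$ distribution centered at $a_0$ with scale matrix $H_0^{-1}$ and 9 degrees of freedom.

(3) Generate the $j$th draw $a^{(j)}$ from $g(a)$. E.g., generate

$$u^{(j)} \sim N(0, I_{n^2}), \qquad v^{(j)} = P_0^{-1}u^{(j)}, \qquad e^{(j)} \sim N(0, I_9), \qquad \tau^{(j)} = (1/9)\,e^{(j)\prime}e^{(j)}, \qquad a^{(j)} = a_0 + v^{(j)}/\sqrt{\tau^{(j)}}$$

where $P_0$ is the Cholesky factor of $H_0$ ($H_0 = P_0'P_0$).

(4) Calculate

$$g(a^{(j)}) = c\,\bigl[1 + (1/9)(a^{(j)} - a_0)'H_0(a^{(j)} - a_0)\bigr]^{-(n^2+9)/2},$$

choosing $c$ so that $\max_{j=1,\ldots,D} g(a^{(j)})$ is on the same scale as $\max_{j=1,\ldots,D} q(a^{(j)})$.

(5) Calculate the weight

$$\omega^{(j)} = \frac{q(a^{(j)})}{g(a^{(j)})}$$

(6) Repeat steps (3)-(5) for $j = 1, 2, \ldots, D$.

(7) The sample $a^{(1)}, \ldots, a^{(D)}$, when weighted by $K^{-1}\omega^{(1)}, \ldots, K^{-1}\omega^{(D)}$ for $K = \omega^{(1)} + \cdots + \omega^{(D)}$, is a sample from $p(a \mid Y)$, e.g.,

$$E(a \mid Y) \approx \frac{\sum_{j=1}^{D}\omega^{(j)}a^{(j)}}{\sum_{j=1}^{D}\omega^{(j)}}$$

(8) For each $j$, generate $b^{(j)}$ from the $N\bigl(m^*(a^{(j)}), M^*(a^{(j)})\bigr)$ distribution and weight it with $K^{-1}\omega^{(j)}$ for a draw from $p(b \mid Y)$, e.g.,

$$E(b \mid Y) \approx \frac{\sum_{j=1}^{D}\omega^{(j)}b^{(j)}}{\sum_{j=1}^{D}\omega^{(j)}}$$

...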
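Steps (3)-(7) can be sketched as follows. To keep the example self-contained, the target kernel here is a toy two-dimensional Gaussian that stands in for $q(a)$; it is not the structural VAR posterior, and the function name `importance_sample` and the values of `mu` and `H` are hypothetical. The mode $a_0$ and Hessian $H_0$ are supplied directly instead of being found numerically as in step (1). Working with log weights and subtracting their maximum plays the role of the scaling constant $c$ in step (4), since any constant cancels once the weights are normalized.

```python
import numpy as np

def importance_sample(log_q, a0, H0, D, rng, dof=9):
    """Draw from a multivariate Student t centered at a0 with scale H0^{-1}
    and return the draws with normalized importance weights."""
    d = a0.size
    L = np.linalg.cholesky(H0)                    # H0 = L L', so P0 = L'
    u = rng.standard_normal((D, d))
    v = np.linalg.solve(L.T, u.T).T               # v = P0^{-1} u, Var(v) = H0^{-1}
    e = rng.standard_normal((D, dof))
    tau = (e ** 2).sum(axis=1) / dof              # chi-square(dof) / dof
    a = a0 + v / np.sqrt(tau)[:, None]            # Student t draws
    dev = a - a0
    quad = np.einsum("ij,jk,ik->i", dev, H0, dev)
    log_g = -0.5 * (d + dof) * np.log1p(quad / dof)   # log t kernel, constant dropped
    log_w = np.array([log_q(x) for x in a]) - log_g
    w = np.exp(log_w - log_w.max())               # rescale before exponentiating
    return a, w / w.sum()                         # weights K^{-1} omega^(j)

# Toy target standing in for q(a): Gaussian kernel with known mean mu
mu = np.array([1.0, -2.0])
H = np.array([[2.0, 0.5], [0.5, 1.0]])
log_q = lambda x: -0.5 * (x - mu) @ H @ (x - mu)

rng = np.random.default_rng(0)
draws, weights = importance_sample(log_q, a0=mu, H0=H, D=20000, rng=rng)
posterior_mean = weights @ draws                  # weighted average, as in step (7)
```

Step (8) would add, for each draw, one normal draw of $b^{(j)}$ given $a^{(j)}$ and reuse the same weights.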
## This note was uploaded on 03/02/2012 for the course ECON 226 taught by Professor James Hamilton during the Winter '09 term at UCSD.

