Econ226_IIAC



II. Vector autoregressions

A. Introduction

1. VARs as forecasting models

Forecasting: let $y_{1t}$ be the first element of a vector $y_t$, and let $\hat{y}_{1,t|t-1}$ denote a forecast of $y_{1t}$ based on information through $t-1$:
$$\hat{y}_{1,t|t-1} = f_t(y_{t-1}, y_{t-2}, \ldots, y_1).$$

Convenient assumptions: (1) time invariant, (2) linear, (3) finite parameter:
$$\hat{y}_{1,t|t-1} = c + \phi_1' y_{t-1} + \phi_2' y_{t-2} + \cdots + \phi_p' y_{t-p}.$$

Useful result: define the "true values" of the parameters $c, \phi_1, \phi_2, \ldots, \phi_p$ to be the values that minimize
$$\operatorname{plim}\, T^{-1}\sum_{t=1}^{T}\left(y_{1t} - \hat{y}_{1,t|t-1}\right)^2.$$
Then under fairly general conditions, the true values are estimated consistently by OLS regression of $y_{1t}$ on a constant and $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$.

Estimating a separate OLS regression for each element of $y_t$ gives the VAR one-step-ahead forecast:
$$\hat{y}_{t|t-1} = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p}.$$

2. Gaussian VARs as data-generating process

Consider the following process whereby the vector $y_t$ might have been generated:
$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d. } N(0, \Omega).$$

Log likelihood:
$$\log f(y_1, y_2, \ldots, y_T \mid y_0, y_{-1}, \ldots, y_{-p+1}; \theta) = -\frac{Tn}{2}\log 2\pi - \frac{T}{2}\log|\Omega| - \frac{1}{2}\sum_{t=1}^{T}\varepsilon_t'\Omega^{-1}\varepsilon_t,$$
where $\varepsilon_t = y_t - c - \Phi_1 y_{t-1} - \Phi_2 y_{t-2} - \cdots - \Phi_p y_{t-p}$ and $\theta$ is the vector containing the elements of $c, \Phi_1, \Phi_2, \ldots, \Phi_p, \Omega$.

Proposition:
(1) The values of $c, \Phi_1, \Phi_2, \ldots, \Phi_p$ that maximize the likelihood function are identical to the results from OLS estimation, equation by equation.
(2) The MLE of $\Omega$ is given by $\hat{\Omega} = T^{-1}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$, where $\hat{\varepsilon}_t = y_t - \hat{c} - \hat{\Phi}_1 y_{t-1} - \hat{\Phi}_2 y_{t-2} - \cdots - \hat{\Phi}_p y_{t-p}$.

The Gaussian VAR is a nice example of a setting in which quasi-maximum likelihood estimation can be justified.

3. VARs as ad hoc dynamic structural models

Example: let $f_t$ = federal funds rate, $\dot{y}_t$ = output growth, $\pi_t$ = inflation, $\dot{m}_t$ = money growth rate. Represent Fed behavior by
$$f_t = \alpha_0 + \alpha_1 \dot{y}_t + \alpha_2 \pi_t + \alpha_3 f_{t-1} + \alpha_4 \dot{y}_{t-1} + \alpha_5 \pi_{t-1} + \alpha_6 \dot{m}_{t-1} + v_t,$$
so that the Fed responds to current output and inflation but not to current money growth. Collecting the variables as $y_t = (f_t, \dot{y}_t, \pi_t, \dot{m}_t)'$, this equation becomes one row of the structural system
$$B_0 y_t = k + B_1 y_{t-1} + \cdots + B_p y_{t-p} + v_t,$$
where the Fed's row of $B_0$ is $(1, -\alpha_1, -\alpha_2, 0)$ and the corresponding entries of $k$ and $B_1$ are $\alpha_0$ and $(\alpha_3, \alpha_4, \alpha_5, \alpha_6)$.

If $v_t \sim$ i.i.d. $N(0, D)$, the log likelihood is
$$\log f(y_1, y_2, \ldots, y_T \mid y_0, y_{-1}, \ldots, y_{-p+1}; \theta) = -\frac{Tn}{2}\log 2\pi + T\log|B_0| - \frac{T}{2}\log|D| - \frac{1}{2}\sum_{t=1}^{T} v_t' D^{-1} v_t,$$
where $v_t = B_0 y_t - k - B_1 y_{t-1} - \cdots - B_p y_{t-p}$, $D = E(v_t v_t')$, and $\theta$ is the vector containing the elements of $k, B_0, B_1, \ldots, B_p, D$.

If the model is just identified, the MLEs $\hat{k}, \hat{B}_0, \hat{B}_1, \ldots, \hat{B}_p, \hat{D}$ are transformations of the VAR MLEs $\hat{c}, \hat{\Phi}_1, \hat{\Phi}_2, \ldots, \hat{\Phi}_p, \hat{\Omega}$.

B. Normal-Wishart priors for VARs

Write the VAR as
$$\underset{n\times 1}{y_t} = \underset{n\times k}{\Pi'}\,\underset{k\times 1}{x_t} + \underset{n\times 1}{\varepsilon_t}, \qquad x_t = (1, y_{t-1}', y_{t-2}', \ldots, y_{t-p}')', \qquad k = np + 1,$$
with $\varepsilon_t \mid y_{t-1}, y_{t-2}, \ldots, y_{1-p} \sim N(0, \Omega)$.

Likelihood function (stacking the data as $\underset{T\times n}{Y} = (y_1, \ldots, y_T)'$):
$$f(Y \mid \Pi, \Omega) = (2\pi)^{-Tn/2}\,|\Omega|^{-T/2}\exp\left[-\frac{1}{2}\sum_{t=1}^{T}(y_t - \Pi'x_t)'\Omega^{-1}(y_t - \Pi'x_t)\right].$$

Prior for $\Omega^{-1}$. Univariate regression analogy, with $\sigma^2 = E(\varepsilon_t^2)$: if $z_i \sim N(0, \lambda)$ for $i = 1, \ldots, N$, then $W = z_1^2 + z_2^2 + \cdots + z_N^2$ has a gamma distribution with parameters $N$ and $\lambda$, and the natural conjugate prior is $\sigma^{-2} \sim \Gamma(N, \lambda)$.

Vector autoregression, with $\Omega = E(\varepsilon_t\varepsilon_t')$: if $\underset{n\times 1}{z_i} \sim N(0, \Lambda)$ for $i = 1, \ldots, N$, then $W = z_1 z_1' + z_2 z_2' + \cdots + z_N z_N'$ has a Wishart distribution with parameters $N$ and $\Lambda$, written $W \sim W(N, \Lambda)$, and the natural conjugate prior is $\Omega^{-1} \sim W(N, \Lambda)$. The Wishart density is
$$p(W) = c^{-1}\,|\Lambda|^{-N/2}\,|W|^{(N-n-1)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}(\Lambda^{-1}W)\right], \qquad c = 2^{Nn/2}\,\pi^{n(n-1)/4}\prod_{j=1}^{n}\Gamma\!\left(\frac{N+1-j}{2}\right),$$
so the prior takes the form
$$p(\Omega^{-1}) \propto |\Omega^{-1}|^{(N-n-1)/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}(\Lambda^{-1}\Omega^{-1})\right].$$

Prior for $\Pi$. Let $\pi = \operatorname{vec}(\Pi)$, an $nk \times 1$ vector whose first $k$ components are the coefficients that explain $y_{1t}$. Univariate regression: $\beta \mid \sigma^2 \sim N(m, \sigma^2 M)$. Vector autoregression:
$$\pi \mid \Omega \sim N(m, \Omega \otimes M),$$
$$p(\pi \mid \Omega^{-1}) = (2\pi)^{-nk/2}\,|\Omega \otimes M|^{-1/2}\exp\left[-\tfrac{1}{2}(\pi - m)'(\Omega \otimes M)^{-1}(\pi - m)\right],$$
for $\underset{k\times k}{M}$ a positive definite matrix.

The posterior is $p(\pi, \Omega^{-1} \mid Y) \propto f(Y \mid \pi, \Omega^{-1})\,p(\pi \mid \Omega^{-1})\,p(\Omega^{-1})$. After a lot of algebra, this can be rewritten to show that
$$\pi \mid \Omega^{-1}, Y \sim N(m^*, \Omega \otimes M^*), \qquad \Omega^{-1} \mid Y \sim W(T + N, \Lambda^*),$$
where
$$M^* = \left(M^{-1} + \sum_{t=1}^{T} x_t x_t'\right)^{-1}, \qquad m^* = (I_n \otimes M^*)\left[(I_n \otimes M^{-1})\,m + \operatorname{vec}\left(\sum_{t=1}^{T} x_t y_t'\right)\right],$$
and $\Lambda^*$ combines $\Lambda$ with the sum of squared residuals.

Diffuse prior ($M^{-1} \to 0$, $N \to 0$, $\Lambda^{-1} \to 0$):
$$\pi \mid \Omega^{-1}, Y \sim N\left(\hat{\pi},\ \Omega \otimes \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\right), \qquad \Omega^{-1} \mid Y \sim W(T, \hat{S}^{-1}),$$
where $\hat{\pi} = \operatorname{vec}(\hat{\Pi})$ comes from estimating the $i$th equation of the VAR by OLS,
$$\hat{\pi}_i = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t y_{it}$$
(corresponding to elements $(i-1)k + 1$ through $ik$ of $\pi$); $\hat{\pi}_i$ is the posterior mean and $\omega_{ii}\left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}$ the posterior variance of that block (conditional on $\Omega$); and $\hat{S} = \sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$ for $\hat{\varepsilon}_t = y_t - \hat{\Pi}'x_t$.

To generate a draw from the posterior distribution of $(\pi, \Omega^{-1})$ with a diffuse prior:

(1) Estimate the $i$th equation of the VAR by OLS for $i = 1, 2, \ldots, n$: $\hat{\pi}_i = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t y_{it}$.

(2) Calculate the residuals of each equation, $\hat{\varepsilon}_{it} = y_{it} - \hat{\pi}_i'x_t$, and the $n \times n$ sum of outer products of the residuals, $\hat{S} = \sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$.

(3) Generate an artificial sample of size $T$ of the $n \times 1$ vector $z_t$, where $z_t \sim N(0, \hat{S}^{-1})$, and calculate the $n \times n$ matrix $W = \sum_{t=1}^{T} z_t z_t'$.

(4) Set $\Omega^{-1} = W$.

(5) Generate a draw for $\pi$ from the $N\left(\hat{\pi},\ \Omega \otimes \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\right)$ distribution.

The values of $\Omega^{-1}$ from step (4) and $\pi$ from step (5) represent a single draw from the posterior distribution. To generate a Monte Carlo sample of $D$ draws, repeat steps (3)-(5) $D$ times. To get standard errors on impulse-response functions, for each draw calculate the impulse-response function implied by that value of $\pi$.

Usual procedure: $\Omega$ is fixed at $T^{-1}\hat{S}$ for all draws; this gives the asymptotic classical distribution, or an approximation to the Bayesian distribution with a diffuse prior.

Example of using a non-diffuse prior (Del Negro and Schorfheide, 2002). They use a log-linearization of a real business cycle model to get theoretical values $(\Pi_0, \Omega_0)$ for the VAR parameters:
$$y_t = \Pi_0' x_t + \varepsilon_t, \qquad E(\varepsilon_t\varepsilon_t') = \Omega_0.$$

Suppose we had an artificial "sample" of $T^*$ observations $\{y_t^*\}_{t=-p+1}^{T^*}$ generated by this model, and we based $(m, M, N, \Lambda)$ on a diffuse inference from this sample:
$$M = \left(\sum_{t=1}^{T^*} x_t^* x_t^{*\prime}\right)^{-1}, \qquad m = \operatorname{vec}\left(M\sum_{t=1}^{T^*} x_t^* y_t^{*\prime}\right), \qquad N = T^*,$$
with $\Lambda$ based on the artificial sample's residual outer products $\sum_{t=1}^{T^*}(y_t^* - \hat{\Pi}^{*\prime}x_t^*)(y_t^* - \hat{\Pi}^{*\prime}x_t^*)'$. We could then use these values of $m, M, N$, and $\Lambda$ together with the actual observed data $\{y_t\}_{t=-p+1}^{T}$ to form a posterior inference using the earlier formulas, where the choice of $T^*$ would reflect how much weight to put on the "real business cycle prior."

Actually, we can use the RBC model's implied values of $\Pi_0$ and $\Omega_0$ not just to simulate a few observations; we can use them to calculate the moments analytically:
$$E(x_t x_t') = A_0, \qquad E(x_t y_t') = B_0, \qquad E(y_t y_t') = C_0,$$
where $A_0$, $B_0$ and $C_0$ are functions of $\Pi_0$ and $\Omega_0$.
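The five-step diffuse-prior sampler above can be sketched in a few lines of NumPy. This is a minimal illustration on a simulated bivariate VAR(1), not code from the notes; the function names and simulated data are mine.

```python
import numpy as np

def var_design(data, p):
    """Arrange a (T+p) x n data matrix into y_t = Pi' x_t + eps_t
    with x_t = (1, y_{t-1}', ..., y_{t-p}')'."""
    T = data.shape[0] - p
    y = data[p:]
    x = np.hstack([np.ones((T, 1))] +
                  [data[p - j:p - j + T] for j in range(1, p + 1)])
    return y, x

def diffuse_posterior_draw(y, x, rng):
    """One draw from the diffuse-prior posterior of (Pi, Omega),
    following steps (1)-(5) in the notes."""
    T, n = y.shape
    k = x.shape[1]
    xtx_inv = np.linalg.inv(x.T @ x)
    pi_hat = xtx_inv @ x.T @ y              # (1) OLS, equation by equation (k x n)
    resid = y - x @ pi_hat                  # (2) residuals ...
    S = resid.T @ resid                     #     ... and their sum of outer products
    z = rng.multivariate_normal(np.zeros(n), np.linalg.inv(S), size=T)  # (3)
    omega_inv = z.T @ z                     # (4) Omega^{-1} = sum_t z_t z_t' ~ W(T, S^{-1})
    omega = np.linalg.inv(omega_inv)
    # (5) pi | Omega, Y ~ N(vec(Pi_hat), Omega kron (X'X)^{-1});
    # ravel(order='F') stacks the columns of Pi_hat, matching vec().
    draw = rng.multivariate_normal(pi_hat.ravel(order='F'),
                                   np.kron(omega, xtx_inv))
    pi = draw.reshape(k, n, order='F')
    return pi, omega

# Demo on a simulated bivariate VAR(1) with Phi_1 = 0.5 * I.
rng = np.random.default_rng(0)
n, p, T_full = 2, 1, 400
data = np.zeros((T_full, n))
for t in range(1, T_full):
    data[t] = 0.5 * data[t - 1] + rng.standard_normal(n)
y, x = var_design(data, p)
pi, omega = diffuse_posterior_draw(y, x, rng)
```

Repeating the `diffuse_posterior_draw` call $D$ times gives the Monte Carlo sample described in the notes; an impulse-response function computed from each `pi` then yields posterior bands for the IRFs.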
Using these analytic moments, the prior parameters become
$$M = (T^* A_0)^{-1}, \qquad m = \operatorname{vec}(A_0^{-1}B_0), \qquad N = T^*, \qquad \Lambda^{-1} = T^*\left(C_0 - B_0' A_0^{-1} B_0\right).$$

Del Negro and Schorfheide find that putting equal weights on the prior and the data ($T^* = T$) results in substantially better forecasts than an unrestricted VAR, and better than the Minnesota prior for horizons greater than one quarter. The RBC model is a good simplification device (shrinkage).

Problem with the Normal-Wishart prior. The prior is $\pi \mid \Omega^{-1} \sim N(m, \Omega \otimes M)$ for $\underset{n\times n}{\Omega}$ and $\underset{k\times k}{M}$, so with $y_{1,t-1}$ the first lagged element of $x_t$ and $y_{2,t-1}$ the second:

confidence in the coefficient relating $y_{1t}$ to $y_{1,t-1}$ is $\omega_{11} m_{11}$;
confidence in the coefficient relating $y_{1t}$ to $y_{2,t-1}$ is $\omega_{11} m_{22}$;
confidence in the coefficient relating $y_{2t}$ to $y_{1,t-1}$ is $\omega_{22} m_{11}$;
confidence in the coefficient relating $y_{2t}$ to $y_{2,t-1}$ is $\omega_{22} m_{22}$.

Problem: if $\omega_{11} m_{22} \ll \omega_{11} m_{11}$ (pretty confident variable 2 doesn't matter for variable 1), then we must have $\omega_{22} m_{22} \ll \omega_{22} m_{11}$ (think variable 2 doesn't matter for variable 2 either).

Ways to get around this problem:
(1) Assume that $\Omega$ is diagonal. Then single-equation methods are equivalent to full-system inference, and we can use a different $M_i$ for each equation.
(2) Drop natural conjugate priors and turn to numerical Bayesian methods.

C. Bayesian analysis of structural VARs

Consider the structural VAR
$$\underset{n\times n}{A}{}'\,\underset{n\times 1}{y_t} = \underset{k\times n}{B}{}'\,\underset{k\times 1}{x_t} + \underset{n\times 1}{v_t}, \qquad x_t = (y_{t-1}', y_{t-2}', \ldots, y_{t-p}', 1)', \qquad k = np + 1,$$
with
$$v_t \mid y_{t-1}, y_{t-2}, \ldots, y_{1-p} \sim N(0, I_n),$$
so that
$$y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{1-p} \sim N\left((A')^{-1}B'x_t,\ (AA')^{-1}\right).$$

The likelihood is
$$f(y_1, \ldots, y_T \mid A, B;\ y_0, y_{-1}, \ldots, y_{-p+1}) = (2\pi)^{-Tn/2}\,|A|^T\exp\left[-\frac{1}{2}\sum_{t=1}^{T}(A'y_t - B'x_t)'(A'y_t - B'x_t)\right].$$

Take transposes, $y_t'A = x_t'B + v_t'$, and stack the rows on top of each other for $t = 1, 2, \ldots, T$:
$$\underset{T\times n}{Y}\,\underset{n\times n}{A} = \underset{T\times k}{X}\,\underset{k\times n}{B} + \underset{T\times n}{V}.$$

Write $B = (b_1, b_2, \ldots, b_n)$, where $b_i$ is the $k \times 1$ vector of coefficients on the lags in the $i$th structural equation:
$$y_t'a_i = x_t'b_i + v_{it}.$$

The vec operator stacks the columns of a $k \times n$ matrix on top of each other, from left to right, to form a $kn \times 1$ vector:
$$b = \operatorname{vec}(B) = (b_1', b_2', \ldots, b_n')'.$$
So the first $k$ elements of the $nk \times 1$ vector $b$ correspond to the coefficients on the lags in the first equation of the VAR. Then
$$\operatorname{vec}(XB) = \underset{Tn\times nk}{(I_n \otimes X)}\,b.$$

Likewise define
$$\tilde{y} = \operatorname{vec}(YA) = (I_n \otimes Y)\,a \qquad \text{for } a = \operatorname{vec}(A).$$
The first $n$ elements of the $n^2 \times 1$ vector $a$ correspond to the coefficients on the contemporaneous variables in the first equation of the VAR.

Taking the vec of the full system $YA = XB + V$:
$$\tilde{y} = (I_n \otimes X)\,b + v, \qquad v = \operatorname{vec}(V) \sim N(0, I_{Tn}).$$
Note that conditional on $a$, this is a classical regression model with unit variance for the residuals:
$$p(Y \mid a, b) = (2\pi)^{-Tn/2}\,|A|^T\exp\left[-\tfrac{1}{2}\left(\tilde{y} - (I_n \otimes X)b\right)'\left(\tilde{y} - (I_n \otimes X)b\right)\right].$$

Prior: $b \mid a \sim N(m(a), M(a))$,
$$p(b \mid a) = (2\pi)^{-nk/2}\,|M(a)|^{-1/2}\exp\left[-\tfrac{1}{2}(b - m(a))'M(a)^{-1}(b - m(a))\right],$$
with $p(a)$ arbitrary.

Posterior: $p(b, a \mid Y) = p(b \mid a, Y)\,p(a \mid Y)$, where
$$b \mid a, Y \sim N(m^*(a), M^*(a)),$$
$$M^*(a) = \left[M(a)^{-1} + (I_n \otimes X'X)\right]^{-1}, \qquad m^*(a) = M^*(a)\left[M(a)^{-1}m(a) + (I_n \otimes X')\tilde{y}\right],$$
and $(I_n \otimes X')\tilde{y} = \operatorname{vec}(X'YA)$. Still, we need to "invert" an $nk \times nk$ matrix.

Suppose our prior for equation $i$ takes the form $b_i \mid a \sim N(m_i(a), M_i(a))$ for $M_i(a)$ a $k \times k$ matrix, with priors independent across equations, so that $M(a)^{-1}$ is block diagonal with blocks $M_i(a)^{-1}$. Then
$$M^*(a) = \begin{bmatrix} M_1^*(a) & & 0 \\ & \ddots & \\ 0 & & M_n^*(a) \end{bmatrix}, \qquad M_i^*(a) = \left[M_i(a)^{-1} + X'X\right]^{-1},$$
and
$$m^*(a) = \begin{bmatrix} M_1^*(a)\,q_1(a) \\ \vdots \\ M_n^*(a)\,q_n(a) \end{bmatrix}, \qquad q_i(a) = M_i(a)^{-1}m_i(a) + X'Y a_i.$$

Specification of $p(b \mid a)$: expect each series to behave like a random walk. Structural form: $YA = XB + V$. Reduced form: $Y = X\Phi + E$ for $\Phi = BA^{-1}$ and $E = VA^{-1}$. A random walk for each series corresponds to
$$\Phi = \begin{bmatrix} I_n \\ 0 \end{bmatrix} \quad \Rightarrow \quad E(B \mid A) = \begin{bmatrix} I_n \\ 0 \end{bmatrix} A$$
(the first-lag block is the identity; the coefficients on higher lags and the constant are zero), so
$$b_i \mid a \sim N(m_i(a), M_i(a)), \qquad m_i(a) = \begin{pmatrix} a_i \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Let $m_i(r)$ denote the $r$th element of this vector (the prior expectation of the $r$th element of $b_i$) and $M_i(r)$ its variance:
$$b_i(r) \mid a \sim N(m_i(r), M_i(r)),$$
with the priors across coefficients taken to be independent (so $M_i(a)$ is diagonal).
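The single-equation posterior mean $M_i^*(a)\left[M_i(a)^{-1}m_i(a) + X'Ya_i\right]$ has the standard conjugate-regression form, and a Gaussian prior can equivalently be imposed by running OLS on a sample augmented with one prior "observation" per coefficient. The following is a small numerical sanity check of that equivalence with made-up data, in a plain regression setting rather than the structural VAR itself:

```python
import numpy as np

rng = np.random.default_rng(1)
k, T = 4, 50
X = rng.standard_normal((T, k))
beta_true = np.array([1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.standard_normal(T)

# Prior b ~ N(m, M) with M diagonal: one dummy observation per coefficient,
# x(r) = e_r / sqrt(M(r)) with observed value m(r) / sqrt(M(r)).
m = np.zeros(k)
M_diag = np.array([1.0, 0.1, 0.1, 1.0])
X_d = np.diag(1.0 / np.sqrt(M_diag))
y_d = m / np.sqrt(M_diag)

# OLS on the stacked (dummy + actual) sample
X_aug = np.vstack([X_d, X])
y_aug = np.concatenate([y_d, y])
b_stacked = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_aug)

# Direct conjugate posterior mean: (M^{-1} + X'X)^{-1} (M^{-1} m + X'y)
M_inv = np.diag(1.0 / M_diag)
b_posterior = np.linalg.solve(M_inv + X.T @ X, M_inv @ m + X.T @ y)

assert np.allclose(b_stacked, b_posterior)
```

The check works because the dummy rows satisfy $X_d'X_d = M^{-1}$ and $X_d'y_d = M^{-1}m$, so the stacked OLS normal equations reproduce the posterior formula exactly.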
Multiplying the likelihood by the prior density
$$p(b_i(r) \mid a) = \left(2\pi M_i(r)\right)^{-1/2}\exp\left[-\frac{\left(b_i(r) - m_i(r)\right)^2}{2M_i(r)}\right]$$
is numerically identical to acting as if we had observed a variable $q_i(r)$ from the system
$$q_i(r) = b_i(r)/\sqrt{M_i(r)} + v_i(r),$$
where the observed value of $q_i(r)$ is $m_i(r)/\sqrt{M_i(r)}$ and $v_i(r) \sim N(0, 1)$. Or, since $m_i(r)$ is the $r$th element of $a_i$ if $r \le n$ and is zero otherwise, this is numerically identical to having observed the $n \times 1$ vector $y_i(r)$ and the $k \times 1$ vector $x_i(r)$ from the following system:
$$y_i(r)'a_i = x_i(r)'b_i + v_i(r),$$
$$y_i(r) = \begin{cases} e_r^{(n)}/\sqrt{M_i(r)} & \text{if } r \le n \\ 0 & \text{otherwise} \end{cases}, \qquad x_i(r) = e_r^{(k)}/\sqrt{M_i(r)},$$
for $e_r^{(n)}$ the $r$th column of $I_n$ and $e_r^{(k)}$ the $r$th column of $I_k$.

Stack these "dummy observations" for $r = 1, 2, \ldots, k$ in the matrices
$$\underset{k\times n}{Y_i^d} = \begin{bmatrix} y_i(1)' \\ \vdots \\ y_i(k)' \end{bmatrix}, \qquad \underset{k\times k}{X_i^d} = \begin{bmatrix} x_i(1)' \\ \vdots \\ x_i(k)' \end{bmatrix}.$$

The posterior $p(b_i \mid a, Y)$ is then
$$b_i \mid a, Y \sim N(m_i^*(a), M_i^*(a)),$$
$$M_i^*(a) = \left(X_i^{d\prime}X_i^d + X'X\right)^{-1}, \qquad m_i^*(a) = M_i^*(a)\left(X_i^{d\prime}Y_i^d\,a_i + X'Y a_i\right),$$
since $X_i^{d\prime}X_i^d = M_i(a)^{-1}$ and $X_i^{d\prime}Y_i^d\,a_i = M_i(a)^{-1}m_i(a)$.

Remaining question: how to choose $M_i(r)$ in $b_i(r) \mid a \sim N(m_i(r), M_i(r))$. Let $j(r)$ denote which variable the $r$th coefficient refers to ($j = 1, 2, \ldots, n$) and $\ell(r)$ its lag ($\ell = 1, 2, \ldots, p$); i.e., $b_i(r)$ is the coefficient relating $y_t'a_i$ to $y_{j,t-\ell}$.

(1) If the units in which $y_{jt}$ is measured are doubled, the value of $b_i(r)$ is cut in half, so make $\sqrt{M_i(r)}$ inversely proportional to $\hat{\sigma}_j$, the standard deviation of a univariate autoregression for $y_{jt}$.
(2) We have more confidence in the zero prior at bigger lags, so make $\sqrt{M_i(r)}$ decreasing in $\ell(r)$:
$$\sqrt{M_i(r)} = \begin{cases} \dfrac{\lambda_0\lambda_1}{\hat{\sigma}_{j(r)}\,\ell(r)^{\lambda_3}} & \text{for } r = 1, 2, \ldots, k-1 \\[1ex] \lambda_0\lambda_4 & \text{for } r = k \text{ (the constant term),} \end{cases}$$
where $\lambda_0$ controls the tightness of the prior for $a$ and $\lambda_1$ controls the tightness of the random walk prior.

Distribution of $a$: from $p(a, b \mid Y) = p(b \mid a, Y)\,p(a \mid Y)$ with $b \mid a, Y \sim N(m^*(a), M^*(a))$,
$$p(a \mid Y) \propto p(a)\,|A|^T\,\left|I_{Tn} + (I_n \otimes X)\,M(a)\,(I_n \otimes X)'\right|^{-1/2}\exp\left\{-\tfrac{1}{2}\left[a'(I_n \otimes Y'Y)\,a - m^*(a)'M^*(a)^{-1}m^*(a)\right]\right\} \equiv q(a).$$

What is this distribution $p(a \mid Y)$? Use numerical methods. Example: Sims and Zha sample from $q(a)$ by importance sampling. The importance density $g(a)$ should be similar to $q(a)$ but with fatter tails.
(1) Find $a_0 = \arg\max_a \log q(a)$ and
$$H_0 = -\left.\frac{\partial^2 \log q(a)}{\partial a\,\partial a'}\right|_{a = a_0}.$$

(2) Let $g(a)$ be the $n^2$-dimensional Student $t$ distribution centered at $a_0$ with scale matrix $H_0^{-1}$ and 9 degrees of freedom.

(3) Generate the $j$th draw $a^{(j)}$ from $g(a)$. E.g., generate $u^{(j)} \sim N(0, I_{n^2})$ and set $v^{(j)} = (P_0')^{-1}u^{(j)}$, where $P_0$ is the Cholesky factor of $H_0$; generate $e^{(j)} \sim N(0, I_9)$ and set $\zeta^{(j)} = (1/9)\,e^{(j)\prime}e^{(j)}$; then $a^{(j)} = a_0 + v^{(j)}/\sqrt{\zeta^{(j)}}$.

(4) Calculate
$$g(a^{(j)}) = c\left[1 + (a^{(j)} - a_0)'H_0(a^{(j)} - a_0)/9\right]^{-(n^2 + 9)/2},$$
choosing $c$ so that $\max_{j=1,\ldots,D} g(a^{(j)})$ is on the same scale as $\max_{j=1,\ldots,D} q(a^{(j)})$.

(5) Calculate the weight $\omega^{(j)} = q(a^{(j)})/g(a^{(j)})$.

(6) Repeat steps (3)-(5) for $j = 1, 2, \ldots, D$.

(7) The sample $a^{(1)}, \ldots, a^{(D)}$, when weighted by $K^{-1}\omega^{(1)}, \ldots, K^{-1}\omega^{(D)}$ for $K = \sum_{j=1}^{D}\omega^{(j)}$, is a sample from $p(a \mid Y)$; e.g.,
$$\hat{E}(a \mid Y) = \frac{\sum_{j=1}^{D}\omega^{(j)}a^{(j)}}{\sum_{j=1}^{D}\omega^{(j)}}.$$

(8) For each $j$, generate $b^{(j)}$ from the $N(m^*(a^{(j)}), M^*(a^{(j)}))$ distribution and weight it by $K^{-1}\omega^{(j)}$ for a draw from $p(b \mid Y)$; e.g.,
$$\hat{E}(b \mid Y) = \frac{\sum_{j=1}^{D}\omega^{(j)}b^{(j)}}{\sum_{j=1}^{D}\omega^{(j)}}.$$
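The importance-sampling steps above can be sketched as self-normalised importance sampling with a multivariate-$t$ proposal. This is a toy illustration only: the target `log_q` below is a hypothetical Gaussian stand-in for the actual $q(a)$ (whose mode $a_0$ and Hessian $H_0$ would in practice be found by numerical optimization), and all names are mine.

```python
import numpy as np

def importance_sample(log_q, a0, H0, D, df, rng):
    """Steps (3)-(7): draw from a multivariate-t importance density g(a)
    centred at the mode a0 with scale H0^{-1}, then self-normalise the
    weights q(a_j) / g(a_j)."""
    dim = a0.size
    P = np.linalg.cholesky(np.linalg.inv(H0))
    u = rng.standard_normal((D, dim)) @ P.T          # v_j ~ N(0, H0^{-1})
    zeta = rng.chisquare(df, size=D) / df            # zeta_j = e'e / df
    draws = a0 + u / np.sqrt(zeta)[:, None]          # a_j = a0 + v_j / sqrt(zeta_j)
    dev = draws - a0
    quad = np.einsum('ij,jk,ik->i', dev, H0, dev)
    log_g = -0.5 * (df + dim) * np.log1p(quad / df)  # log g(a_j) up to a constant
    log_w = np.array([log_q(a) for a in draws]) - log_g
    w = np.exp(log_w - log_w.max())                  # constants cancel after normalising
    w /= w.sum()
    post_mean = w @ draws                            # weighted estimate of E(a | Y)
    return draws, w, post_mean

# Toy check: the target is an (unnormalised) Gaussian centred at mu, so the
# mode a0 = mu is known analytically and E(a | Y) should come back near mu.
rng = np.random.default_rng(1)
mu = np.array([0.5, -0.3])
log_q = lambda a: -0.5 * float((a - mu) @ (a - mu))
draws, w, post_mean = importance_sample(log_q, mu, np.eye(2), 20_000, 9, rng)
```

Because the weights are self-normalised, the constant $c$ of step (4) and the normalizing constant of $q(a)$ both cancel, which is why the code works with unnormalised log densities throughout.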

## This note was uploaded on 03/02/2012 for the course ECON 226 taught by Professor Jameshamilton during the Winter '09 term at UCSD.
