Econ226_ICE

I. Bayesian econometrics
A. Introduction
B. Bayesian inference in the univariate regression model
C. Statistical decision theory

1. Example: portfolio allocation problem

a_j = quantity of asset j purchased, j = 1, ..., J
y = income
budget constraint: Σ_{j=1}^J a_j = y
r_j = gross rate of return on asset j
c = future consumption: c = Σ_{j=1}^J r_j a_j

The investor's problem is

max_{a_1,...,a_J} E U(Σ_{j=1}^J r_j a_j)   s.t.   Σ_{j=1}^J a_j = y

r = (r_1, ..., r_J)'
r | μ, Σ ~ N(μ, Σ)
assume: Σ is known; μ is unknown and must be estimated from Y.

Classical econometrician:
Step 1: Solve the optimization problem as if μ, Σ were known with certainty.
Step 2: Estimate μ from the data Y.
Step 3: Plug the results from Step 2 into Step 1.

Example: U(c) = −exp(−c). Then

E U(c) = −E[exp(−a'r)] = −exp(−a'μ + a'Σa/2),

so Step 1 amounts to

max_a  a'μ − a'Σa/2   s.t.   1'a = y,

and Steps 2-3 replace μ with its estimate μ̂ in the resulting allocation.

Bayesian econometrician: solve the optimization problem under uncertainty.

posterior: μ | Y ~ N(m, M)
r = μ + ε, ε ~ N(0, Σ), so r | Y ~ N(m, M + Σ)

E[U(c) | Y] = −E[exp(−a'r) | Y] = −exp(−a'm + a'(M + Σ)a/2),

so the Bayesian solves

max_a  a'm − a'(M + Σ)a/2   s.t.   1'a = y.

Uncertainty about μ influences the portfolio allocation decision (even if we have a diffuse prior, so that m = μ̂).

The Bayesian considers the statistical inference problem to be: calculate the posterior distribution. How this distribution is used to come up with a "parameter estimate" requires specifying a loss function.

I. Bayesian econometrics
C. Statistical decision theory
1. Example: portfolio allocation problem
2. General decision theory

θ = unknown true value
θ̂ = estimate
L(θ̂, θ) = loss function: how much we are concerned if we announce an estimate of θ̂ but the truth is θ

θ̂ is the solution to min_θ̂ ∫ L(θ̂, θ) p(θ|Y) dθ.

Scalar examples:

(1) quadratic loss: L(θ̂, θ) = (θ̂ − θ)²
Claim: the optimal θ̂ = E(θ|Y).
Proof:
E[(θ̂ − θ)² | Y] = E{[θ̂ − E(θ|Y) + E(θ|Y) − θ]² | Y}
  = [θ̂ − E(θ|Y)]² + 2[θ̂ − E(θ|Y)] E[E(θ|Y) − θ | Y] + E{[E(θ|Y) − θ]² | Y}
  = [θ̂ − E(θ|Y)]² + E{[E(θ|Y) − θ]² | Y},
which is minimized at θ̂ = E(θ|Y).
Conclusion: for quadratic loss, the optimal estimate is the posterior mean.

(2) absolute loss: L(θ̂, θ) = |θ̂ − θ|
Claim: the optimal θ̂ = med(θ|Y), the value for which ∫_{−∞}^{θ̂} p(θ|Y) dθ = 0.5.
Proof:
∫ |θ̂ − θ| p(θ|Y) dθ = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|Y) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|Y) dθ.
Differentiating with respect to θ̂ gives
∫_{−∞}^{θ̂} p(θ|Y) dθ − ∫_{θ̂}^{∞} p(θ|Y) dθ,
so the expected loss is minimized when ∫_{−∞}^{θ̂} p(θ|Y) dθ = ∫_{θ̂}^{∞} p(θ|Y) dθ.
Conclusion: for absolute loss, the optimal estimate is the posterior median.

(3) point loss (discrete case): θ ∈ {θ_1, ..., θ_J}
L(θ̂, θ) = 0 if θ̂ = θ, 1 if θ̂ ≠ θ
θ̂ = arg min_j [1 − P(θ = θ_j | Y)] = the θ_j for which P(θ = θ_j | Y) is highest.
Conclusion: for point loss, the optimal estimate is the posterior mode.

Returning to the example from the first lecture:

y_t | μ ~ N(μ, σ²)   (σ² known)
μ ~ N(m, τ²)   (prior)
μ | Y ~ N(m*, τ*²)   (posterior)
τ*² = [(1/τ²) + (T/σ²)]^{−1}
m* = [(σ²/T)/(σ²/T + τ²)] m + [τ²/(σ²/T + τ²)] ȳ

For any of these three loss functions (quadratic, absolute, point), the estimate would be θ̂ = m*.
With a diffuse prior (τ² → ∞): θ̂ = ȳ.
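A minimal numerical sketch of these last two pieces together (the data, prior settings, and grid below are illustrative choices, not values from the lecture): build the Normal posterior for μ, then minimize the expected posterior loss over a grid of candidate estimates. Quadratic loss picks out the posterior mean and absolute loss the posterior median, which coincide at m* here because the posterior is symmetric.

```python
# Illustrative sketch: Normal-Normal posterior for mu, then grid search over
# candidate estimates under quadratic and absolute loss.  All numbers and
# variable names are illustrative assumptions, not taken from the notes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, m, tau = 2.0, 0.0, 1.0                  # known sigma; prior mu ~ N(m, tau^2)
y = rng.normal(1.5, sigma, size=30)            # sample generated with true mu = 1.5
T, ybar = len(y), y.mean()

tau_star2 = 1.0 / (1.0 / tau**2 + T / sigma**2)              # posterior variance tau*^2
m_star = tau_star2 * (m / tau**2 + T * ybar / sigma**2)      # posterior mean m*

theta = np.linspace(m_star - 4, m_star + 4, 2001)            # grid of theta values
w = stats.norm.pdf(theta, m_star, np.sqrt(tau_star2))
w /= w.sum()                                                 # discretized posterior weights

quad_risk = [(w * (est - theta) ** 2).sum() for est in theta]   # E[(est - theta)^2 | Y]
abs_risk = [(w * np.abs(est - theta)).sum() for est in theta]   # E[|est - theta| | Y]

print("posterior mean m*:   ", m_star)
print("minimizer, quadratic:", theta[np.argmin(quad_risk)])
print("minimizer, absolute: ", theta[np.argmin(abs_risk)])
```

Letting τ² grow (an increasingly diffuse prior) pushes m*, and hence both minimizers, toward ȳ.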
3. Bayesian statistics and admissibility

More generally, we can consider some action a we plan to take, e.g.:
a = θ̂ means we announce that our estimate is θ̂
a = 0 if we accept H_0: θ = θ_0
a = 1 if we reject H_0: θ = θ_0
L(a, θ) = loss if we take the action a when the true value of the parameter turns out to be θ.

Definition: an action a is said to be inadmissible if there is an alternative action b such that L(b, θ) ≤ L(a, θ) for all θ, with strict inequality for some θ.

The Bayes decision implied by the probability distribution p(θ) is the action a for which ∫ L(a, θ) p(θ) dθ is minimized.

Under certain regularity conditions:
(1) If action a is the Bayes decision implied by some p(θ), then a is admissible.
(2) If action a is admissible, then there exists a p(θ) for which a is the Bayes decision.

Example: hypothesis testing
a = 1 (reject H_0: θ = θ_0)
a = 0 (accept H_0)
L(1, θ) = 0 if θ ≠ θ_0, 1 if θ = θ_0
L(0, θ) = 0 if θ = θ_0, c if θ ≠ θ_0

Bayes action: choose a = 1 if
E[L(1, θ)|Y] < E[L(0, θ)|Y]
P(θ = θ_0|Y) < c[1 − P(θ = θ_0|Y)]
P(θ = θ_0|Y) < 1/(1 + c^{−1}).

The hypothesis test "reject H_0 if T(Y) ≥ t" is said to be inadmissible if there exists an alternative test "reject H_0 if S(Y) ≥ s" such that:
(1) for θ = θ_0: ∫_{T(Y)≥t} p(Y|θ) dY ≥ ∫_{S(Y)≥s} p(Y|θ) dY
(2) for every θ ≠ θ_0: ∫_{T(Y)≥t} p(Y|θ) dY ≤ ∫_{S(Y)≥s} p(Y|θ) dY
(3) there is some θ for which the inequality in (1) or (2) is strict.

I. Bayesian econometrics
C. Statistical decision theory
D. Large sample results

Goal of this section: a Bayesian is doing something with the data. How would a classical econometrician describe what that is?

1. Background: the Kullback-Leibler information inequality

Claim: log x ≤ x − 1, with equality only if x = 1.

[Figure: plot of log x for 0 < x ≤ 5, lying everywhere below the line x − 1 and touching it only at x = 1.]

Implication: E[log x] ≤ E[x] − 1, with equality only if x = 1 with probability 1.

Application of the claim to the case of a discrete parameter space and discrete random variables:
θ ∈ {θ_1, ..., θ_J}, with θ_0 the true value
y_t ∈ {1, ..., I}

Define x(y_t, θ_j) = P(Y = y_t | θ_j) / P(Y = y_t | θ_0).
This is a random variable (because y_t is random) that with probability P(Y = i | θ_0) takes on the value P(Y = i | θ_j) / P(Y = i | θ_0).

E[x(y_t, θ_j)] = Σ_{i=1}^I [P(Y = i | θ_j) / P(Y = i | θ_0)] P(Y = i | θ_0) = Σ_{i=1}^I P(Y = i | θ_j) = 1

E[log x(y_t, θ_j)] = Σ_{i=1}^I log[P(Y = i | θ_j) / P(Y = i | θ_0)] P(Y = i | θ_0) = E log[p(y_t | θ_j) / p(y_t | θ_0)]

The claim E[log x] ≤ E[x] − 1 implies for this case that

E log[p(y_t | θ_j) / p(y_t | θ_0)] ≤ 1 − 1 = 0,

with equality only if p(y_t | θ_j) = p(y_t | θ_0) for all y_t.

Kullback-Leibler information inequality:
E log[p(y_t | θ) / p(y_t | θ_0)] ≤ 0,
with equality only if p(y_t | θ) = p(y_t | θ_0) for all y_t.

2. Implications of K-L for Bayesian posterior probabilities

We will illustrate how the data eventually overwhelm any prior p(θ).

p(θ_s | Y) = p(θ_s) Π_{t=1}^T p(y_t | θ_s) / [Σ_{j=1}^J p(θ_j) Π_{t=1}^T p(y_t | θ_j)]
  = p(θ_s) / {Σ_{j=1}^J p(θ_j) Π_{t=1}^T [p(y_t | θ_j) / p(y_t | θ_s)]}
  = p(θ_s) / (Σ_{j=1}^J p(θ_j) exp{Σ_{t=1}^T log[p(y_t | θ_j) / p(y_t | θ_s)]})

LLN: T^{−1} Σ_{t=1}^T log[p(y_t | θ_j) / p(y_t | θ_s)] →p E log[p(y_t | θ_j) / p(y_t | θ_s)],
which is < 0 if p(y_t | θ_j) ≠ p(y_t | θ_s) and = 0 if p(y_t | θ_j) = p(y_t | θ_s).

So when θ_s = θ_0, the sum Σ_{t=1}^T log[p(y_t | θ_j) / p(y_t | θ_0)] → −∞ for every θ_j whose implied distribution differs from the truth; those exp{·} terms go to 0, the j = s term stays at p(θ_s), and p(θ_s | Y) → 1.

Conclusion: the Bayesian posterior distribution collapses to a spike at the truth for i.i.d. discrete data.
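Here is a minimal simulation of this collapse (the three-point parameter space, the candidate distributions, and the lopsided prior are all illustrative assumptions): even a prior that puts 98% of its mass on the wrong θ_j is overwhelmed as T grows.

```python
# Illustrative sketch: posterior over a discrete parameter space with i.i.d.
# discrete data collapses onto the true theta as T grows.  The candidate
# distributions and prior below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Rows give p(y | theta_j) for y in {0, 1, 2}; theta_0 (the truth) is row 0.
p_y_given_theta = np.array([[0.2, 0.3, 0.5],
                            [0.3, 0.3, 0.4],
                            [0.5, 0.4, 0.1]])
true_j = 0
prior = np.array([0.01, 0.01, 0.98])           # prior heavily favours the wrong theta

for T in [0, 10, 100, 1000]:
    y = rng.choice(3, size=T, p=p_y_given_theta[true_j])
    log_lik = np.log(p_y_given_theta[:, y]).sum(axis=1)   # sum_t log p(y_t | theta_j)
    log_post = np.log(prior) + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                                     # posterior p(theta_j | Y_T)
    print(f"T = {T:5d}  posterior =", np.round(post, 4))
```

By T = 1000 essentially all of the posterior mass sits on the true value, as the K-L argument predicts.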
I. Bayesian econometrics
C. Statistical decision theory
D. Large sample results
1. Background: the Kullback-Leibler information inequality
2. Implications of K-L for Bayesian posterior probabilities
3. Bayesian posterior distribution as approximation to the asymptotic distribution of the MLE

log p(Y|θ) = Σ_{t=1}^T log p(y_t|θ)

Define θ̂_T = arg max_θ log p(Y|θ), so that ∂ log p(Y|θ)/∂θ evaluated at θ̂_T equals 0.

A second-order Taylor expansion around θ̂_T gives

log p(Y|θ) ≈ log p(Y|θ̂_T) + (1/2)(θ − θ̂_T)' [∂² log p(Y|θ)/∂θ∂θ' |_{θ̂_T}] (θ − θ̂_T)
  = log p(Y|θ̂_T) − (T/2)(θ − θ̂_T)' Ĥ_T (θ − θ̂_T),

where

Ĥ_T = −T^{−1} Σ_{t=1}^T ∂² log p(y_t|θ)/∂θ∂θ' |_{θ̂_T} →p H = −E[∂² log p(y_t|θ_0)/∂θ∂θ'].

Hence

p(θ|Y) ≈ k_T p(θ) exp{−(T/2)(θ − θ̂_T)' Ĥ_T (θ − θ̂_T)} ≈ q_T(θ) for T large,

where q_T(θ) is the kernel of a N(θ̂_T, (T Ĥ_T)^{−1}) density.

[Figure: the prior p(θ) (blue) together with the Normal approximation q_T(θ) for T = 10 (green) and T = 100 (red).]
[Figure: the posterior distributions p(θ|Y_T) for T = 0 (blue), T = 10 (green), and T = 100 (red).]

Conclusions: the sequence of posterior distributions p(θ|Y_T) concentrates around θ_0, placing probability approaching 1 at θ = θ_0 and approaching 0 elsewhere.

Let θ*_T be a sequence of random variables with distribution p(θ|Y_T). Then conditional on Y_T we have

√T (θ*_T − θ̂_T) →L N(0, H^{−1}),

where the distribution is across realizations of θ*_T.

Contrast with the classical result:

√T (θ̂_T − θ_0) →L N(0, H^{−1}),

where the distribution is across realizations of Y_T.

Implication: calculating the Bayesian posterior distribution is a way to find the asymptotic distribution of the MLE when the regularity conditions hold.

Example:
y_t | μ ~ N(μ, σ²)   (σ² known)
μ ~ N(m, τ²)   (prior)
μ | Y ~ N(m*_T, τ*²_T)   (posterior)
τ*²_T = [(1/τ²) + (T/σ²)]^{−1}
m*_T = [(σ²/T)/(σ²/T + τ²)] m + [τ²/(σ²/T + τ²)] ȳ_T

Conditional on Y_T, the variable μ | Y_T has a distribution characterized by
(μ − m*_T)/τ*_T ~ N(0, 1), i.e. √T(μ − m*_T)/(√T τ*_T) ~ N(0, 1).
As T → ∞, √T τ*_T → σ and m*_T → ȳ_T, so
√T (μ − ȳ_T)/σ →L N(0, 1).
Classical result: √T (ȳ_T − μ)/σ →L N(0, 1).

I. Bayesian econometrics
C. Statistical decision theory
D. Large sample results
E. Diffuse priors

Interpretations:
(1) Start with finite τ², calculate the posterior, and consider the limiting properties of the sequence of posteriors as τ² → ∞.
(2) Start with p(μ) = (2πτ²)^{−1/2} exp[−(μ − m)²/(2τ²)] and take the limit as τ² → ∞? The limit is not a density.
(3) Just use kernels?
p(Y|μ) ∝ exp[−(μ − ȳ)²/(2σ²/T)]
p(μ) ∝ 1? (diffuse prior?) implies
p(μ|Y) ∝ exp[−(μ − ȳ)²/(2σ²/T)]
μ|Y ~ N(ȳ, σ²/T),
which gives the correct answer in this case.
But p(μ) = 1 for −∞ < μ < ∞ is not a proper density; p(μ) = 1 is called an "improper" prior. In this case it gave us the correct answer. In other cases it can fail (with either analytical or numerical methods).

Another problem with the improper prior p(θ) = 1 is that it is not invariant with respect to reparameterization.

Example: T = 1, y_1 | σ ~ N(0, σ²). If the parameter of interest is θ_1 = 1/σ, the likelihood is
p(y_1|θ_1) = (2π)^{−1/2} θ_1 exp(−θ_1² y_1²/2),
and the flat prior p_1(θ_1) ∝ 1 gives
p(θ_1|y_1; p_1) ∝ θ_1 exp(−θ_1² y_1²/2).
The constant of proportionality needed to ensure ∫_0^∞ p(θ_1|y_1; p_1) dθ_1 = 1 is y_1².

Suppose instead the parameter of interest is taken to be θ_2 = 1/σ² = θ_1² and the prior is p_2(θ_2) ∝ 1. Then
p(θ_2|y_1; p_2) ∝ θ_2^{1/2} exp(−θ_2 y_1²/2)   (a Γ(3/2, y_1²/2) distribution),
and translating back to θ_1 (using θ_2 = θ_1², dθ_2 = 2θ_1 dθ_1),
p(θ_1|y_1; p_2) ∝ θ_1² exp(−θ_1² y_1²/2).

Problem: P(θ_1 > 1 | y_1; p_1) ≠ P(θ_2 > 1 | y_1; p_2), even though θ_1 > 1 and θ_2 > 1 describe the same event (σ < 1).

Issue: if w = g(θ), then densities transform with the Jacobian,
p_w(w|y_1) = p_θ(g^{−1}(w)|y_1) |dg^{−1}(w)/dw|,
so a prior that is flat in θ_1 is not flat in θ_2 = θ_1².

Conclusion: the "improper priors" p_1(θ_1) = 1 and p_2(θ_2) = 1 represent different prior beliefs.

Question: which (if either) should be called a "diffuse prior" corresponding to complete uncertainty?
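A quick numerical check of this last point (the observed value y_1 below is an illustrative assumption): compute the posterior probability of the single event σ < 1 under the two flat-prior analyses; the two answers disagree.

```python
# Illustrative sketch: the flat priors p1(theta1) = 1 (theta1 = 1/sigma) and
# p2(theta2) = 1 (theta2 = 1/sigma^2) assign different posterior probabilities
# to the same event sigma < 1.  The observed y1 is an illustrative assumption.
import numpy as np
from scipy import integrate, stats

y1 = 0.8

# Posterior under p1: p(theta1 | y1) proportional to theta1 * exp(-theta1^2 * y1^2 / 2)
kernel1 = lambda t: t * np.exp(-t**2 * y1**2 / 2)
num, _ = integrate.quad(kernel1, 1, np.inf)
den, _ = integrate.quad(kernel1, 0, np.inf)
prob_p1 = num / den                            # P(theta1 > 1 | y1; p1) = exp(-y1^2 / 2)

# Posterior under p2: theta2 | y1 ~ Gamma(shape 3/2, rate y1^2 / 2)
prob_p2 = stats.gamma.sf(1.0, a=1.5, scale=2 / y1**2)   # P(theta2 > 1 | y1; p2)

print("P(sigma < 1) using flat prior on 1/sigma:  ", prob_p1)
print("P(sigma < 1) using flat prior on 1/sigma^2:", prob_p2)
```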
Jeffreys prior: p(θ) ∝ |h(θ)|^{1/2}, where

h(θ) = −∫ [∂² log p(y|θ)/∂θ²] p(y|θ) dy = −E[∂² log p(y|θ)/∂θ²].

Example: for y_t ~ i.i.d. N(0, σ²), t = 1, ..., T, parameterized by θ_1 = σ:

log p(y|σ) = −(T/2) log(2π) − T log σ − (1/(2σ²)) Σ_{t=1}^T y_t²
∂ log p(y|σ)/∂σ = −T/σ + σ^{−3} Σ_{t=1}^T y_t²
∂² log p(y|σ)/∂σ² = T/σ² − 3σ^{−4} Σ_{t=1}^T y_t²

Since E[Σ_{t=1}^T y_t²] = Tσ²,

h(σ) = −E[∂² log p(y|σ)/∂σ²] = −T/σ² + 3T/σ² = 2T/σ²
p_1(σ) ∝ h(σ)^{1/2} ∝ 1/σ.

If we instead parameterize by θ_2 = σ²:

log p(y|σ²) = −(T/2) log(2π) − (T/2) log σ² − (1/(2σ²)) Σ_{t=1}^T y_t²
∂ log p(y|σ²)/∂σ² = −T/(2σ²) + (1/(2σ⁴)) Σ_{t=1}^T y_t²
∂² log p(y|σ²)/∂(σ²)² = T/(2σ⁴) − σ^{−6} Σ_{t=1}^T y_t²
h(σ²) = −E[∂² log p(y|σ²)/∂(σ²)²] = −T/(2σ⁴) + T/σ⁴ = T/(2σ⁴)
p_2(σ²) ∝ h(σ²)^{1/2} ∝ 1/σ².

Advantage of the Jeffreys prior: probabilities implied by p(θ_1|Y) derived from p_1(θ_1) are identical to those implied by p(θ_2|Y) derived from p_2(θ_2) (see the numerical check below).

Note: for the Normal-gamma prior

p(σ^{−2}) ∝ (σ^{−2})^{(N/2)−1} exp(−λσ^{−2}/2),

we characterized the diffuse prior as the limit N → 0, λ → 0, i.e. p(σ^{−2}) ∝ σ².

Concerns about the Jeffreys prior: it does not seem to represent "prior ignorance" in many examples.

My recommendation: use the improper prior p(θ) ∝ 1 or the Jeffreys prior only for guidance, for checking results, or in cases where the operation is well understood. Use a mildly informative prior to avoid all of these problems.
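As a contrast with the previous check, here is a sketch of the invariance property (the simulated sample and the finite integration bounds are illustrative assumptions): the posterior probability of σ > 1 is the same whether we work with p_1(σ) ∝ 1/σ or with p_2(σ²) ∝ 1/σ².

```python
# Illustrative sketch: Jeffreys priors give the same answer in either
# parameterization.  The sample and the finite integration bounds (chosen wide
# enough to cover essentially all posterior mass) are illustrative assumptions.
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.2, size=20)     # y_t ~ N(0, sigma^2) with sigma = 1.2
T, S = len(y), np.sum(y**2)

# Posterior kernel in sigma:   (1/sigma) * sigma^(-T) * exp(-S / (2 sigma^2))
post_sigma = lambda s: s ** (-(T + 1)) * np.exp(-S / (2 * s**2))
# Posterior kernel in sigma^2: (1/v) * v^(-T/2) * exp(-S / (2 v))
post_var = lambda v: v ** (-(T / 2 + 1)) * np.exp(-S / (2 * v))

def tail_prob(kernel, cutoff, upper):
    num, _ = integrate.quad(kernel, cutoff, upper)
    den, _ = integrate.quad(kernel, 1e-6, upper)
    return num / den

print("P(sigma   > 1 | Y), prior 1/sigma:  ", tail_prob(post_sigma, 1.0, 20.0))
print("P(sigma^2 > 1 | Y), prior 1/sigma^2:", tail_prob(post_var, 1.0, 50.0))
```

The two printed probabilities agree, which is the sense in which the Jeffreys prior is invariant to reparameterization.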