I. Bayesian econometrics
A. Introduction
B. Bayesian inference in the univariate
regression model
C. Statistical decision theory
1. Example: portfolio allocation problem

a_j = quantity of asset j purchased, j = 1, ..., J
y = income
budget constraint: Σ_{j=1}^J a_j = y
r_j = gross rate of return on asset j
c = future consumption: c = Σ_{j=1}^J r_j a_j

max_{a_1,...,a_J} E U( Σ_{j=1}^J r_j a_j )   s.t. Σ_{j=1}^J a_j = y

r = (r_1, ..., r_J)′ ~ N(μ, Ω)
assume:
Ω is known
μ is unknown, must be estimated from data Y

Classical econometrician:
Step 1: Solve the optimization problem as if μ, Ω known with certainty
Step 2: Estimate μ from data Y
Step 3: Plug results from Step 2 into Step 1

Example:
U(c) = −exp(−γc)

Since a′r ~ N(a′μ, a′Ωa),
E U(c) = E[ −exp(−γ a′r) ] = −exp{ −γ a′μ + (γ²/2) a′Ωa }

so the problem becomes
max_a { a′μ − (γ/2) a′Ωa }   s.t. 1′a = y
with first-order conditions μ − γΩa = λ1, where λ is chosen so that 1′a = y.
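The moment-generating-function step used above (a′r is normal, so E[−exp(−γ a′r)] has a closed form) can be checked by simulation. This is a sketch with made-up numbers, not from the notes:

```python
import numpy as np

# Monte Carlo check: if r ~ N(mu, Omega), then
# E[-exp(-gamma a'r)] = -exp(-gamma a'mu + (gamma^2/2) a'Omega a).
rng = np.random.default_rng(0)
mu = np.array([1.05, 1.10])                      # made-up mean returns
Omega = np.array([[0.02, 0.01], [0.01, 0.09]])   # made-up covariance
a = np.array([0.4, 0.6])                         # a candidate allocation
gamma = 1.5

r = rng.multivariate_normal(mu, Omega, size=2_000_000)
mc = -np.exp(-gamma * r @ a).mean()              # simulated expected utility
exact = -np.exp(-gamma * a @ mu + 0.5 * gamma**2 * a @ Omega @ a)
print(mc, exact)   # the two agree to a few decimal places
```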
Bayesian econometrician:
solve the optimization problem under uncertainty

posterior: μ | Y ~ N(m*, M*)
r = μ + ε,   ε ~ N(0, Ω)
r | Y ~ N(m*, Ω + M*)

E[U(c) | Y] = E[ −exp(−γ a′r) | Y ]
            = −exp{ −γ a′m* + (γ²/2) a′(Ω + M*) a }
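A numerical sketch of the contrast between the classical plug-in steps and the Bayesian solution under the CARA setup above. The closed-form allocation solves the first-order conditions μ − γVa = λ1 with V = Ω (classical) or V = Ω + M* (Bayesian); the helper name and all numbers are made up:

```python
import numpy as np

def cara_allocation(m, V, gamma, y):
    """Optimal a for max a'm - (gamma/2) a'Va  s.t. 1'a = y.
    FOC: m - gamma*V*a = lam*1, with lam set by the budget constraint."""
    ones = np.ones(len(m))
    Vinv = np.linalg.inv(V)
    lam = (ones @ Vinv @ m - gamma * y) / (ones @ Vinv @ ones)
    return Vinv @ (m - lam * ones) / gamma

m_star = np.array([1.02, 1.08])          # posterior mean of mu (made up)
Omega  = np.array([[0.001, 0.0],         # known return covariance
                   [0.0,   0.04]])
M_star = np.array([[0.0,   0.0],         # posterior variance of mu:
                   [0.0,   0.02]])       # asset 2's mean is uncertain
gamma, y = 2.0, 1.0

a_classical = cara_allocation(m_star, Omega, gamma, y)           # plug-in
a_bayes     = cara_allocation(m_star, Omega + M_star, gamma, y)  # Bayesian
print(a_classical, a_bayes)
```

The Bayesian holds less of asset 2, whose mean return is uncertain: parameter uncertainty acts like extra return variance.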
uncertainty about μ influences the portfolio allocation decision (even if we have a diffuse prior so that m* = μ̂)

A Bayesian considers the statistical inference
problem to be: calculate the posterior
distribution
How this distribution is used to come up with a "parameter estimate" requires specifying a loss function.
2. General decision theory

θ = unknown true value
θ̂ = estimate
L(θ, θ̂) = loss function:
how much we are concerned if we announce an estimate of θ̂ but the truth is θ

θ̂ is the solution to
min_{θ̂} ∫ L(θ, θ̂) p(θ | Y) dθ

Scalar examples:
(1) quadratic loss
L(θ, θ̂) = (θ − θ̂)²

Claim: optimal θ̂ = E(θ | Y)

Proof:
E[ (θ − θ̂)² | Y ]
  = E[ {(θ − E(θ|Y)) + (E(θ|Y) − θ̂)}² | Y ]
  = E[ (θ − E(θ|Y))² | Y ] + 2 (E(θ|Y) − θ̂) E[ θ − E(θ|Y) | Y ] + (E(θ|Y) − θ̂)²
  = E[ (θ − E(θ|Y))² | Y ] + (E(θ|Y) − θ̂)²
minimized at θ̂ = E(θ|Y)

Conclusion: for quadratic loss, the optimal estimate is the posterior mean

(2) absolute loss
L(θ, θ̂) = |θ − θ̂|

Claim: optimal θ̂ = med(θ | Y), i.e. the value θ̂ for which
∫_{−∞}^{θ̂} p(θ | Y) dθ = 0.5

Proof:
E[ |θ − θ̂| | Y ] = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|Y) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|Y) dθ
differentiating with respect to θ̂ gives:
∫_{−∞}^{θ̂} p(θ|Y) dθ − ∫_{θ̂}^{∞} p(θ|Y) dθ
minimized when
∫_{−∞}^{θ̂} p(θ|Y) dθ = ∫_{θ̂}^{∞} p(θ|Y) dθ

Conclusion: for absolute loss, the optimal estimate is the posterior median

(3) point loss (discrete case)
θ ∈ {θ_1, ..., θ_J}
L(θ, θ̂) = 0 if θ̂ = θ
         = 1 if θ̂ ≠ θ

θ̂ = arg min_{θ̂} Σ_{j=1}^J L(θ_j, θ̂) P(θ = θ_j | Y)
   = the θ_j for which P(θ = θ_j | Y) is highest

Conclusion: for point loss, the optimal estimate is the posterior mode

Returning to the example from the first lecture:
y_t ~ i.i.d. N(μ, σ²)   (σ² known)
μ ~ N(m, τ²)   (prior)
μ | Y ~ N(m*, τ*²)   (posterior)

m* = [ (σ²/T) m + τ² ȳ ] / (τ² + σ²/T)
τ*² = (τ² σ²/T) / (τ² + σ²/T)

For any of these three loss functions (quadratic, absolute, point), the estimate would be m*.
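The three loss-function results can be checked numerically by discretizing the posterior and minimizing each expected loss over a grid. A sketch with made-up values of m* and τ* (the posterior here is symmetric, so mean, median, and mode coincide):

```python
import numpy as np

# Discretized N(m_star, tau_star^2) posterior (values are made up).
m_star, tau_star = 1.7, 0.4
grid = np.linspace(m_star - 5 * tau_star, m_star + 5 * tau_star, 2001)
post = np.exp(-0.5 * ((grid - m_star) / tau_star) ** 2)
post /= post.sum()                       # normalize to a probability vector

def bayes_estimate(loss):
    """Return the grid point minimizing posterior expected loss."""
    risk = [(loss(grid, t) * post).sum() for t in grid]
    return grid[np.argmin(risk)]

est_quad  = bayes_estimate(lambda th, t: (th - t) ** 2)   # posterior mean
est_abs   = bayes_estimate(lambda th, t: np.abs(th - t))  # posterior median
est_point = grid[np.argmax(post)]                         # posterior mode
print(est_quad, est_abs, est_point)   # all approximately m_star = 1.7
```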
diffuse prior (τ² → ∞): m* → ȳ
3. Bayesian statistics and admissibility

More generally, we can consider
some action a we plan to take, e.g.,
a = θ̂ means we announce that our estimate is θ̂
a = 0 if we accept H_0: θ = θ_0
a = 1 if we reject H_0: θ = θ_0
L(a, θ) = loss if we take the action a when the true value of the parameter turns out to be θ

Definition: an action a is said to be inadmissible if there is an alternative action b such that
L(b, θ) ≤ L(a, θ) for all θ, with strict inequality for some θ

The Bayes decision implied by
the probability distribution p(θ) is the action a for which
∫ L(a, θ) p(θ) dθ is minimized.

Under certain regularity conditions:
(1) If action a is the Bayes decision implied by some p(θ), then a is admissible
(2) If action a is admissible, then there exists a p(θ) for which a is the Bayes decision

Example: hypothesis testing
a = 1   (reject H_0: θ = θ_0)
a = 0   (accept H_0)

L(1, θ) = 0 if θ ≠ θ_0
        = 1 if θ = θ_0
L(0, θ) = 0 if θ = θ_0
        = c if θ ≠ θ_0

Bayes action: choose a = 1 if
E[ L(1, θ) | Y ] < E[ L(0, θ) | Y ]
P(θ = θ_0 | Y) < c [ 1 − P(θ = θ_0 | Y) ]
P(θ = θ_0 | Y) < c / (1 + c)
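A minimal check that, under the 0-1-c loss function above, minimizing posterior expected loss reproduces the threshold rule P(θ = θ_0 | Y) < c/(1 + c). The function name and numbers are made up for illustration:

```python
def bayes_action(p_null, c):
    """p_null = P(theta = theta0 | Y); c = loss from accepting a false H0."""
    exp_loss_reject = p_null            # L(1,theta) = 1 only when H0 is true
    exp_loss_accept = c * (1 - p_null)  # L(0,theta) = c only when H0 is false
    return 1 if exp_loss_reject < exp_loss_accept else 0

c = 3.0   # threshold c/(1+c) = 0.75
for p in [0.1, 0.5, 0.74, 0.76, 0.9]:
    assert bayes_action(p, c) == (1 if p < c / (1 + c) else 0)
```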
The hypothesis test
reject H_0 if T(Y) > t
is said to be inadmissible if there exists an alternative test
reject H_0 if S(Y) > s
such that:
(1) for θ = θ_0:
∫_{T(Y)>t} p(Y; θ_0) dY ≥ ∫_{S(Y)>s} p(Y; θ_0) dY
(2) for every θ ≠ θ_0:
∫_{T(Y)>t} p(Y; θ) dY ≤ ∫_{S(Y)>s} p(Y; θ) dY
(3) there is some θ for which the inequality in (1) or (2) is strict
D. Large sample results
Goal of this section:
A Bayesian is doing something with the
data. How would a classical
econometrician describe what that is?
1. Background: The Kullback-Leibler information inequality

Claim:
log x ≤ x − 1
with equality only if x = 1
[figure: graphs of log x and x − 1 for x between 0 and 5]

Implication:
E[log x] ≤ E[x] − 1
with equality only if x = 1 with probability 1

Application of the claim to the case
of discrete parameter space
and discrete random variables
θ ∈ {θ_1, ..., θ_J}
θ_s = true value
y_t ∈ {1, ..., I}

Define
x(y_t; θ_j) = P(Y = y_t; θ_j) / P(Y = y_t; θ_s)

This is a random variable (because y_t is random) that with probability P(Y = i; θ_s) takes on the value
P(Y = i; θ_j) / P(Y = i; θ_s)

E[ x(y_t; θ_j) ] = Σ_{i=1}^I [ P(Y = i; θ_j) / P(Y = i; θ_s) ] P(Y = i; θ_s)
                = Σ_{i=1}^I P(Y = i; θ_j) = 1

E[ log x(y_t; θ_j) ] = Σ_{i=1}^I log[ P(Y = i; θ_j) / P(Y = i; θ_s) ] P(Y = i; θ_s)
                     = E log[ p(y_t; θ_j) / p(y_t; θ_s) ]

The claim E[log x] ≤ E[x] − 1 implies for this case that
E log[ p(y_t; θ_j) / p(y_t; θ_s) ] ≤ 1 − 1 = 0
with equality only if
p(y_t; θ_j) = p(y_t; θ_s) for all y_t
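A quick numerical illustration of this inequality: for random pmfs p_s (true) and p_j (candidate) on I = 5 points, E[log x] computed under p_s never exceeds 0. The setup is my own made-up check:

```python
import numpy as np

rng = np.random.default_rng(0)
max_term = -np.inf
for _ in range(1000):
    p_s = rng.dirichlet(np.ones(5))   # true distribution (strictly positive)
    p_j = rng.dirichlet(np.ones(5))   # candidate distribution
    # E[log x] with x = p_j(y)/p_s(y); since E[x] = 1, this should be <= 0
    max_term = max(max_term, np.sum(p_s * np.log(p_j / p_s)))
print(max_term)   # never exceeds 0
```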
Kullback-Leibler information inequality:
E log[ p(y_t; θ) / p(y_t; θ_s) ] ≤ 0
with equality only if p(y_t; θ) = p(y_t; θ_s)
2. Implications of KL for Bayesian
posterior probabilities
will illustrate how data eventually overwhelm any prior

p(θ_j | Y) = p(Y | θ_j) p(θ_j) / Σ_{k=1}^J p(Y | θ_k) p(θ_k)

p(Y | θ_j) = Π_{t=1}^T p(y_t; θ_j) = exp{ Σ_{t=1}^T log p(y_t; θ_j) }

p(θ_j | Y) / p(θ_s | Y) = [ p(θ_j) / p(θ_s) ] exp{ Σ_{t=1}^T log[ p(y_t; θ_j) / p(y_t; θ_s) ] }

LLN:
T^{−1} Σ_{t=1}^T log[ p(y_t; θ_j) / p(y_t; θ_s) ]  →p  E log[ p(y_t; θ_j) / p(y_t; θ_s) ]
which is
< 0 if p(y_t; θ_j) ≠ p(y_t; θ_s)
= 0 if p(y_t; θ_j) = p(y_t; θ_s)

Hence for θ_j ≠ θ_s,
Σ_{t=1}^T log[ p(y_t; θ_j) / p(y_t; θ_s) ] → −∞
exp{ Σ_{t=1}^T log[ p(y_t; θ_j) / p(y_t; θ_s) ] } → 0
so
p(θ_j | Y) → 0 for θ_j ≠ θ_s,   p(θ_s | Y) → 1

conclusion: the Bayesian posterior distribution collapses to a spike at the truth for i.i.d. discrete data
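A simulation sketch of this collapse. The Bernoulli data and three-point parameter space are my own made-up illustration, not from the notes:

```python
import numpy as np

# i.i.d. Bernoulli data, discrete parameter space {0.3, 0.5, 0.7},
# true value 0.5, uniform prior.
rng = np.random.default_rng(1)
thetas = np.array([0.3, 0.5, 0.7])
true_theta = 0.5
prior = np.ones(3) / 3

y = rng.random(5000) < true_theta          # T = 5000 draws
log_lik = np.zeros(3)
for j, th in enumerate(thetas):
    log_lik[j] = np.sum(np.where(y, np.log(th), np.log(1 - th)))

log_post = np.log(prior) + log_lik         # log posterior up to a constant
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(post)   # the mass concentrates on theta = 0.5 as T grows
```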
3. Bayesian posterior distribution as
approximation to asymptotic distribution
of MLE

log p(Y; θ) = Σ_{t=1}^T log p(y_t; θ)

define
θ̂_T = arg max_θ log p(Y; θ)
∂ log p(Y; θ)/∂θ |_{θ = θ̂_T} = 0

Second-order Taylor expansion around θ̂_T:
log p(Y; θ) ≈ log p(Y; θ̂_T) + (1/2)(θ − θ̂_T)² [ ∂² log p(Y; θ)/∂θ² |_{θ̂_T} ]
            = log p(Y; θ̂_T) − (T/2) H_T (θ − θ̂_T)²
H_T = −T^{−1} Σ_{t=1}^T ∂² log p(y_t; θ)/∂θ² |_{θ̂_T}
H_T →p H = −E[ ∂² log p(y_t; θ_0)/∂θ² ]

Hence for large T,
p(Y; θ) ≈ k_T q_T(θ)
q_T(θ) ∝ exp{ −(T/2) H (θ − θ̂_T)² }
the kernel of an N(θ̂_T, (T H)^{−1}) density

[figure: posterior p and normal approximation q_T; blue: p; green: q_T for T = 10; red: q_T for T = 100]
[figure: posterior distributions; blue: T = 0; green: T = 10; red: T = 100]

Conclusions: the sequence of posterior distributions p(θ | Y_T) collapses to a spike at the truth:
p(θ | Y_T) → 1 at θ = θ_0
           → 0 at θ ≠ θ_0

Let θ_T be a sequence of random
variables with distribution p(θ | Y_T).
Then conditional on Y_T we have
√T (θ_T − θ̂_T) →L N(0, H^{−1})
where the distribution is across realizations of θ_T.

Contrast with the classical result:
√T (θ̂_T − θ_0) →L N(0, H^{−1})
where the distribution is across realizations of Y_T.

Implication: calculating the Bayesian posterior distribution is a way to find the asymptotic distribution of the MLE when regularity conditions hold.

Example:
y_t ~ i.i.d. N(μ, σ²)   (σ² known)
μ ~ N(m, τ²)   (prior)
μ | Y_T ~ N(m*_T, τ*²_T)   (posterior)
m*_T = [ (σ²/T) m + τ² ȳ_T ] / (τ² + σ²/T)
τ*²_T = (τ² σ²/T) / (τ² + σ²/T)

Conditional on Y_T, the variable μ_T ~ p(μ | Y_T) satisfies
(μ_T − m*_T) / τ*_T ~ N(0, 1)

As T → ∞:
T τ*²_T = τ² σ² / (τ² + σ²/T) → σ²
m*_T → ȳ_T
so
√T (μ_T − ȳ_T) / σ →L N(0, 1)

classical result:
√T (ȳ_T − μ) / σ →L N(0, 1)
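A sketch of the Bayesian side of this contrast (all numbers made up): fixing one realized sample Y_T and drawing μ_T from the posterior, √T(μ_T − ȳ_T)/σ is approximately standard normal for large T:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, sigma, T = 1.0, 2.0, 10_000       # true mean, known sigma, sample size
m, tau = 0.0, 5.0                      # prior: mu ~ N(m, tau^2)

y = rng.normal(mu0, sigma, T)          # one realized sample Y_T
ybar = y.mean()
m_star = ((sigma**2 / T) * m + tau**2 * ybar) / (tau**2 + sigma**2 / T)
tau_star2 = (tau**2 * sigma**2 / T) / (tau**2 + sigma**2 / T)

mu_draws = rng.normal(m_star, np.sqrt(tau_star2), 100_000)  # posterior draws
z = np.sqrt(T) * (mu_draws - ybar) / sigma
print(z.mean(), z.std())   # approximately 0 and 1
```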
E. Diffuse priors

Interpretations:
(1) Start with finite τ, calculate the posterior, and consider the limiting properties of the sequence of posteriors as τ → ∞.

(2) Start with
p(μ) = (2πτ²)^{−1/2} exp{ −(μ − m)² / (2τ²) }
and take the limit as τ → ∞:
p(μ) → 0, which is not a density

(3) Just use kernels?
p(Y | μ) ∝ exp{ −(μ − ȳ)² / (2σ²/T) }   (as a function of μ)
p(μ) ∝ 1?   (diffuse prior?)
implies
p(μ | Y) ∝ exp{ −(μ − ȳ)² / (2σ²/T) }
μ | Y ~ N(ȳ, σ²/T)
which gives the correct answer in this case.

But p(μ) = 1 is not a proper density for μ.
p(μ) = 1 is called an "improper" prior.
In this case, it gave us the correct answer.
In other cases it can fail (with either analytical or numerical methods).

Another problem with the improper prior
p(θ) = 1 is that it is not invariant with respect to reparameterization.

Example: T = 1,
y_1 ~ N(0, σ²):
p(y_1; σ) = (2π)^{−1/2} σ^{−1} exp{ −y_1² / (2σ²) }

If the parameter of interest is σ and the prior is p_1(σ) = 1, then
p_1(σ | y_1) ∝ σ^{−1} exp{ −y_1² / (2σ²) }

Suppose instead the parameter of interest is taken to be σ² and the prior is p_2(σ²) = 1. Then
p_2(σ² | y_1) ∝ (σ²)^{−1/2} exp{ −y_1² / (2σ²) }
whereas the density for σ² implied by p_1(σ | y_1) is
p_1( (σ²)^{1/2} | y_1 ) · |dσ/dσ²| ∝ (σ²)^{−1} exp{ −y_1² / (2σ²) }

Problem:
P(σ ≤ 1 | y_1; p_1) ≠ P(σ² ≤ 1 | y_1; p_2)

Issue: if ψ = w(σ), then
∫ g(σ) dσ = ∫ g( w^{−1}(ψ) ) |dw^{−1}(ψ)/dψ| dψ
so a prior that is flat in σ is not flat in ψ.

Conclusion: the "improper priors" p_1(σ) = 1 and p_2(σ²) = 1
represent different prior beliefs.

Question: which (if either) should be called a "diffuse prior" corresponding to complete uncertainty?

Jeffreys prior:
p(θ) ∝ [ h(θ) ]^{1/2}
h(θ) = −E[ ∂² log p(y; θ) / ∂θ² ]
     = −∫ [ ∂² log p(y; θ) / ∂θ² ] p(y; θ) dy
t1 log p y / p 2 T Example: if
log p y E log p y T for y 2 1/2 h 2 log p y /
h 1/2 2 yt T/ T log
12
T
t1 1 T/
2 2 T
p 1 2 2 yt
T
t1
2 T 1
2 yt
2T 2 1 26 2 If we instead take
log p(y; σ²) = −(T/2) log(2π) − (T/2) log σ² − (1/(2σ²)) Σ_{t=1}^T y_t²
∂ log p(y; σ²)/∂σ² = −T/(2σ²) + (1/(2σ⁴)) Σ_{t=1}^T y_t²
∂² log p(y; σ²)/∂(σ²)² = T/(2σ⁴) − σ^{−6} Σ_{t=1}^T y_t²
h(σ²) = −E[ ∂² log p(y; σ²)/∂(σ²)² ] = −T/(2σ⁴) + σ^{−6} (T σ²) = T/(2σ⁴)
p_2(σ²) ∝ [ T/(2σ⁴) ]^{1/2} ∝ 1/σ²

Advantage of Jeffreys prior:
probabilities implied by p(σ | Y) are identical to those implied by p(σ² | Y), when each posterior is derived from the corresponding Jeffreys prior p_1(σ) or p_2(σ²)
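A numerical check of this invariance claim. The data values are made up and the grid-integration helpers are my own; the point is that the posterior probability of σ ≤ a under the prior 1/σ matches the posterior probability of σ² ≤ a² under the prior 1/σ²:

```python
import numpy as np

y = np.array([0.8, -1.3, 2.1, 0.4, -0.6])   # T = 5 made-up draws, mean 0
ssq = np.sum(y**2)

def post_prob_sigma(a):
    # posterior kernel for sigma: sigma^{-T} exp(-ssq/(2 sigma^2)) * (1/sigma)
    grid = np.linspace(1e-3, 20, 400_000)
    k = grid**(-len(y) - 1) * np.exp(-ssq / (2 * grid**2))
    return k[grid <= a].sum() / k.sum()      # Riemann-sum probability

def post_prob_sigma2(a2):
    # posterior kernel for sigma2: sigma2^{-T/2} exp(-ssq/(2 sigma2)) * (1/sigma2)
    grid = np.linspace(1e-6, 400, 400_000)
    k = grid**(-len(y) / 2 - 1) * np.exp(-ssq / (2 * grid))
    return k[grid <= a2].sum() / k.sum()

a = 1.5
p1 = post_prob_sigma(a)
p2 = post_prob_sigma2(a**2)
print(p1, p2)   # the two probabilities agree
```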
Note: for the Normal-gamma prior
p(σ^{−2}) = [ (λ/2)^{N/2} / Γ(N/2) ] (σ^{−2})^{N/2 − 1} exp{ −λ σ^{−2} / 2 }
we characterized the diffuse prior as N → 0, λ → 0, i.e.
p(σ^{−2}) ∝ (σ^{−2})^{−1}

Concerns about Jeffreys prior:
it does not seem to represent "prior ignorance" in many examples

My recommendation:
Use the improper prior p(θ) ∝ 1 or the Jeffreys prior only for guidance, for checking results, or in cases where the operation is well understood.
Use a mildly informative prior to avoid all problems.
Winter '09, James Hamilton, Econometrics