Dynamic Programming
Peter Ireland∗
EC720.01  Math for Economists
Boston College, Department of Economics
Fall 2010
We have now studied two ways of solving dynamic optimization problems, one based
on the Kuhn-Tucker theorem and the other based on the maximum principle. These two
methods both lead us to the same set of optimality conditions; they differ only in terms of
how those optimality conditions are derived.
Here, we will consider a third way of solving dynamic optimization problems: the method
of dynamic programming. We will see, once again, that dynamic programming leads us to the
same set of optimality conditions that the Kuhn-Tucker theorem does; once again, this new
method differs from the others only in terms of how the optimality conditions are derived.
While the maximum principle lends itself equally well to dynamic optimization problems
set in both discrete time and continuous time, dynamic programming is easiest to apply in
discrete-time settings. On the other hand, dynamic programming, unlike the Kuhn-Tucker
theorem and the maximum principle, can be used quite easily to solve problems in which
optimal decisions must be made under conditions of uncertainty.
Thus, in our discussion of dynamic programming, we will begin by considering dynamic
programming under certainty; later, we will move on to consider stochastic dynamic programming.
References:
Dixit, Chapter 11.
Acemoglu, Chapters 6 and 16.

1 Dynamic Programming Under Certainty

1.1 A Perfect Foresight Dynamic Optimization Problem in Discrete Time

No uncertainty
∗ Copyright © 2010 by Peter Ireland. Redistribution is permitted for educational and research purposes, so long as no changes are made. All copies must be provided free of charge and must include this copyright notice.

Discrete time, infinite horizon: t = 0, 1, 2, ...
yt = stock, or state, variable
zt = flow, or control, variable

Objective function:

Σ_{t=0}^∞ β^t F(yt, zt; t)

1 > β > 0 discount factor
Constraint describing the evolution of the state variable
Q(yt , zt ; t) ≥ yt+1 − yt
or
yt + Q(yt , zt ; t) ≥ yt+1
for all t = 0, 1, 2, ...
Constraint applying to variables within each period:
c ≥ G(yt, zt; t)
for all t = 0, 1, 2, ...
Constraint on initial value of the state variable:
y0 given
The problem: choose sequences {zt}_{t=0}^∞ and {yt}_{t=1}^∞ to maximize the objective function subject to all of the constraints.
Notes:
a) It is important for the application of dynamic programming that the problem is
additively time separable: that is, the values of F , Q, and G at time t must
depend only on the values of yt and zt at time t.
b) Once again, it must be emphasized that although the constraints describing the
evolution of the state variable and that apply to the variables within each period
can each be written in the form of a single equation, these constraints must hold
for all t = 0, 1, 2, .... Thus, each equation actually represents an infinite number of constraints.

1.2 The Kuhn-Tucker Formulation

Let's begin our analysis of this problem by applying the Kuhn-Tucker theorem. That is, let's begin by setting up the Lagrangian and taking first-order conditions.
Set up the Lagrangian, recognizing that the constraints must hold for all t = 0, 1, 2, ...:

L = Σ_{t=0}^∞ β^t F(yt, zt; t) + Σ_{t=0}^∞ μ̃t+1 [yt + Q(yt, zt; t) − yt+1] + Σ_{t=0}^∞ λ̃t [c − G(yt, zt; t)]

It will be convenient to define

μt+1 = β^−(t+1) μ̃t+1  ⇒  μ̃t+1 = β^(t+1) μt+1

λt = β^−t λ̃t  ⇒  λ̃t = β^t λt

and to rewrite the Lagrangian in terms of μt+1 and λt instead of μ̃t+1 and λ̃t:

L = Σ_{t=0}^∞ β^t F(yt, zt; t) + Σ_{t=0}^∞ β^(t+1) μt+1 [yt + Q(yt, zt; t) − yt+1] + Σ_{t=0}^∞ β^t λt [c − G(yt, zt; t)]

FOC for zt, t = 0, 1, 2, ...:

β^t F2(yt, zt; t) + β^(t+1) μt+1 Q2(yt, zt; t) − β^t λt G2(yt, zt; t) = 0

FOC for yt, t = 1, 2, 3, ...:

β^t F1(yt, zt; t) + β^(t+1) μt+1 [1 + Q1(yt, zt; t)] − β^t λt G1(yt, zt; t) − β^t μt = 0
Now, let's suppose that somehow we could solve for μt as a function of the state variable yt:

μt = W(yt; t)

so that

μt+1 = W(yt+1; t + 1) = W[yt + Q(yt, zt; t); t + 1]

Then we could rewrite the FOC as:

F2(yt, zt; t) + β W[yt + Q(yt, zt; t); t + 1] Q2(yt, zt; t) − λt G2(yt, zt; t) = 0   (1)

W(yt; t) = F1(yt, zt; t) + β W[yt + Q(yt, zt; t); t + 1][1 + Q1(yt, zt; t)] − λt G1(yt, zt; t)   (2)

And together with the binding constraint

yt+1 = yt + Q(yt, zt; t)   (3)

and the complementary slackness condition

λt [c − G(yt, zt; t)] = 0,   (4)

we can think of (1) and (2) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function W(·; t). This system of equations determines the problem's solution.

Note that since (3) is in the form of a difference equation, finding the problem's solution involves solving a difference equation.
1.3 An Alternative Formulation

Now let's consider the same problem in a slightly different way.

For any given value of the initial state variable y0, define the value function

v(y0; 0) = max_{ {zt}_{t=0}^∞, {yt}_{t=1}^∞ } Σ_{t=0}^∞ β^t F(yt, zt; t)

subject to

y0 given
yt + Q(yt, zt; t) ≥ yt+1 for all t = 0, 1, 2, ...
c ≥ G(yt, zt; t) for all t = 0, 1, 2, ...

More generally, for any period t and any value of yt, define

v(yt; t) = max_{ {zt+j}_{j=0}^∞, {yt+j}_{j=1}^∞ } Σ_{j=0}^∞ β^j F(yt+j, zt+j; t + j)

subject to

yt given
yt+j + Q(yt+j, zt+j; t + j) ≥ yt+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt+j, zt+j; t + j) for all j = 0, 1, 2, ...
Note that the value function is a maximum value function.
Now consider expanding the deﬁnition of the value function by separating out the time t
components:
v(yt; t) = max_{zt, yt+1} [ F(yt, zt; t) + max_{ {zt+j}_{j=1}^∞, {yt+j}_{j=2}^∞ } Σ_{j=1}^∞ β^j F(yt+j, zt+j; t + j) ]

subject to

yt given
yt + Q(yt, zt; t) ≥ yt+1
yt+j + Q(yt+j, zt+j; t + j) ≥ yt+j+1 for all j = 1, 2, 3, ...
c ≥ G(yt, zt; t)
c ≥ G(yt+j, zt+j; t + j) for all j = 1, 2, 3, ...

Next, relabel the time indices:

v(yt; t) = max_{zt, yt+1} [ F(yt, zt; t) + β max_{ {zt+1+j}_{j=0}^∞, {yt+1+j}_{j=1}^∞ } Σ_{j=0}^∞ β^j F(yt+1+j, zt+1+j; t + 1 + j) ]

subject to

yt given
yt + Q(yt, zt; t) ≥ yt+1
yt+1+j + Q(yt+1+j, zt+1+j; t + 1 + j) ≥ yt+1+j+1 for all j = 0, 1, 2, ...
c ≥ G(yt, zt; t)
c ≥ G(yt+1+j, zt+1+j; t + 1 + j) for all j = 0, 1, 2, ...
Now notice that together, the components for t + 1 + j , j = 0, 1, 2, ... deﬁne v (yt+1 ; t + 1),
enabling us to simplify the statement considerably:
v(yt; t) = max_{zt, yt+1} F(yt, zt; t) + β v(yt+1; t + 1)

subject to

yt given
yt + Q(yt, zt; t) ≥ yt+1
c ≥ G(yt, zt; t)

Or, even more simply:

v(yt; t) = max_{zt} F(yt, zt; t) + β v[yt + Q(yt, zt; t); t + 1]   (5)

subject to

yt given
c ≥ G(yt, zt; t)
Equation (5) is called the Bellman equation for this problem, and lies at the heart of the
dynamic programming approach.
Note that the maximization on the right-hand side of (5) is a static optimization problem, involving no dynamic elements.

By the Kuhn-Tucker theorem:

v(yt; t) = max_{zt} F(yt, zt; t) + β v[yt + Q(yt, zt; t); t + 1] + λt [c − G(yt, zt; t)]

The FOC for zt is

F2(yt, zt; t) + β v′[yt + Q(yt, zt; t); t + 1] Q2(yt, zt; t) − λt G2(yt, zt; t) = 0   (6)

And by the envelope theorem:

v′(yt; t) = F1(yt, zt; t) + β v′[yt + Q(yt, zt; t); t + 1][1 + Q1(yt, zt; t)] − λt G1(yt, zt; t)   (7)
Together with the binding constraint

yt+1 = yt + Q(yt, zt; t)   (3)

and complementary slackness condition

λt [c − G(yt, zt; t)] = 0,   (4)

we can think of (6) and (7) as forming a system of four equations in three unknown variables yt, zt, and λt and one unknown function v(·; t). This system of equations determines the problem's solution.

Note once again that since (3) is in the form of a difference equation, finding the problem's solution involves solving a difference equation.

But more important, notice that (6) and (7) are equivalent to (1) and (2) with

v′(yt; t) = W(yt; t).
Thus, we have two ways of solving this discrete time dynamic optimization problem, both
of which lead us to the same set of optimality conditions:
a) Set up the Lagrangian for the dynamic optimization problem and take ﬁrst order
conditions for zt , t = 0, 1, 2, ... and yt , t = 1, 2, 3, ....
b) Set up the Bellman equation and take the ﬁrst order condition for zt and then
derive the envelope condition for yt .
One question remains: How, in practice, can we solve for the unknown value functions v(·; t)?
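One answer, at least when the problem is stationary, is numerical: treat the Bellman equation (5) as a fixed-point condition and iterate on it until the value function stops changing. The sketch below is only illustrative; it assumes a special case with log utility and transition y′ = y^α − z (the growth specification taken up in the next section), and the parameter values, grid, and tolerance are all arbitrary assumption choices.

```python
import math

# Illustrative value function iteration for a stationary special case of (5):
# v(k) = max_{k'} { ln(k**alpha - k') + beta * v(k') }, with next period's
# capital restricted to a finite grid. All numbers here are assumption choices.
alpha, beta = 0.33, 0.95
grid = [0.05 + 0.01 * i for i in range(60)]      # grid of possible capital stocks

v = [0.0] * len(grid)                            # initial guess for the value function
for _ in range(600):                             # repeatedly apply the Bellman operator
    v_new = []
    for k in grid:
        y = k ** alpha                           # resources available this period
        v_new.append(max(math.log(y - kp) + beta * vp
                         for kp, vp in zip(grid, v) if kp < y))
    if max(abs(a - b) for a, b in zip(v_new, v)) < 1e-8:
        v = v_new
        break                                    # successive iterates agree: converged
    v = v_new

def policy(k):
    """Next period's capital chosen under the converged value function."""
    y = k ** alpha
    return max(((kp, vp) for kp, vp in zip(grid, v) if kp < y),
               key=lambda p: math.log(y - p[0]) + beta * p[1])[0]
```

For this special case an exact solution happens to be known (it is derived in the next section): the optimal choice of next period's capital is αβ k^α, and the iteration above reproduces it up to the grid's resolution.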
To see how to answer this question, consider two examples:
Example 1: Optimal Growth - Here, it will be possible to solve for v explicitly.

Example 2: Saving Under Certainty - Here, it will not be possible to solve for v explicitly, yet we can learn enough about the properties of v to obtain some useful economic insights.

2 Example 1: Optimal Growth

Here, we will modify the optimal growth example that we solved earlier using the maximum principle in two ways:

a) We will switch to discrete time in order to facilitate the use of dynamic programming.

b) We will set the depreciation rate for capital equal to δ = 1 in order to obtain a very special case in which an explicit solution for the value function can be found.
Production function:

F(kt) = kt^α, where 0 < α < 1

kt = capital (state variable)
ct = consumption (control variable)

Evolution of the capital stock:

kt+1 = kt^α − ct for all t = 0, 1, 2, ...

Initial condition:

k0 given

Utility or social welfare:

Σ_{t=0}^∞ β^t ln(ct)

The social planner's problem: choose sequences {ct}_{t=0}^∞ and {kt}_{t=1}^∞ to maximize the utility function subject to all of the constraints.
To solve this problem via dynamic programming, use
kt = state variable
ct = control variable
Set up the Bellman equation:

v(kt; t) = max_{ct} ln(ct) + β v(kt^α − ct; t + 1)

Now guess that the value function takes the time-invariant form

v(kt; t) = v(kt) = E + F ln(kt),

where E and F are constants to be determined.

Using the guess for v, the Bellman equation becomes

E + F ln(kt) = max_{ct} ln(ct) + β E + β F ln(kt^α − ct)   (8)

FOC for ct:

1/ct − β F/(kt^α − ct) = 0   (9)

Envelope condition for kt:

F/kt = αβ F kt^(α−1)/(kt^α − ct)   (10)

Together with the binding constraint

kt+1 = kt^α − ct,

(8)-(10) form a system of four equations in four unknowns: ct, kt, E, and F.
Equation (9) implies

kt^α − ct = β F ct

or

ct = [1/(1 + β F)] kt^α   (11)

Substitute (11) into the envelope condition (10):

F/kt = αβ F kt^(α−1)/(kt^α − ct)

F kt^α − [F/(1 + β F)] kt^α = αβ F kt^α

1 − 1/(1 + β F) = αβ

Hence

1/(1 + β F) = 1 − αβ   (12)

Or, equivalently,

1 + β F = 1/(1 − αβ)

β F = 1/(1 − αβ) − 1 = αβ/(1 − αβ)

F = α/(1 − αβ)   (13)

Substitute (12) into (11) to obtain

ct = (1 − αβ) kt^α   (14)

which shows that it is optimal to consume the fixed fraction 1 − αβ of output.

Evolution of capital:

kt+1 = kt^α − ct = kt^α − (1 − αβ) kt^α = αβ kt^α   (15)

which is in the form of a difference equation for kt.

Equations (14) and (15) show how the optimal values of ct and kt+1 depend on the state variable kt and the parameters α and β. Given a value for k0, these two equations can be used to construct the optimal sequences {ct}_{t=0}^∞ and {kt}_{t=1}^∞.
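As a check on this construction, equations (14) and (15) are easy to iterate forward numerically. A short sketch, using an illustrative parameterization (α = 0.33, β = 0.99, and an arbitrary horizon):

```python
# Iterating the optimal policies (14) and (15) forward from a given k0.
# The parameter values and horizon are illustrative assumptions.
alpha, beta = 0.33, 0.99

def optimal_path(k0, T):
    """Return the sequences {k_t} (length T+1) and {c_t} (length T) from (14)-(15)."""
    k, c = [k0], []
    for t in range(T):
        c.append((1 - alpha * beta) * k[t] ** alpha)   # (14): consume 1 - alpha*beta of output
        k.append(alpha * beta * k[t] ** alpha)         # (15): the rest becomes capital
    return k, c

# The difference equation (15) has the steady state k* = (alpha*beta)**(1/(1-alpha)),
# and (14) then gives c* = (1 - alpha*beta) * k***alpha.
k_star = (alpha * beta) ** (1 / (1 - alpha))
c_star = (1 - alpha * beta) * k_star ** alpha

k, c = optimal_path(k0=0.01, T=60)
```

With these parameter values, k* ≈ 0.188 and c* ≈ 0.388, and the simulated path converges to those steady-state values.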
For the sake of completeness, substitute (14) and (15) back into (8) to solve for E:

E + F ln(kt) = ln[(1 − αβ) kt^α] + β E + β F ln(αβ kt^α)

E + F ln(kt) = ln(1 − αβ) + α ln(kt) + β E + β F ln(αβ) + αβ F ln(kt)

Since (13) implies that

F = α + αβ F,

this last equality reduces to

E = ln(1 − αβ) + β E + β F ln(αβ)

which leads directly to the solution

E = [ln(1 − αβ) + (αβ/(1 − αβ)) ln(αβ)] / (1 − β)

3 Example 2: Saving Under Certainty

Here, a consumer maximizes utility over an infinite horizon, t = 0, 1, 2, ..., earning income from labor and from investments.
At = beginning-of-period assets
At can be negative, that is, the consumer is allowed to borrow
yt = labor income (exogenous)
ct = consumption
saving = st = At + yt − ct
r = constant interest rate

Evolution of assets:

At+1 = (1 + r) st = (1 + r)(At + yt − ct)

Note:

At + yt − ct = [1/(1 + r)] At+1

At = [1/(1 + r)] At+1 + ct − yt
Generated using equations (14) and (15). Each example sets = 0.33 and = 0.99. Example 1: k(0) = 0.01
capital stock consumption
0.5 0.2 0.4 0.15 0.3 0.1 0.2 0.05 0.1 0 0
0 2 4 6 8 10 12 14 16 18 0 20 2 4 6 8 10 12 14 16 18 20 14 16 18 20 Example 2: k(0) = 1
consumption capital stock 0.8 1.2
1 0.6 0.8 0.4 0.6
0.4 0.2 0.2 0 0
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 In both examples, c(t) converges to its steady state value of 0.388 and k(t) converges to its steadystate value of 0.188. Similarly,
At+1 = [1/(1 + r)] At+2 + ct+1 − yt+1

Combining these last two equalities yields

At = [1/(1 + r)]^2 At+2 + [1/(1 + r)](ct+1 − yt+1) + (ct − yt)

Continuing in this manner yields

At = [1/(1 + r)]^T At+T + Σ_{j=0}^{T−1} [1/(1 + r)]^j (ct+j − yt+j).

Now assume that the sequence {At}_{t=0}^∞ must remain bounded (while borrowing is allowed, unlimited borrowing is ruled out), and take the limit as T → ∞ to obtain

At = Σ_{j=0}^∞ [1/(1 + r)]^j (ct+j − yt+j)

or

At + Σ_{j=0}^∞ [1/(1 + r)]^j yt+j = Σ_{j=0}^∞ [1/(1 + r)]^j ct+j.   (16)

Equation (16) takes the form of an infinite-horizon budget constraint, indicating that over
the inﬁnite horizon beginning at any period t, the consumer’s sources of funds include
assets At and the present value of current and future labor income, while the consumer’s
use of funds is summarized by the present value of current and future consumption.
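The "continuing in this manner" step behind (16) can be checked mechanically: for any sequences of consumption and income whatsoever, iterating the evolution of assets forward and discounting reproduces the initial asset position exactly. A small sketch with arbitrary, made-up sequences:

```python
import random

# Mechanical check of the present-value recursion behind (16): if assets evolve
# according to A_{t+1} = (1 + r)(A_t + y_t - c_t), then for ANY sequences,
#   A_0 = (1/(1+r))**T * A_T + sum_{j=0}^{T-1} (1/(1+r))**j * (c_j - y_j).
# The sequences below are arbitrary assumptions chosen only for illustration.
random.seed(0)
r, T = 0.05, 40
y = [random.uniform(0.5, 1.5) for _ in range(T)]   # labor income
c = [random.uniform(0.5, 1.5) for _ in range(T)]   # consumption

A = [1.0]                                          # arbitrary initial assets A_0
for t in range(T):
    A.append((1 + r) * (A[t] + y[t] - c[t]))       # evolution of assets

discounted = (1 / (1 + r)) ** T * A[T] + sum(
    (1 / (1 + r)) ** j * (c[j] - y[j]) for j in range(T))
# discounted equals A[0] up to floating-point rounding
```

Taking T → ∞ with {At} bounded makes the first term vanish, which is exactly how (16) is obtained.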
The consumer's problem: choose the sequences {st}_{t=0}^∞ and {At}_{t=1}^∞ to maximize the utility function

Σ_{t=0}^∞ β^t u(ct) = Σ_{t=0}^∞ β^t u(At + yt − st)

subject to the constraints

A0 given

and

(1 + r) st ≥ At+1

for all t = 0, 1, 2, ...

To solve the problem via dynamic programming, note first that

At = state variable
st = control variable

Set up the Bellman equation

v(At; t) = max_{st} u(At + yt − st) + β v(At+1; t + 1) subject to (1 + r) st ≥ At+1

or, with the constraint binding,

v(At; t) = max_{st} u(At + yt − st) + β v[(1 + r) st; t + 1]

FOC for st:

−u′(At + yt − st) + β (1 + r) v′[(1 + r) st; t + 1] = 0

Envelope condition for At:

v′(At; t) = u′(At + yt − st)

Use the constraints to rewrite these optimality conditions as
u′(ct) = β (1 + r) v′(At+1; t + 1)   (17)

and

v′(At; t) = u′(ct)   (18)

Since (18) must hold for all t = 0, 1, 2, ..., it implies

v′(At+1; t + 1) = u′(ct+1)

Substitute this result into (17) to obtain:

u′(ct) = β (1 + r) u′(ct+1)   (19)

Now make two extra assumptions:

a) β(1 + r) = 1, or 1 + r = 1/β: the interest rate equals the discount rate

b) u is strictly concave

Under these two additional assumptions, (19) implies

u′(ct) = u′(ct+1)

or

ct = ct+1

And since this last equation must hold for all t = 0, 1, 2, ..., it implies

ct = ct+j for all j = 0, 1, 2, ...
Now, return to (16):

At + Σ_{j=0}^∞ [1/(1 + r)]^j yt+j = Σ_{j=0}^∞ [1/(1 + r)]^j ct+j.   (16)

Since consumption is constant (ct+j = ct) and 1/(1 + r) = β under assumption (a), this becomes

At + Σ_{j=0}^∞ [1/(1 + r)]^j yt+j = ct Σ_{j=0}^∞ β^j   (20)

FACT: Since |β| < 1,

Σ_{j=0}^∞ β^j = 1/(1 − β)

To see why this is true, multiply both sides by 1 − β:

(1 − β) Σ_{j=0}^∞ β^j = (1 + β + β^2 + ...) − β(1 + β + β^2 + ...)
= (1 + β + β^2 + ...) − (β + β^2 + β^3 + ...)
= 1
Use this fact to rewrite (20):

At + Σ_{j=0}^∞ [1/(1 + r)]^j yt+j = [1/(1 − β)] ct

or

ct = (1 − β) [ At + Σ_{j=0}^∞ [1/(1 + r)]^j yt+j ]   (21)

Equation (21) indicates that it is optimal to consume a fixed fraction 1 − β of wealth at each date t, where wealth consists of the value of current asset holdings and the present discounted value of future labor income. Thus, (21) describes a version of the permanent income hypothesis.
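The rule (21) can also be verified by simulation: under assumption (a) that β(1 + r) = 1, a consumer who each period consumes the fraction 1 − β of wealth ends up with a constant consumption path, exactly as (19) requires. The income path below is an arbitrary assumption (irregular for a few periods, then flat, so its present value is easy to compute):

```python
# Simulation check of the permanent income rule (21): under assumption (a),
# beta*(1 + r) = 1, consuming the fraction (1 - beta) of wealth each period
# produces a constant consumption path, as (19) requires. The income path is
# an arbitrary assumption: irregular for five periods, then flat at y_bar.
beta = 0.95
r = 1 / beta - 1                        # interest rate equals the discount rate
y = [1.3, 0.4, 2.0, 0.8, 1.1]           # labor income in periods 0..4
y_bar = 1.0                             # constant labor income from period 5 on

def income(t):
    return y[t] if t < len(y) else y_bar

def pv_income(t):
    """Present value at date t of labor income from t on, discounted by 1/(1+r) = beta."""
    head = sum(beta ** (s - t) * y[s] for s in range(t, len(y)))
    tail_start = max(t, len(y))
    return head + beta ** (tail_start - t) * y_bar / (1 - beta)

A = 2.0                                 # arbitrary initial assets A_0
consumption = []
for t in range(30):
    c = (1 - beta) * (A + pv_income(t))     # the rule in equation (21)
    consumption.append(c)
    A = (1 + r) * (A + income(t) - c)       # evolution of assets
# every entry of `consumption` is the same, as (19) and (21) predict
```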
4 Stochastic Dynamic Programming

4.1 A Dynamic Stochastic Optimization Problem

Discrete time, infinite horizon: t = 0, 1, 2, ...
yt = state variable
zt = control variable
εt+1 = random shock, which is observed at the beginning of t + 1
Thus, when zt is chosen:
εt is known ...
... but εt+1 is still viewed as random.
The shock εt+1 may be serially correlated, but will be assumed to have the Markov property
(i.e., to be generated by a Markov process): the distribution of εt+1 depends on εt , but
not on εt−1 , εt−2 , εt−3 , ....
For example, εt+1 may follow a ﬁrstorder autoregressive process:
εt+1 = ρεt + ηt+1 .
Now, the full state of the economy at the beginning of each period is described jointly by
the pair of values for yt and εt , since the value for εt is relevant for forecasting, that is,
forming expectations of, future values of εt+j , j = 1, 2, 3, ....
Objective function:

E0 Σ_{t=0}^∞ β^t F(yt, zt, εt)

1 > β > 0 discount factor
E0 = expected value as of t = 0
Constraint describing the evolution of the state variable
yt + Q(yt , zt , εt+1 ) ≥ yt+1
for all t = 0, 1, 2, ... and for all possible realizations of εt+1
Note that the constraint implies that the randomness in εt+1 induces randomness into yt+1
as well: in particular, the value of yt+1 does not become known until εt+1 is observed
at the beginning of t + 1 for all t = 0, 1, 2, ...
Note, too, that the sequential, periodbyperiod, revelation of values for εt , t = 0, 1, 2, ...,
also generates sequential growth in the information set available to the agent solving
the problem:
At the beginning of period t = 0, the agent knows
I0 = {y0 , ε0 }
At the beginning of period t = 1, the agent’s information set expands to
I1 = {y1 , ε1 , y0 , ε0 }
And, more generally, at the beginning of period t = 0, 1, 2, ..., the agent’s information
set is given by
It = {yt , εt , yt−1 , εt−1 , ..., y0 , ε0 } so that conditional expectations of future variables are deﬁned implicitly with
respect to this growing information set: for any variable Xt+j whose value becomes
known at time t + j , j = 0, 1, 2, ...:
Et Xt+j = E(Xt+j | It) = E(Xt+j | yt, εt, yt−1, εt−1, ..., y0, ε0)

The role of the additive time separability of the objective function, the similar "additive time separability" that is built into the constraints, and the Markov property of the shocks is to make the most recent values of yt and εt sufficient statistics for It, so that within the confines of this problem,

Et(Xt+j | yt, εt, yt−1, εt−1, ..., y0, ε0) = Et(Xt+j | yt, εt).
Note, ﬁnally, that the randomness in yt+1 induced by the randomness in εt+1 also introduces
randomness into the choice of zt+1 from the perspective of time t:
Given (yt , εt ), choose zt
⇒ Given (yt , zt ) the realization of εt+1 determines (yt+1 , εt+1 )
⇒ Given (yt+1 , εt+1 ), choose zt+1
This makes the number of choice variables, as well as the number of constraints, quite
large.
The problem: choose contingency plans for zt , t = 0, 1, 2, ..., and yt , t = 1, 2, 3, ..., to
maximize the objective function subject to all of the constraints.
Notes:
a) In order to incorporate uncertainty, we have really only made two adjustments to
the problem:
First, we have added the shock εt to the objective function for period t and the
shock εt+1 to the constraint linking periods t and t + 1.
And second, we have assumed that the planner cares about the expected value
of the objective function.
b) For simplicity, the functions F and Q are now assumed to be timeinvariant,
although now they depend on the shock as well as on the state and control variable.
c) For simplicity, we have also dropped the second set of constraints, c ≥ G(yt , zt ).
Adding them back is straightforward, but complicates the algebra.
d) In the presence of uncertainty, the constraint
yt + Q(yt , zt , εt+1 ) ≥ yt+1
must hold, not only for all t = 0, 1, 2, ..., but for all possible realizations of εt+1
as well. Thus, this single equation can actually represent a very large number of
constraints.
e) The Kuhn-Tucker theorem can still be used to solve problems that feature uncertainty. But because problems with uncertainty can have a very large number of choice variables and constraints, the Kuhn-Tucker theorem can become very difficult to apply in practice, since one may have to introduce a very large number of Lagrange multipliers. Dynamic programming, therefore, can be an easier and more convenient way to solve dynamic stochastic optimization problems.
4.2 The Dynamic Programming Formulation

Once again, for any values of y0 and ε0, define

v(y0, ε0) = max_{ {zt}_{t=0}^∞, {yt}_{t=1}^∞ } E0 Σ_{t=0}^∞ β^t F(yt, zt, εt)

subject to

y0 and ε0 given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all t = 0, 1, 2, ... and all εt+1

More generally, for any period t and any values of yt and εt, define

v(yt, εt) = max_{ {zt+j}_{j=0}^∞, {yt+j}_{j=1}^∞ } Et Σ_{j=0}^∞ β^j F(yt+j, zt+j, εt+j)

subject to

yt and εt given

yt+j + Q(yt+j, zt+j, εt+j+1) ≥ yt+j+1 for all j = 0, 1, 2, ... and all εt+j+1

Note once again that the value function is a maximum value function.
Now separate out the time t components:

v(yt, εt) = max_{zt, yt+1} [ F(yt, zt, εt) + max_{ {zt+j}_{j=1}^∞, {yt+j}_{j=2}^∞ } Et Σ_{j=1}^∞ β^j F(yt+j, zt+j, εt+j) ]

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

yt+j + Q(yt+j, zt+j, εt+j+1) ≥ yt+j+1 for all j = 1, 2, 3, ... and all εt+j+1

Relabel the time indices:

v(yt, εt) = max_{zt, yt+1} [ F(yt, zt, εt) + β max_{ {zt+1+j}_{j=0}^∞, {yt+1+j}_{j=1}^∞ } Et Σ_{j=0}^∞ β^j F(yt+1+j, zt+1+j, εt+1+j) ]

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

yt+1+j + Q(yt+1+j, zt+1+j, εt+1+j+1) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1
FACT (Law of Iterated Expectations): For any random variable Xt+j , realized at time t + j ,
j = 0, 1, 2, ...:
Et Et+1 Xt+j = Et Xt+j .
To see why this fact holds true, consider the following example:
Suppose εt+1 follows the first-order autoregression:

εt+1 = ρ εt + ηt+1, with Et ηt+1 = 0

Hence

εt+2 = ρ εt+1 + ηt+2, with Et+1 ηt+2 = 0

or

εt+2 = ρ^2 εt + ρ ηt+1 + ηt+2.

It follows that

Et+1 εt+2 = Et+1(ρ^2 εt + ρ ηt+1 + ηt+2) = ρ^2 εt + ρ ηt+1

and therefore

Et Et+1 εt+2 = Et(ρ^2 εt + ρ ηt+1) = ρ^2 εt.

It also follows that

Et εt+2 = Et(ρ^2 εt + ρ ηt+1 + ηt+2) = ρ^2 εt.

So, in this case as in general,

Et Et+1 εt+2 = Et εt+2
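This algebra can also be checked by Monte Carlo: averaging the inner expectation Et+1 εt+2 = ρ^2 εt + ρ ηt+1 over many draws of ηt+1 recovers Et εt+2 = ρ^2 εt. A minimal sketch (ρ, εt, the shock distribution, and the sample size are all assumption choices):

```python
import random

# Monte Carlo check of the law of iterated expectations in the AR(1) example:
# E_t[E_{t+1} eps_{t+2}] should equal E_t[eps_{t+2}] = rho**2 * eps_t.
# rho, eps_t, the shock distribution, and the sample size are assumption choices.
random.seed(42)
rho, eps_t = 0.8, 1.5
N = 200_000

inner = []
for _ in range(N):
    eta_t1 = random.gauss(0.0, 1.0)            # eta_{t+1}, observed at t+1
    # Conditional on eta_{t+1}: E_{t+1} eps_{t+2} = rho**2*eps_t + rho*eta_{t+1}
    inner.append(rho ** 2 * eps_t + rho * eta_t1)

lhs = sum(inner) / N                            # estimates E_t E_{t+1} eps_{t+2}
rhs = rho ** 2 * eps_t                          # E_t eps_{t+2}
# lhs approximates rhs, up to sampling error of order 1/sqrt(N)
```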
Using this fact:

v(yt, εt) = max_{zt, yt+1} [ F(yt, zt, εt) + β max_{ {zt+1+j}_{j=0}^∞, {yt+1+j}_{j=1}^∞ } Et Et+1 Σ_{j=0}^∞ β^j F(yt+1+j, zt+1+j, εt+1+j) ]

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

yt+1+j + Q(yt+1+j, zt+1+j, εt+1+j+1) ≥ yt+1+j+1 for all j = 0, 1, 2, ... and all εt+1+j+1
Now use the definition of v(yt+1, εt+1) to simplify:

v(yt, εt) = max_{zt, yt+1} F(yt, zt, εt) + β Et v(yt+1, εt+1)

subject to

yt and εt given

yt + Q(yt, zt, εt+1) ≥ yt+1 for all εt+1

Or, even more simply:

v(yt, εt) = max_{zt} F(yt, zt, εt) + β Et v[yt + Q(yt, zt, εt+1), εt+1]   (22)

Equation (22) is the Bellman equation for this stochastic problem.
Thus, in order to incorporate uncertainty into the dynamic programming framework, we
only need to make two modiﬁcations to the Bellman equation:
a) Include the shock εt as an additional argument of the value function.
b) Add the expectation term Et in front of the value function for t+1 on the righthand
side.
Note that the maximization on the righthand side of (22) is a static optimization problem,
involving no dynamic elements.
Note also that by substituting the constraints into the value function, we are left with an unconstrained problem. Unlike the Kuhn-Tucker approach, which requires many
constraints and many multipliers, dynamic programming in this case has no constraints
and no multipliers.
The FOC for zt is

F2(yt, zt, εt) + β Et{ v1[yt + Q(yt, zt, εt+1), εt+1] Q2(yt, zt, εt+1) } = 0   (23)

The envelope condition for yt is:

v1(yt, εt) = F1(yt, zt, εt) + β Et{ v1[yt + Q(yt, zt, εt+1), εt+1][1 + Q1(yt, zt, εt+1)] }   (24)
Equations (23)-(24) coincide exactly with the first-order conditions for zt and yt that we would have derived through a direct application of the Kuhn-Tucker theorem to the
original, dynamic stochastic optimization problem.
Together with the binding constraint

yt+1 = yt + Q(yt, zt, εt+1)   (25)

we can think of (23) and (24) as forming a system of three equations in two unknown
variables yt and zt and one unknown function v . This system of equations determines
the problem’s solution, given the behavior of the exogenous shocks εt .
Note that (25) is in the form of a difference equation; once again, solving a dynamic optimization problem involves solving a difference equation.

5 Example 3: Saving with Multiple Random Returns

This example extends example 2 by:

a) Introducing n ≥ 1 assets

b) Allowing returns on each asset to be random
As in example 2, we will not be able to solve explicitly for the value function, but we will
be able to learn enough about its properties to derive some useful economic results.
Since we are extending the example in two ways, assume for simplicity that the consumer
receives no labor income, and therefore must ﬁnance all of his or her consumption by
investing.
At = beginning-of-period financial wealth
ct = consumption
sit = savings allocated to asset i = 1, 2, ..., n

Hence,

At = ct + Σ_{i=1}^n sit

Rit+1 = random gross return on asset i, not known until t + 1

Hence, when sit is chosen:

Rit is known ...
... but Rit+1 is still viewed as random.

Hence

At+1 = Σ_{i=1}^n Rit+1 sit

does not become known until the beginning of t + 1, even though the sit must be chosen during t.

Utility:

E0 Σ_{t=0}^∞ β^t u(ct) = E0 Σ_{t=0}^∞ β^t u(At − Σ_{i=1}^n sit)

The problem can now be stated as: choose contingency plans for sit for all i = 1, 2, ..., n and t = 0, 1, 2, ... and At for all t = 1, 2, 3, ... to maximize

E0 Σ_{t=0}^∞ β^t u(At − Σ_{i=1}^n sit)

subject to

A0 given

and

Σ_{i=1}^n Rit+1 sit ≥ At+1

for all t = 0, 1, 2, ... and all possible realizations of Rit+1 for each i = 1, 2, ..., n.

As in the general case, the returns can be serially correlated, but must have the Markov property.

To solve this problem via dynamic programming, let

At = state variable
sit, i = 1, 2, ..., n = control variables
Rt = [R1t, R2t, ..., Rnt] = vector of random returns

The Bellman equation is

v(At, Rt) = max_{ {sit}_{i=1}^n } u(At − Σ_{i=1}^n sit) + β Et v(Σ_{i=1}^n Rit+1 sit, Rt+1)

FOC:

−u′(At − Σ_{i=1}^n sit) + β Et Rit+1 v1(Σ_{i=1}^n Rit+1 sit, Rt+1) = 0

for all i = 1, 2, ..., n

Envelope condition:

v1(At, Rt) = u′(At − Σ_{i=1}^n sit)

Use the constraints to rewrite the FOC and envelope conditions more simply as

u′(ct) = β Et Rit+1 v1(At+1, Rt+1)

for all i = 1, 2, ..., n and

v1(At, Rt) = u′(ct)

Since the envelope condition must hold for all t = 0, 1, 2, ..., it implies

v1(At+1, Rt+1) = u′(ct+1)

Hence, the FOC imply that

u′(ct) = β Et Rit+1 u′(ct+1)   (26)

must hold for all i = 1, 2, ..., n.

Equation (26) generalizes (19) to the case where there is more than one asset and where the asset returns are random. It must hold for all assets i = 1, 2, ..., n, even though each asset may pay a different return ex post.

In example 2, we combined (19) with some additional assumptions to derive a version of the permanent income hypothesis. Similarly, we can use (26) to derive a version of the famous capital asset pricing model.

For simplicity, let

mt+1 = β u′(ct+1)/u′(ct)

denote the consumer's intertemporal marginal rate of substitution.

Then (26) can be written more simply as

1 = Et Rit+1 mt+1   (27)

Keeping in mind that (27) must hold for all assets, suppose that there is a risk-free asset, with return R^f_{t+1} that is known during period t. Then R^f_{t+1} must satisfy

1 = R^f_{t+1} Et mt+1

or

Et mt+1 = 1/R^f_{t+1}   (28)

FACT: For any two random variables x and y,
cov (x, y ) = E [(x − µx )(y − µy )], where µx = E (x) and µy = E (y ).
Hence,

cov(x, y) = E[xy − μx y − x μy + μx μy]
= E(xy) − μx μy − μx μy + μx μy
= E(xy) − μx μy
= E(xy) − E(x)E(y)

Or, by rearranging,

E(xy) = E(x)E(y) + cov(x, y)

Using this fact, (27) can be rewritten as

1 = Et Rit+1 mt+1 = Et Rit+1 Et mt+1 + covt(Rit+1, mt+1)

or, using (28),

R^f_{t+1} = Et Rit+1 + R^f_{t+1} covt(Rit+1, mt+1)

Et Rit+1 − R^f_{t+1} = −R^f_{t+1} covt(Rit+1, mt+1)   (29)

Equation (29) indicates that the expected return on asset i exceeds the risk-free rate only
if Rit+1 is negatively correlated with mt+1 .
Does this make sense?

Consider that an asset that acts like insurance pays a high return Rit+1 during bad
economic times, when consumption ct+1 is low. Therefore, for this asset:
covt(Rit+1, ct+1) < 0 ⇒ covt[Rit+1, u′(ct+1)] > 0
⇒ covt(Rit+1, mt+1) > 0
⇒ Et Rit+1 < R^f_{t+1}.
This implication seems reasonable: assets that work like insurance often have
expected returns below the riskfree return.
Consider that common stocks tend to pay a high return Rit+1 during good economic
times, when consumption ct+1 is high. Therefore, for stocks:
covt(Rit+1, ct+1) > 0 ⇒ covt[Rit+1, u′(ct+1)] < 0
⇒ covt(Rit+1, mt+1) < 0
⇒ Et Rit+1 > R^f_{t+1}.
This implication also seems to hold true: historically, stocks have had expected
returns above the riskfree return.
Recalling once more that (29) must hold for all assets, consider in particular the asset whose return happens to coincide exactly with the representative consumer's intertemporal marginal rate of substitution:

R^m_{t+1} = mt+1.

For this asset, equation (29) implies

Et R^m_{t+1} − R^f_{t+1} = −R^f_{t+1} covt(R^m_{t+1}, mt+1)

or

Et mt+1 − R^f_{t+1} = −R^f_{t+1} covt(mt+1, mt+1) = −R^f_{t+1} vart(mt+1)

−R^f_{t+1} = (Et mt+1 − R^f_{t+1}) / vart(mt+1)   (30)

Substitute (30) into the right-hand side of (29) to obtain

Et Rit+1 − R^f_{t+1} = [covt(Rit+1, mt+1) / vart(mt+1)] (Et mt+1 − R^f_{t+1})

or

Et Rit+1 − R^f_{t+1} = bit (Et mt+1 − R^f_{t+1}),   (31)

where

bit = covt(Rit+1, mt+1) / vart(mt+1)

is like the slope coefficient from a regression of Rit+1 on mt+1.

Equation (31) is a statement of the consumption-based capital asset pricing model, or consumption CAPM. This model links the expected return on each asset to the risk-free rate and the representative consumer's intertemporal marginal rate of substitution.
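Since (29) and (31) are algebraic consequences of the pricing condition (27) and the definition (28), they can be checked mechanically on simulated data. In the sketch below the joint distribution of m and the risky payoff is an arbitrary assumption, and all expectations are computed as sample averages (standing in for Et); on sample moments, (29) holds exactly up to rounding:

```python
import random

# Numerical check of (29): scale an arbitrary payoff x into a gross return R
# with E[R*m] = 1 (condition (27)), set Rf = 1/E[m] (condition (28)); then
# E[R] - Rf = -Rf * cov(R, m) holds identically in sample moments.
random.seed(1)
N = 10_000
m = [0.95 + 0.10 * random.random() for _ in range(N)]   # IMRS realizations
x = [1.0 + 0.5 * random.gauss(0.0, 1.0) - 0.8 * (mi - 1.0) for mi in m]  # payoff, negatively related to m

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    return mean([ai * bi for ai, bi in zip(a, b)]) - mean(a) * mean(b)

price = mean([mi * xi for mi, xi in zip(m, x)])   # E[m*x], the payoff's price
R = [xi / price for xi in x]                      # gross return, so that E[R*m] = 1
Rf = 1.0 / mean(m)                                # riskfree rate implied by (28)

lhs = mean(R) - Rf                                # expected excess return
rhs = -Rf * cov(R, m)                             # right-hand side of (29)
```

Because the payoff here is built to covary negatively with m, the computed excess return is positive, matching the stock-like case discussed above.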