Chapter 0
Discrete Time Dynamic Programming

0.1 The Finite Horizon Case
Time is discrete and indexed by $t = 0, 1, \ldots, T$, where $T < \infty$. An individual is interested in maximizing an objective function given by
$$E_0 \sum_{t=0}^{T} \beta^t u(x_t, a_t), \qquad (0.1)$$
where the instantaneous return function (or utility function) $u$ depends on a state variable $x_t$ and a control variable $a_t$ at each date. We assume that $x_t \in X \subset \mathbb{R}^m$, $a_t \in A \subset \mathbb{R}^n$ for all $t$, $\beta \in (0,1)$, and $E_0$ denotes the expectation conditional on information available at date $0$. The nature of the uncertainty in the environment will be made explicit below.
There are several constraints faced by the individual. At each date the control variable is constrained to belong to a set that may depend on the state: $a_t \in \Gamma(x_t)$ for all $t$. The initial value of the state $x_0$ is given by nature. Future states evolve according to the transition equation
$$x_{t+1} = f(x_t, a_t, \varepsilon_t), \qquad (0.2)$$
where $\varepsilon_t$ is a random variable, unobservable at date $t$, described by a cumulative distribution function that may depend on the state and action but does not depend on $t$:
$$F(\varepsilon \mid x, a) = \Pr(\varepsilon_t \leq \varepsilon \mid x_t = x, a_t = a). \qquad (0.3)$$
Given any sequence of controls $\{a_t\}$, we can construct the probability distribution of future states conditional on $x_0$ from (0.2) and (0.3); the expectation $E_0$ in (0.1) is with respect to this distribution.$^1$
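Given concrete primitives, the expectation in (0.1) can be approximated by Monte Carlo: draw many shock sequences from $F$, roll the state forward with (0.2), and average the discounted returns. A minimal sketch, where the linear transition $f$, the quadratic return $u$, and the Gaussian shock distribution are all illustrative assumptions, not from the text:

```python
import random

# Illustrative primitives (assumptions for this sketch): scalar state,
# transition f(x, a, eps) = 0.9*x + a + eps with Gaussian shocks, and
# instantaneous return u(x, a) = -(x^2 + a^2).
def f(x, a, eps):
    return 0.9 * x + a + eps

def u(x, a):
    return -(x**2 + a**2)

def estimate_objective(policy, x0, beta=0.95, T=20, n_paths=5000, seed=0):
    """Monte Carlo estimate of E_0 sum_{t=0}^T beta^t u(x_t, a_t)
    under the decision rules a_t = policy[t](x_t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, value = x0, 0.0
        for t in range(T + 1):
            a = policy[t](x)             # a_t = alpha_t(x_t)
            value += beta**t * u(x, a)   # accumulate discounted return
            eps = rng.gauss(0.0, 0.1)    # eps_t drawn from F, here N(0, 0.01)
            x = f(x, a, eps)             # x_{t+1} = f(x_t, a_t, eps_t)
        total += value
    return total / n_paths

# A stationary policy, alpha(x) = -0.5*x, applied at every date.
T = 20
policy = [lambda x: -0.5 * x] * (T + 1)
v = estimate_objective(policy, x0=1.0, T=T)
```

Because the shock stream is seeded, repeated calls with the same seed return the same estimate, which is convenient when comparing policies.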
Notice that the environment here is stationary, in the sense that $u$, $\Gamma$, $f$ and $F$ do not depend on $t$. It would be a straightforward generalization to allow these objects to vary with time, but we adopt a stationary environment because it yields time-invariant decision rules in the infinite horizon case studied below. In any case, $(x_t, a_t)$ contains all of the information available at date $t$ that is relevant for the probability distribution of future events. This, together with the additive separability of the objective function in (0.1), implies that the control at date $t$ will depend only on the current state, $a_t = \alpha_t(x_t)$, where $\alpha_t$ is referred to as the decision rule.
A policy (of length $T$) is defined to be a sequence of decision rules, $\pi_T = (\alpha_0, \alpha_1, \ldots, \alpha_T)$, where $\alpha_t : X \to A$ for all $t$. The set of feasible policies is
$$\Pi_T = \{\pi_T = (\alpha_0, \alpha_1, \ldots, \alpha_T) : \alpha_t(x) \in \Gamma(x) \;\; \forall x, t\}. \qquad (0.4)$$
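On a finite state grid the feasibility condition in (0.4) can be checked directly by testing $\alpha_t(x) \in \Gamma(x)$ at every state and date. A minimal sketch, where the grid and the shrinking feasible set $\Gamma$ are illustrative assumptions:

```python
# Illustrative finite problem (assumptions for this sketch): states
# X = {0, 1, 2, 3}; Gamma(x) contains the moves that keep x + a on the grid.
X = (0, 1, 2, 3)

def Gamma(x):
    return {a for a in (-1, 0, 1) if 0 <= x + a <= 3}

def is_feasible(policy):
    """Check alpha_t(x) in Gamma(x) for all x and t, as in (0.4)."""
    return all(alpha(x) in Gamma(x) for alpha in policy for x in X)

T = 5
stay = lambda x: 0                 # alpha(x) = 0 is feasible at every state
pi_T = [stay] * (T + 1)            # a stationary policy of length T
bad = [lambda x: 1] * (T + 1)      # at x = 3, a = 1 leaves the grid
```

Here `is_feasible(pi_T)` is `True`, while `is_feasible(bad)` is `False` because $1 \notin \Gamma(3)$.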
A policy is stationary if it does not depend upon time: $\alpha_t(x) \equiv \alpha(x)$. Any given policy generates a stochastic law of motion for the state,
$$x_{t+1} = f[x_t, \alpha_t(x_t), \varepsilon_t],$$
which will be stationary if $\alpha_t$ is stationary.
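For a finite-horizon problem on a discretized state and action space, an optimal sequence of decision rules $(\alpha_0, \ldots, \alpha_T)$ can be computed by backward induction from date $T$. The sketch below uses illustrative primitives (a deterministic transition and a quadratic return, both assumptions rather than anything from the text) to keep it short:

```python
# Backward induction on a small finite problem (illustrative primitives):
# states X = {0,...,4}, actions A = {-1, 0, 1}, deterministic transition
# f(x, a) = clip(x + a), return u(x, a) = -(x - 2)^2 - |a|.
X = range(5)
A = (-1, 0, 1)
beta = 0.95
T = 10

def f(x, a):
    return min(max(x + a, 0), 4)

def u(x, a):
    return -(x - 2) ** 2 - abs(a)

# V[t][x] is the maximal attainable value from date t onward; alpha[t][x]
# is the optimal decision rule at date t. V[T+1] = 0: no continuation value.
V = [[0.0] * len(X) for _ in range(T + 2)]
alpha = [[0] * len(X) for _ in range(T + 1)]
for t in range(T, -1, -1):          # work backward from the terminal date
    for x in X:
        best_a = max(A, key=lambda a: u(x, a) + beta * V[t + 1][f(x, a)])
        alpha[t][x] = best_a
        V[t][x] = u(x, best_a) + beta * V[t + 1][f(x, best_a)]
```

Note that the computed rules are not stationary in general: near the terminal date it can be optimal to stop paying the movement cost $|a|$, so $\alpha_t$ depends on $t$ even though $u$, $f$ and $\Gamma$ do not.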
$^1$Alternatively, the uncertainty can be described by starting with a Markov probability transition function, $Q(x, a, \tilde{X}) = \Pr(x_{t+1} \in \tilde{X} \mid x_t = x, a_t = a)$, for $\tilde{X} \subset X$, as in Stokey, Lucas and Prescott (1989). One can always construct such a $Q$ from our $f$ and $F$.
Spring '02, Krueger