L. Karp
Notes for Dynamics
V. Dynamic Programming
1) The basic idea of dynamic programming (discrete time).
2) The linear quadratic (LQ) discrete time control problem with additive errors.
3) Two problems related to the LQ problem.
4) Derivation of continuous time dynamic programming equation (DPE).
5) Relation between DP and maximum principle.
6) DPE for autonomous problem.
7) Writing optimal control rule as an ODE in the state.
8) LQ continuous time problem.
1. The basic idea

The basic idea of dynamic programming is to convert a dynamic problem into a succession of static problems. We can illustrate this using the following problem:
Choose a sequence of u_t, i.e. {u_t}_{t=1}^T, to max −Σ_{i=1}^T (x_i^2 + u_i^2) subject to x_i = x_{i−1} + u_i, x_0 given.
x is the state variable and u is the control. We solve this problem by "working backwards" from
the final time, T. At t = T the problem is simply:

choose u_T to max −[(x_{T−1} + u_T)^2 + u_T^2],
obtained by substituting in the constraint. We can solve this problem (easily!) for arbitrary
values of x_{T−1} and obtain the optimal control u*(x_{T−1}, T). This function (the solution to the optimization problem) is called the control rule. It is in "feedback form", i.e., it is expressed as a function of the state. Substituting this control rule into the maximand, we obtain the value function J(x_{T−1}, T).
Both the control rule and value function have two arguments, the state
variable and calendar time.
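As a sketch of this final-period step: maximizing −[(x_{T−1} + u_T)^2 + u_T^2] is the same as minimizing (x_{T−1} + u_T)^2 + u_T^2, and the first-order condition gives u*(x_{T−1}, T) = −x_{T−1}/2 and J(x_{T−1}, T) = −x_{T−1}^2/2. A minimal numerical check (the grid bounds and step are arbitrary choices for illustration, not from the notes):

```python
import numpy as np

def u_star_T(x_prev):
    # FOC for max over u of -[(x_prev + u)**2 + u**2]:
    # -2*(x_prev + u) - 2*u = 0  =>  u* = -x_prev/2
    return -x_prev / 2.0

def J_T(x_prev):
    # Value function at T: the maximized payoff -[(x_prev + u*)**2 + (u*)**2],
    # which simplifies to -x_prev**2/2
    u = u_star_T(x_prev)
    return -((x_prev + u) ** 2 + u ** 2)

# Brute-force check of the closed form on a grid of controls
x = 3.0
u_grid = np.linspace(-10, 10, 200001)
payoff = -((x + u_grid) ** 2 + u_grid ** 2)
assert abs(u_grid[np.argmax(payoff)] - u_star_T(x)) < 1e-3
assert abs(J_T(x) - (-x ** 2 / 2)) < 1e-12
```

Note that u_star_T and J_T take the state as an argument: they are functions of x_{T−1}, exactly the "feedback form" described above.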
Now we step back a period, to time T−1. At that time we know the previous value of the state, x_{T−2}, and we know how we will behave in the future, conditional on the state.
We can therefore
write the problem at T−1 as

choose u_{T−1} to max −[(x_{T−2} + u_{T−1})^2 + u_{T−1}^2] + J(x_{T−2} + u_{T−1}, T).
Note that this is a familiar static problem. At time T−1 I have a 2-period (i.e. dynamic) problem, but I have converted it to a static problem. Notice also that I substituted the constraint into the period T−1 payoff and the value function at T.
I can solve the problem at T−1 and obtain the control rule, u*(x_{T−2}, T−1). Again, note that this control rule is conditional upon x_{T−2}. I need to know the value of x_{T−2} to find the value of u_{T−1},
but I do not need the value of x_{T−2} to find the control rule u*(x_{T−2}, T−1). Note the distinction between a function (the control rule) and the value of the function. Substitute the control rule into the maximand to obtain the value function at T−1:
J(x_{T−2}, T−1) = max −[(x_{T−2} + u_{T−1})^2 + u_{T−1}^2] + J(x_{T−2} + u_{T−1}, T) = −[(x_{T−2} + u*_{T−1})^2 + (u*_{T−1})^2] + J(x_{T−2} + u*_{T−1}, T).
I got rid of the max operator by substituting in the optimal control. The reason that I can actually solve this problem, of course, is that I know the function J(x_{T−1}, T).
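To make the T−1 step concrete, here is the same substitution carried out in code, assuming the final-period value J(x, T) = −x^2/2 obtained by solving the period-T problem in closed form (a sketch of the two-period step, not part of the notes):

```python
def J_T(x):
    # Final-period value function, J(x, T) = -x**2/2
    return -x ** 2 / 2.0

def u_star_Tm1(x_prev):
    # Maximize -[(x_prev + u)**2 + u**2] + J(x_prev + u, T)
    #        = -(3/2)*(x_prev + u)**2 - u**2.
    # FOC: -3*(x_prev + u) - 2*u = 0  =>  u* = -3*x_prev/5
    return -3.0 * x_prev / 5.0

def J_Tm1(x_prev):
    # Substitute u* back into the maximand: J(x_prev, T-1) = -3*x_prev**2/5
    u = u_star_Tm1(x_prev)
    return -((x_prev + u) ** 2 + u ** 2) + J_T(x_prev + u)
```

The body of J_Tm1 is exactly the right-hand side of the equation above: the max operator is gone because the optimal control has been substituted in, and the only remaining argument is the state.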
I can keep "stepping back" in time in this manner. At arbitrary time t, I can write the problem as

J(x_{t−1}, t) = max {−[x_t^2 + u_t^2] + J(x_t, t+1)}, subject to x_t = x_{t−1} + u_t, x_{t−1} given.
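The general recursion above can be sketched as backward induction on a discretized state grid. The horizon, grid bounds, and step size below are illustrative assumptions, not from the notes:

```python
import numpy as np

T = 5                                     # assumed horizon, for illustration
x_grid = np.linspace(-5.0, 5.0, 201)      # discretized state values
u_grid = np.linspace(-5.0, 5.0, 201)      # discretized controls

# J[t] holds the value function at time t evaluated on x_grid;
# terminal condition J(x, T+1) = 0 (no payoff after T).
J = np.zeros((T + 2, len(x_grid)))
policy = np.zeros((T + 1, len(x_grid)))   # control rule u*(x_{t-1}, t)

for t in range(T, 0, -1):                 # step backwards from T to 1
    for i, x_prev in enumerate(x_grid):
        x_next = x_prev + u_grid                    # constraint x_t = x_{t-1} + u_t
        cont = np.interp(x_next, x_grid, J[t + 1])  # interpolated J(x_t, t+1)
        total = -(x_next ** 2 + u_grid ** 2) + cont # period payoff + continuation
        j = np.argmax(total)
        policy[t, i] = u_grid[j]
        J[t, i] = total[j]

# At t = T the computed rule matches the closed form u*(x, T) = -x/2:
i0 = np.argmin(np.abs(x_grid - 2.0))
print(policy[T, i0])                      # close to -1.0
```

One caveat: np.interp extrapolates flat outside the grid, so for t < T the values near the grid boundary are only approximate; a wider or finer grid sharpens the approximation.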
 Fall '06
 KARP