# L. Karp, Notes for Dynamics: V. Dynamic Programming


1) The basic idea of dynamic programming (discrete time).
2) The linear quadratic (LQ) discrete time control problem with additive errors.
3) Two problems related to the LQ problem.
4) Derivation of continuous time dynamic programming equation (DPE).
5) Relation between DP and maximum principle.
6) DPE for autonomous problem.
7) Writing optimal control rule as an ODE in the state.
8) LQ continuous time problem.

1. The basic idea of using dynamic programming is to convert a dynamic problem into a succession of static problems. We can illustrate this using the following problem: choose a sequence of controls $u_t$, i.e. $\{u_t\}_{t=1}^{T}$, to

$$\max \; -\sum_{i=1}^{T} \left(x_i^2 + u_i^2\right) \quad \text{subject to} \quad x_i = x_{i-1} + u_i, \quad x_0 \text{ given}.$$

Here $x$ is the state variable and $u$ is the control.

We solve this problem by "working backwards" from the final time, $T$. At $t = T$ the problem is simply: choose $u_T$ to

$$\max_{u_T} \; -\left[(x_{T-1} + u_T)^2 + u_T^2\right],$$

obtained by substituting in the constraint. We can solve this problem (easily!) for arbitrary values of $x_{T-1}$ and obtain the optimal control $u^*(x_{T-1}, T)$. This function (the solution to the optimization problem) is called the control rule. It is in "feedback form", i.e., it is expressed as a function of the state. Substituting this control rule into the maximand, we obtain the value function $J(x_{T-1}, T)$. Both the control rule and the value function have two arguments, the state variable and calendar time.

Now we step back a period, to time $T-1$. At that time we know the previous value of the state, $x_{T-2}$, and we know how we will behave in the future, conditional on the state. We can therefore write the problem at $T-1$ as: choose $u_{T-1}$ to

$$\max_{u_{T-1}} \; -\left[(x_{T-2} + u_{T-1})^2 + u_{T-1}^2\right] + J(x_{T-2} + u_{T-1},\, T).$$

Note that this is a familiar static problem. At time $T-1$ I have a 2 period (i.e. dynamic) problem, but I have converted it to a static problem. Notice also that I substituted the constraint into both the period $T-1$ payoff and the value function at $T$.
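As a worked illustration of the period-$T$ step described above (a sketch added here, assuming the control is unconstrained so that the first-order condition characterizes the maximum), write $x = x_{T-1}$ and differentiate the period-$T$ maximand with respect to $u_T$:

$$
\begin{aligned}
\frac{\partial}{\partial u_T}\Big\{-\big[(x + u_T)^2 + u_T^2\big]\Big\} &= -2(x + u_T) - 2u_T = 0 \\
\Rightarrow \quad u^*(x_{T-1}, T) &= -\tfrac{1}{2}\, x_{T-1}, \\
J(x_{T-1}, T) &= -\Big[\big(\tfrac{1}{2} x_{T-1}\big)^2 + \big(\tfrac{1}{2} x_{T-1}\big)^2\Big] = -\tfrac{1}{2}\, x_{T-1}^2 .
\end{aligned}
$$

The maximand is strictly concave in $u_T$, so the first-order condition gives the maximum. The control rule is a function of the state $x_{T-1}$, which is what "feedback form" means here.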
I can solve the problem at $T-1$ and obtain the control rule, $u^*(x_{T-2}, T-1)$. Again, note that this control rule is conditional upon $x_{T-2}$. I need to know the value of $x_{T-2}$ to find the value of $u_{T-1}$, but I do not need the value of $x_{T-2}$ to find the control rule $u^*(x_{T-2}, T-1)$. Note the distinction between a function (the control rule) and the value of the function.

Substitute the control rule into the maximand to obtain the value function at $T-1$:

$$
\begin{aligned}
J(x_{T-2}, T-1) &= \max_{u_{T-1}} \Big\{ -\big[(x_{T-2} + u_{T-1})^2 + u_{T-1}^2\big] + J(x_{T-2} + u_{T-1},\, T) \Big\} \\
&= -\big[(x_{T-2} + u^*_{T-1})^2 + (u^*_{T-1})^2\big] + J(x_{T-2} + u^*_{T-1},\, T).
\end{aligned}
$$

I got rid of the max operator by substituting in the optimal control. The reason that I can actually solve this problem, of course, is that I know the function $J(x_{T-1}, T)$.

I can keep "stepping back" in time in this manner. At arbitrary time $t$, I can write the problem as

$$J(x_{t-1}, t) = \max_{u_t} \Big\{ -\big[x_t^2 + u_t^2\big] + J(x_t,\, t+1) \Big\} \quad \text{subject to} \quad x_t = x_{t-1} + u_t, \quad x_{t-1} \text{ given}.$$
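The "stepping back" recursion can be sketched numerically for this example. The sketch below is an illustration, not part of the original notes. It assumes there is no payoff after the final period, i.e. $J(x, T+1) = 0$, and it uses the guess $J(x_{t-1}, t) = -k_t x_{t-1}^2$. Under that guess the first-order condition at time $t$ gives $u^*_t = -k_t x_{t-1}$ with $k_t = (1 + k_{t+1})/(2 + k_{t+1})$ and $k_{T+1} = 0$. The function names `value_coefficients` and `simulate` are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def value_coefficients(T):
    """Backward recursion for k_t, where J(x_{t-1}, t) = -k_t * x_{t-1}**2.

    Assumes J(x, T+1) = 0 (no payoff after the final period).
    The same k_t also gives the control rule u*_t = -k_t * x_{t-1}.
    """
    k = {}
    k_next = Fraction(0)  # coefficient of the value function at T+1
    for t in range(T, 0, -1):  # t = T, T-1, ..., 1
        k[t] = (1 + k_next) / (2 + k_next)
        k_next = k[t]
    return k


def simulate(x0, k):
    """Apply u*_t = -k_t * x_{t-1} forward in time; return the total payoff."""
    x, total = x0, Fraction(0)
    for t in sorted(k):
        u = -k[t] * x           # control rule evaluated at the current state
        x = x + u               # state equation x_t = x_{t-1} + u_t
        total -= x**2 + u**2    # period payoff -(x_t^2 + u_t^2)
    return total


if __name__ == "__main__":
    k = value_coefficients(3)
    for t in sorted(k, reverse=True):
        print(f"t={t}: k_t = {k[t]}")
    print("payoff from x0=1:", simulate(Fraction(1), k), " -k_1 =", -k[1])
```

With $T = 3$ the recursion gives $k_3 = 1/2$, $k_2 = 3/5$, $k_1 = 8/13$. Simulating the control rule forward from $x_0 = 1$ reproduces the value $J(x_0, 1) = -k_1 x_0^2$, which is a quick consistency check on the recursion. Exact fractions are used so that the check holds without rounding error.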