AMS 556 Dynamic Programming - Homework 01 Solution
1) Equivalent Randomized Markov Policy
Starting from x0 = 1, we list all the paths that can occur, together with the probabilities that they
occur, in the Table
Lecture 2: Properties of Strategic Measures
In this lecture, we describe two fundamental properties of strategic measures. The first property is that, for any initial distribution, the set of
strategic
Lecture 5: Optimality Operators.
For x ∈ X, a ∈ A(x), and for a nonnegative or nonpositive function f on
X, we define

P^a f(x) = Σ_{y ∈ X} p(y|x, a) f(y).    (1)

Lemma 1 P^a V+(x) ≤ V+(x) for any x ∈ X and for any a ∈ A(x).
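In words, P^a f is the expected value of f at the next state when action a is chosen in state x. A minimal sketch for a finite state space; the two-state numbers and the names `apply_Pa`, `p_a` are hypothetical, not from the lecture:

```python
# Sketch of the operator P^a on a finite state space: (P^a f)(x) is the
# expected value of f at the next state under the transition row p(.|x, a).
def apply_Pa(p, f):
    """Return the list [(P^a f)(x) for each state x]."""
    return [sum(p_xy * f_y for p_xy, f_y in zip(row, f)) for row in p]

# Hypothetical transition matrix under one fixed action a (rows sum to 1).
p_a = [[0.5, 0.5],
       [0.2, 0.8]]
f = [1.0, 3.0]           # a nonnegative function on X = {0, 1}
Pf = apply_Pa(p_a, f)    # each entry is a convex average of the values of f
```

Since each row of p_a is a probability distribution, (P^a f)(x) always lies between min f and max f, which is what makes inequalities like Lemma 1 plausible.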
Lecture 3: Properties of Strategic Measures
Representation of strategic measures for randomized Markov
policies via strategic measures for (nonrandomized) Markov policies.
The set M of Markov policies
Lecture 6: Optimality Equation
Theorem 1 If the General Convergence Condition holds then

V = T V.    (1)

Proof First, we show that V(x) ≤ T V(x) for all x ∈ X. In view of Corollary
6 from Lecture 4, for an
Lecture 9: Value Iterations for Discounted Dynamic Programming
Consider a function V0 and consider the values T^n V0. It is natural to
study the conditions under which the values T^n V0 are well-defined
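Concretely, the iterates T^n V0 can be computed directly. A minimal value-iteration sketch for a discounted model with finite state and action sets; all numbers and the name `bellman_T` are hypothetical:

```python
# One application of the optimality operator T to a function V:
# (T V)(x) = max_a [ r(x, a) + beta * sum_y p(y|x, a) * V(y) ].
def bellman_T(V, states, actions, r, p, beta):
    return [max(r[x][a] + beta * sum(p[x][a][y] * V[y] for y in states)
                for a in actions[x])
            for x in states]

states = [0, 1]
actions = {0: [0], 1: [0, 1]}                  # A(x) may depend on x
r = {0: {0: 1.0}, 1: {0: 0.0, 1: 2.0}}         # rewards r(x, a)
p = {0: {0: [0.5, 0.5]},                       # transition rows p(.|x, a)
     1: {0: [1.0, 0.0], 1: [0.0, 1.0]}}
beta = 0.9

V = [0.0, 0.0]                                  # start from V0 = 0
for _ in range(200):                            # compute T^n V0
    V = bellman_T(V, states, actions, r, p, beta)
```

For beta < 1 and bounded rewards, T is a contraction, so T^n V0 converges geometrically to the same limit for any bounded starting function V0.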
Lecture 7: Stationary Optimal Policies
The following Lemma states that any stationary optimal policy is conserving.
Lemma 1 Let the General Convergence Condition hold. Then any stationary optimal policy is conserving.
Lecture 8: The Principle of Contraction Mappings
Metric Spaces. A metric space is a pair (Y, d) of two things: a set Y
and a nonnegative function d on Y × Y, called the distance between x and y
from Y.
Definition
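As a quick numerical illustration (my example, not the lecture's): on the metric space (R, d(x, y) = |x − y|), the map T(x) = cos(x)/2 is a contraction with modulus 1/2, so successive approximations from any starting point converge to its unique fixed point.

```python
import math

def T(x):
    # A contraction on (R, |x - y|): |T(x) - T(y)| <= 0.5 * |x - y|,
    # since the derivative -0.5*sin(x) is bounded by 0.5 in absolute value.
    return 0.5 * math.cos(x)

x = 10.0                     # arbitrary starting point
for _ in range(100):         # successive approximations x, Tx, T^2 x, ...
    x = T(x)

# x is now (numerically) the unique fixed point x* = T(x*).
assert abs(x - T(x)) < 1e-12
```

The error shrinks by at least a factor of 1/2 per step, so 100 iterations are far more than enough for machine precision.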
Lecture 10: Finite-Horizon Models
Nonstationary Models. An MDP is called nonstationary if the sets
of available actions, transition probabilities, and rewards depend on the
time parameter. In particular
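For such models the finite-horizon optimal values are computed by backward induction with time-dependent data. A minimal sketch; the function name and the toy numbers are hypothetical:

```python
# Backward induction for an N-step nonstationary model: rewards r(n, x, a)
# and transition rows p(n, x, a) may depend on the time parameter n.
def backward_induction(N, states, actions, r, p, terminal):
    V = list(terminal)                   # V_N(x) = terminal reward f(x)
    for n in reversed(range(N)):         # n = N-1, ..., 1, 0
        V = [max(r(n, x, a) + sum(p(n, x, a)[y] * V[y] for y in states)
                 for a in actions)
             for x in states]
    return V                             # V_0(x): optimal N-step value

# Toy data: two states that never move; the reward equals the action label.
states = [0, 1]
actions = [0, 1]
r = lambda n, x, a: a
p = lambda n, x, a: [1, 0] if x == 0 else [0, 1]
values = backward_induction(3, states, actions, r, p, [0, 0])  # [3, 3]
```

Here the best action collects reward 1 at each of the 3 steps, so both states have optimal 3-step value 3.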
Lecture 12: Optimal Stopping Problem
Consider a Markov chain with the state space X and with the matrix
of transition probabilities P. There is a function g(x). Each state has two
actions: to continue
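The two actions lead to the iteration V_{k+1}(x) = max(g(x), P V_k(x)). In the sketch below (my toy numbers, not the lecture's), a discount factor beta < 1 is added so that this plain iteration converges; the lecture's undiscounted setting requires the more careful arguments developed in the course:

```python
# Optimal stopping on a three-state chain: stop in state x and collect
# g(x), or continue and move according to the transition matrix P.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]          # state 2 is absorbing
g = [1.0, 4.0, 0.0]            # stopping rewards
beta = 0.9                     # discount added so the iteration converges

V = [0.0, 0.0, 0.0]
for _ in range(300):
    V = [max(g[x], beta * sum(P[x][y] * V[y] for y in range(3)))
         for x in range(3)]

# It is optimal to stop exactly where V(x) = g(x).
stop_set = [x for x in range(3) if V[x] == g[x]]   # [1, 2]
```

In state 0 the continuation value 0.45*V(0) + 1.8 exceeds g(0) = 1, so it is optimal to continue there and to stop in states 1 and 2.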
Lecture 11: Value Iteration
Let V0 be a function on X such that there exist two positive numbers K1
and K2 such that V0 ≤ K1 V+ + K2. We recall that T^n V0(x) = V(x, n, V0).
In particular, T^n V0 = V
Lecture 14: An example when Markov ε-optimal policies do not exist.
We consider an example of a positive dynamic programming problem
in which action sets are finite but, for any K > 0, t
Lecture 15: ε-optimal policies I.
Though stationary optimal and ε-optimal policies may not exist for positive dynamic programming problems, the following two theorems indicate
that, in some sense, there
Lecture 13: Examples when optimal policies do not exist
We shall consider two examples of positive dynamic programming problems when the model satisfies the Compactness Conditions but there is no
stationary
Lecture 16: ε-optimal policies II.
Theorem 3 in Lecture describes the existence of multiplicative ε-optimal
policies in positive dynamic programming. In addition, Example 1 implies
that ε-optimal policies
Lecture 17: Discounted MDPs with Finite State and Action Sets.
In the previous lectures, we considered the value iteration algorithm for Discounted Markov Decision Processes. In this lecture, we consider
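For finite state and action sets, alternatives to value iteration become available; for instance, policy iteration evaluates a stationary policy exactly by solving a linear system and then improves it greedily. A sketch with hypothetical two-state data (the 2x2 system is solved by Cramer's rule; all names are mine):

```python
# Policy iteration for a discounted MDP with states {0, 1}.
def evaluate(policy, r, p, beta):
    """Solve (I - beta * P_policy) V = r_policy exactly (Cramer's rule)."""
    a0, a1 = policy
    m = [[1 - beta * p[0][a0][0],     -beta * p[0][a0][1]],
         [   -beta * p[1][a1][0], 1 - beta * p[1][a1][1]]]
    b = [r[0][a0], r[1][a1]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - b[1] * m[0][1]) / det,
            (b[1] * m[0][0] - b[0] * m[1][0]) / det]

def greedy(V, r, p, beta):
    """Pick, in each state, an action maximizing the one-step lookahead."""
    return [max(range(len(r[x])),
                key=lambda a: r[x][a]
                + beta * sum(p[x][a][y] * V[y] for y in (0, 1)))
            for x in (0, 1)]

r = [[1.0, 0.0], [0.0, 2.0]]                   # r[x][a]
p = [[[0.5, 0.5], [1.0, 0.0]],                 # p[x][a] = row p(.|x, a)
     [[1.0, 0.0], [0.0, 1.0]]]
beta, policy = 0.9, [0, 0]

while True:
    V = evaluate(policy, r, p, beta)
    improved = greedy(V, r, p, beta)
    if improved == policy:      # no improvement: the policy is optimal
        break
    policy = improved           # finitely many policies, so this terminates
```

Because there are only finitely many stationary policies and each improvement step strictly increases the value, the loop terminates with an optimal stationary policy.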
Lecture 4: Total Reward Criteria.
Finite-horizon expected total rewards are

v_N(x, π, β, f) = E_x^π [ Σ_{n=0}^{N-1} β^n r(x_n, a_n) + β^N f(x_N) ],    (1)

whenever the expectation is well-defined. In this equation, x
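For a fixed stationary policy, the expectation in (1) can be computed without simulation, by the backward recursion v_n = r + β P v_{n+1} starting from v_N = f. A minimal sketch with hypothetical two-state numbers:

```python
# Finite-horizon expected total reward of a fixed stationary policy:
# iterate v <- r + beta * P v, starting from the terminal reward f.
def finite_horizon_value(N, r, P, beta, f):
    v = list(f)                                   # v_N = f
    for _ in range(N):                            # N backward steps
        v = [r[x] + beta * sum(P[x][y] * v[y] for y in range(len(v)))
             for x in range(len(v))]
    return v                                      # v_0(x) = v_N(x, ...)

P = [[1.0, 0.0], [0.0, 1.0]]                      # each state stays put
r = [1.0, 2.0]                                    # one-step rewards
print(finite_horizon_value(2, r, P, 0.5, [0.0, 0.0]))   # [1.5, 3.0]
```

With N = 2, beta = 0.5, and states that stay put, state 0 collects 1 + 0.5*1 = 1.5 and state 1 collects 2 + 0.5*2 = 3, matching equation (1).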
Lecture 1: Model Definitions and Notations
Let N = {0, 1, . . .} and let R^n be an n-dimensional Euclidean space,
R = R^1. A Markov Decision Process (MDP) is defined through the following
objects:
a state space X
AMS 556 Dynamic Programming Homework Set 1
[Diagram: a two-state MDP; one action at state 1, actions a and b at state 2; the edges are labeled with transition probabilities 1/2 and 1/2.]
Consider the MDP with two states 1 and 2 shown in the diagram.
There is one action at state 1 and two actions, a and b, at state 2.
Problem
AMS 556 Dynamic Programming Homework Set 2
Problem 1. Is it true that GCC implies v+(x, π) < ∞ for all x and for all randomized Markov policies π?
Problem 2. Is it true that GCC implies v+(x, π) < ∞ for all x and for all nonrandomized
AMS 556 Dynamic Programming Homework Assignment 3
Problem 1. Let X = {(0, 0)} ∪ {(i, j) : i = 1, 2, . . . , j = i + 1, . . . , 0, 1}, and A = {c, s},
where c stands for continue and s stands for stop
AMS 556 Dynamic Programming - Homework 03 Solutions
1) Example 5.8
On each edge, we indicate the reward (r), action (a), and transition probability (p) in
the following form
ra(p)
where r takes value