…ly large $n$, $\pi'\delta(n) \ge \beta(u) - \epsilon/2$. Also, since $\delta_{\min}(1) \le \delta_i(n) \le \delta_{\max}(1)$ for all $i$ and $n$, and since $[\chi(m)] \to 0$, we see that $[\chi(m)]\delta(n) \ge -(\epsilon/2)e$ for all large enough $m$. Thus, for all large enough $n$ and $m$, $\delta_i(n+m) \ge \beta(u) - \epsilon$. Hence, for any $\epsilon > 0$, there is an $n_0$ such that for all $n \ge n_0$,
$$\delta_i(n) \ge \beta(u) - \epsilon. \qquad (4.96)$$
Also, from (4.95), we have $\pi'[\delta(n) - \beta(u)e] \le 0$, so
$$\pi'[\delta(n) - \beta(u)e + \epsilon e] \le \epsilon. \qquad (4.97)$$
By (4.96), each term $\pi'_i[\delta_i(n) - \beta(u) + \epsilon]$ on the left side of (4.97) is nonnegative, and since these terms sum to at most $\epsilon$, each is also at most $\epsilon$. For each $i$ with $\pi'_i > 0$, it follows that
$$\delta_i(n) - \beta(u) + \epsilon \le \epsilon/\pi'_i \quad \text{for all } n \ge n_0. \qquad (4.98)$$
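The limit asserted in Lemma 4.5, $\delta_i(n) \to \beta(u)$ for every $i$, can be watched numerically. The sketch below is a minimal illustration under stated assumptions, not the text's method. It uses a hypothetical two-state MDP with two decisions per state. All transition probabilities are positive, so every stationary policy is ergodic and aperiodic. The sketch finds the optimal stationary policy $k'$ by brute force over all stationary policies, normalizes $w'$ by $\pi' w' = 0$ (an assumption; another normalization only shifts $\beta(u)$), and runs the dynamic programming recursion from an arbitrary final reward $u$. The helpers `gain_and_relative_gain` and `dp_step` are illustrative names.

```python
import itertools

import numpy as np

# Hypothetical MDP (not from the text): 2 states, 2 decisions per state.
# r[i, k] is the reward and P[i, k, :] the transition row for decision k in state i.
# All probabilities are positive, so every stationary policy is ergodic and aperiodic.
r = np.array([[1.0, 0.0],
              [2.0, 3.0]])
P = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.3, 0.7], [0.8, 0.2]]])
M, K = r.shape


def gain_and_relative_gain(policy):
    """Return (g, w) with w + g e = r^(k) + [P^(k)] w, normalized so that pi w = 0."""
    Pk = np.array([P[i, policy[i]] for i in range(M)])
    rk = np.array([r[i, policy[i]] for i in range(M)])
    # Steady-state vector pi: pi (I - [P^(k)]) = 0 and sum(pi) = 1.
    A = np.vstack([(np.eye(M) - Pk).T, np.ones(M)])
    pi = np.linalg.lstsq(A, np.append(np.zeros(M), 1.0), rcond=None)[0]
    g = pi @ rk
    # Relative gains: (I - [P^(k)]) w = r^(k) - g e, plus the normalization pi w = 0.
    B = np.vstack([np.eye(M) - Pk, pi])
    w = np.linalg.lstsq(B, np.append(rk - g, 0.0), rcond=None)[0]
    return g, w


def dp_step(v):
    """One stage of the DP recursion: maximize r_i^(k) + sum_j P_ij^(k) v_j over k."""
    return np.max(r + P @ v, axis=1)


# Brute force over all K**M stationary policies to find the gain-optimal k'.
k_opt = max(itertools.product(range(K), repeat=M),
            key=lambda k: gain_and_relative_gain(k)[0])
g_opt, w_opt = gain_and_relative_gain(k_opt)

u = np.array([5.0, -2.0])             # arbitrary final reward vector u
n_stages = 200
v = u.copy()                          # v*(0, u) = u
for _ in range(n_stages):
    v = dp_step(v)                    # after the loop, v = v*(n_stages, u)
delta = v - n_stages * g_opt - w_opt  # delta(n) = v*(n, u) - n g' e - w'
print(k_opt, delta)                   # both entries of delta approximate beta(u)
```

The entries of `delta` agree to within rounding, which is the statement $\delta(n) \to \beta(u)e$. Changing `u` generally changes the common value $\beta(u)$ but not the agreement.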
Since $\epsilon > 0$ is arbitrary, (4.96) and (4.98), together with $\pi'_i > 0$, show that $\lim_{n\to\infty} \delta_i(n) = \beta(u)$, completing the proof of Lemma 4.5.

180 CHAPTER 4. FINITE-STATE MARKOV CHAINS

Since $k'$ is a unique optimal stationary policy, we have
$$r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} w'_j > r_i^{(k_i)} + \sum_j P_{ij}^{(k_i)} w'_j$$
for all $i$ and all $k_i \ne k'_i$. Since this is a finite set of strict inequalities, there is an $\alpha > 0$ such that for all $i > m$ and all $k_i \ne k'_i$,
$$r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} w'_j \ge r_i^{(k_i)} + \sum_j P_{ij}^{(k_i)} w'_j + \alpha. \qquad (4.99)$$
Since $v_i^*(n, w') = ng' + w'_i$,
$$\begin{aligned} v_i^*(n+1, w') &= r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} v_j^*(n, w') \qquad (4.100)\\ &\ge r_i^{(k_i(n))} + \sum_j P_{ij}^{(k_i(n))} v_j^*(n, w') + \alpha \qquad (4.101) \end{aligned}$$
for each $i > m$ and $k_i(n) \ne k'_i$. Subtracting (4.101) from (4.86),
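$$\delta_i(n+1) \le \sum_j P_{ij}^{(k_i(n))} \delta_j(n) - \alpha \quad \text{for } k_i(n) \ne k'_i. \qquad (4.102)$$
Since $\delta_j(n) \le \delta_{\max}(n)$ for every $j$ and the transition probabilities in each row sum to 1, (4.102) can be further bounded by $\delta_i(n+1) \le \delta_{\max}(n) - \alpha$ for $k_i(n) \ne k'_i$. Combining this with $\delta_i(n+1) = \sum_j P_{ij}^{(k'_i)} \delta_j(n)$ for $k_i(n) = k'_i$,
$$\delta_i(n+1) \le \max\Big[\delta_{\max}(n) - \alpha,\; \sum_j P_{ij}^{(k'_i)} \delta_j(n)\Big]. \qquad (4.103)$$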
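Next, since $k'$ is a unichain, we can renumber the transient states, $m < i \le M$, so that $\sum_{j<i} P_{ij}^{(k'_i)} > 0$ for each $i$, $m < i \le M$. This is possible because every transient state of a unichain can reach the recurrent class, so the transient states can be numbered in increasing order of the number of transitions needed to reach that class. Since this is a finite set of strict inequalities, there is some $\gamma > 0$ such that
$$\sum_{j<i} P_{ij}^{(k'_i)} \ge \gamma \quad \text{for } m < i \le M. \qquad (4.104)$$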
The quantity $\delta_i(n)$ for each transient state $i$ is somewhat difficult to work with directly, so we define the new quantity $\tilde\delta_i(n)$, which will be shown in the following lemma to upper bound $\delta_i(n)$. The definition of $\tilde\delta_i(n)$ is given iteratively for $n \ge 1$, $m < i \le M$, as
$$\tilde\delta_i(n+1) = \max\Big[\tilde\delta_M(n) - \alpha,\; \gamma\,\tilde\delta_{i-1}(n) + (1-\gamma)\,\tilde\delta_M(n)\Big]. \qquad (4.105)$$
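The recursion (4.105) has the same shape as the one-step bound (4.103): the constant $\alpha$ of (4.99) is the guaranteed loss whenever a decision other than $k'_i$ is used. As a sketch only, the code below computes $\alpha$ as the smallest margin in (4.99) and then checks (4.103) along a run of the DP recursion. The two-state MDP is a hypothetical example worked out by hand, not data from the text, and so are its optimal policy $k' = (0, 1)$, its gain $g' = 23/13$, and its relative-gain vector $w' = (-100/169,\, 160/169)$, normalized by $\pi' w' = 0$. Because this chain has no transient states, the minimum defining $\alpha$ is taken over all states rather than over $i > m$ only. The helper names `decision_gap` and `dp_step` are illustrative, and every state is assumed to offer the same number of decisions.

```python
import numpy as np

# Hypothetical 2-state MDP with 2 decisions per state (illustration only).
# r[i, k] is the reward and P[i, k, :] the transition row for decision k in state i.
r = np.array([[1.0, 0.0],
              [2.0, 3.0]])
P = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.3, 0.7], [0.8, 0.2]]])
k_opt = (0, 1)                              # optimal stationary policy k' (worked out by hand)
g_opt = 23 / 13                             # its gain g'
w_opt = np.array([-100 / 169, 160 / 169])   # relative gains w', normalized so pi' w' = 0


def decision_gap(r, P, w, k_opt, states):
    """alpha: smallest margin by which k'_i beats every other decision, cf. (4.99)."""
    q = r + P @ w            # q[i, k] = r_i^(k) + sum_j P_ij^(k) w_j
    return min(q[i, k_opt[i]] - q[i, k]
               for i in states for k in range(q.shape[1]) if k != k_opt[i])


def dp_step(v):
    """One stage of the DP recursion: v*(n+1) = max over k of r^(k) + [P^(k)] v*(n)."""
    return np.max(r + P @ v, axis=1)


# No transient states here, so the minimum runs over all states.
alpha = decision_gap(r, P, w_opt, k_opt, states=range(2))
print(alpha)                 # about 0.2308, i.e. 3/13

# Check (4.103) along a DP run started from an arbitrary final reward u.
P_opt = np.array([P[i, k_opt[i]] for i in range(2)])    # [P^(k')]
v = np.array([4.0, -3.0])                               # v*(0, u) = u
violations = 0
for n in range(50):
    delta_n = v - n * g_opt - w_opt                     # delta(n) = v*(n,u) - v*(n,w')
    v = dp_step(v)                                      # v*(n+1, u)
    delta_next = v - (n + 1) * g_opt - w_opt            # delta(n+1)
    bound = np.maximum(delta_n.max() - alpha, P_opt @ delta_n)
    violations += int(np.any(delta_next > bound + 1e-9))
print(violations)            # 0
```

`violations` stays at 0. Whenever the DP picks $k_i(n) \ne k'_i$, state $i$ loses at least `alpha`. Otherwise, $\delta_i(n+1)$ equals $\sum_j P_{ij}^{(k'_i)}\delta_j(n)$.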
The boundary conditions for (4.105) are defined to be
$$\tilde\delta_i(1) = \delta_{\max}(1); \quad m < i \le M \qquad (4.106)$$
$$\tilde\delta_m(n) = \sup_{n' \ge n} \max_{i \le m} \delta_i(n'). \qquad (4.107)$$
Lemma 4.6. Under the hypotheses of Theorem 4.13, with $\alpha$ defined by (4.99) and $\gamma$ defined by (4.104), the following three inequalities hold:
$$\tilde\delta_i(n) \le \tilde\delta_i(n-1) \quad \text{for } n \ge 2,\; m \le i \le M; \qquad (4.108)$$
$$\tilde\delta_i(n) \le \tilde\delta_{i+1}(n) \quad \text{for } n \ge 1,\; m \le i < M; \qquad (4.109)$$
$$\delta_j(n) \le \tilde\delta_i(n) \quad \text{for } n \ge 1,\; j \le i,\; m \le i \le M. \qquad (4.110)$$

4.6. MARKOV DECISION THEORY AND DYNAMIC PROGRAMMING 181

Proof* of (4.108): Since the supremum in (4.107) is over a set decreasing in $n$,
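$$\tilde\delta_m(n) \le \tilde\delta_m(n-1) \quad \text{for } n \ge 1. \qquad (4.111)$$
This establishes (4.108) for $i = m$. To establish (4.108) for $n = 2$, note that $\tilde\delta_i(1) = \delta_{\max}(1)$ for $i > m$ and
$$\tilde\delta_m(1) = \sup_{n' \ge 1} \max_{i \le m} \delta_i(n') \le \sup_{n' \ge 1} \delta_{\max}(n') \le \delta_{\max}(1). \qquad (4.112)$$
Thus
$$\tilde\delta_i(2) = \max\Big[\tilde\delta_M(1) - \alpha,\; \gamma\,\tilde\delta_{i-1}(1) + (1-\gamma)\,\tilde\delta_M(1)\Big] \le \delta_{\max}(1) = \tilde\delta_i(1) \quad \text{for } i > m.$$
Finally, we use induction for $n \ge 2$, $i > m$, using $n = 2$ as the basis. Assuming (4.108) for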
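a given $n \ge 2$,
$$\begin{aligned}\tilde\delta_i(n+1) &= \max\Big[\tilde\delta_M(n) - \alpha,\; \gamma\,\tilde\delta_{i-1}(n) + (1-\gamma)\,\tilde\delta_M(n)\Big]\\ &\le \max\Big[\tilde\delta_M(n-1) - \alpha,\; \gamma\,\tilde\delta_{i-1}(n-1) + (1-\gamma)\,\tilde\delta_M(n-1)\Big] = \tilde\delta_i(n).\end{aligned}$$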
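Proof* of (4.109): Using (4.112) and the fact that $\tilde\delta_i(1) = \delta_{\max}(1)$ for $i > m$, (4.109) is valid for $n = 1$. Using induction on $n$ with $n = 1$ as the basis, we assume (4.109) for a given $n \ge 1$. Then for $m \le i < M$, where the first inequality below is (4.108) and the second uses $\tilde\delta_i(n) \le \tilde\delta_M(n)$, which follows from the inductive hypothesis,
$$\begin{aligned}\tilde\delta_i(n+1) &\le \tilde\delta_i(n) \le \gamma\,\tilde\delta_i(n) + (1-\gamma)\,\tilde\delta_M(n)\\ &\le \max\Big[\tilde\delta_M(n) - \alpha,\; \gamma\,\tilde\delta_i(n) + (1-\gamma)\,\tilde\delta_M(n)\Big] = \tilde\delta_{i+1}(n+1).\end{aligned}$$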
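The proofs of (4.108) and (4.109) use only a few ingredients. These are the recursion (4.105), its boundary conditions, and the facts that $\tilde\delta_m(n)$ is nonincreasing with $\tilde\delta_m(1) \le \delta_{\max}(1)$. The proofs also rely on $\alpha > 0$ and $0 < \gamma \le 1$. That makes both inequalities easy to check numerically. The sketch below is illustrative only. First, it performs the renumbering behind (4.104) with a breadth-first search backwards from the recurrent class (a technique chosen here, not taken from the text) and reads off $\gamma$. Then it iterates (4.105) through (4.107) and tests (4.108) and (4.109). Several inputs are hypothetical stand-ins, not values from the text: the transition matrix, $\alpha = 0.1$, $\delta_{\max}(1) = 1$, and the sequence $\tilde\delta_m(n) = 0.5^n$. The helper names are also hypothetical. States are indexed from 0, so positions $0, \dots, m-1$ in the new order correspond to the text's recurrent states $1, \dots, m$.

```python
from collections import deque


def renumber_transient(P, recurrent):
    """Hypothetical helper: order states so the recurrent ones come first and each
    transient state has a one-step transition to a state placed earlier.
    P is the transition matrix [P^(k')] given as a list of rows."""
    M = len(P)
    dist = {i: 0 for i in recurrent}          # BFS distance to the recurrent class
    queue = deque(recurrent)
    while queue:
        j = queue.popleft()
        for i in range(M):                    # follow edges i -> j backwards
            if i not in dist and P[i][j] > 0:
                dist[i] = dist[j] + 1
                queue.append(i)
    if len(dist) != M:
        raise ValueError("some state cannot reach the recurrent class")
    return sorted(range(M), key=lambda i: (dist[i], i))


def gamma_of(P, order, m):
    """gamma from (4.104): min over transient positions of sum_{earlier j} P_ij."""
    return min(sum(P[order[pos]][order[q]] for q in range(pos))
               for pos in range(m, len(order)))


def delta_tilde_table(d_max1, dm, alpha, gamma, T):
    """Iterate (4.105) with boundary conditions (4.106)-(4.107).
    dm[n-1] stands in for tilde-delta_m(n), n = 1..N; T = M - m transient states.
    Returns rows with rows[n-1] = [td_m(n), td_{m+1}(n), ..., td_M(n)]."""
    rows = [[dm[0]] + [d_max1] * T]           # n = 1, from (4.106)
    for n in range(1, len(dm)):
        prev, top = rows[-1], rows[-1][-1]    # stage-n values and td_M(n)
        row = [dm[n]]                         # td_m(n+1), from (4.107)
        for i in range(1, T + 1):
            row.append(max(top - alpha, gamma * prev[i - 1] + (1 - gamma) * top))
        rows.append(row)
    return rows


# Hypothetical [P^(k')]: states 0, 1 recurrent; state 2 reaches them only via state 3.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.6, 0.4, 0.0, 0.0],
     [0.0, 0.0, 0.3, 0.7],
     [0.2, 0.0, 0.1, 0.7]]
order = renumber_transient(P, recurrent=[0, 1])
gamma = gamma_of(P, order, m=2)
print(order, gamma)                           # [0, 1, 3, 2] 0.2

# Hypothetical alpha, delta_max(1) and nonincreasing tilde-delta_m(n).
rows = delta_tilde_table(d_max1=1.0, dm=[0.5 ** n for n in range(1, 31)],
                         alpha=0.1, gamma=gamma, T=2)
ok_108 = all(b <= a + 1e-12 for r0, r1 in zip(rows, rows[1:]) for a, b in zip(r0, r1))
ok_109 = all(x <= y + 1e-12 for row in rows for x, y in zip(row, row[1:]))
print(ok_108, ok_109)                         # True True
```

Ordering by distance to the recurrent class is one convenient way to get the numbering that (4.104) requires. Any order in which each transient state has an earlier successor would serve equally well.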
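Proof* of (4.110): Note that $\delta_j(n) \le \tilde\delta_m(n)$ for all $j \le m$ and $n \ge 1$ by the definition in (4.107). From (4.109), $\delta_j(n) \le \tilde\delta_i(n)$ for $j \le m \le i$. Also, for all $i > m$ and $j \le i$, $\delta_j(1) \le \delta_{\max}(1) = \tilde\delta_i(1)$. Thus (4.110) holds for $n = 1$. We complete the proof by using induction on $n$ for $m < j \le i$, using $n = 1$ as the basis. Assume (4.110) for a given $n \ge 1$. Then $\delta_j(n) \le \tilde\delta_M(n)$ for all $j$, and it follows that $\delta_{\max}(n) \le \tilde\delta_M(n)$. Similarly, $\delta_j(n) \le \tilde\delta_{i-1}(n)$ for $j \le i-1$. For $i > m$, we then have
$$\begin{aligned}\delta_i(n+1) &\le \max\Big[\delta_{\max}(n) - \alpha,\; \sum_j P_{ij}^{(k'_i)} \delta_j(n)\Big]\\ &\le \max\Big[\tilde\delta_M(n) - \alpha,\; \sum_{j<i} P_{ij}^{(k'_i)} \tilde\delta_{i-1}(n) + \sum_{j\ge i} P_{ij}^{(k'_i)} \tilde\delta_M(n)\Big]\\ &\le \max\Big[\tilde\delta_M(n) - \alpha,\; \ldots\end{aligned}$$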
 Spring '09
 R.Srikant
