Discrete-time stochastic processes



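Chapter 4, Finite-State Markov Chains. Section 4.6, Markov Decision Theory and Dynamic Programming.

[...]ly large $n$, $\boldsymbol{\pi}'\boldsymbol{\delta}(n) \ge \beta(\boldsymbol{u}) - \epsilon/2$. Also, since $\delta_{\min}(1) \le \delta_i(n) \le \delta_{\max}(1)$ for all $i$ and $n$, and since $[\chi(m)] \to 0$, we see that $[\chi(m)]\boldsymbol{\delta}(n) \ge -(\epsilon/2)\boldsymbol{e}$ for all large enough $m$. Thus, for all large enough $n$ and $m$, $\delta_i(n+m) \ge \beta(\boldsymbol{u}) - \epsilon$. Thus, for any $\epsilon > 0$, there is an $n_0$ such that for all $n \ge n_0$,

$$\delta_i(n) \ge \beta(\boldsymbol{u}) - \epsilon. \tag{4.96}$$

Also, from (4.95), we have $\boldsymbol{\pi}'[\boldsymbol{\delta}(n) - \beta(\boldsymbol{u})\boldsymbol{e}] \le 0$, so

$$\boldsymbol{\pi}'[\boldsymbol{\delta}(n) - \beta(\boldsymbol{u})\boldsymbol{e} + \epsilon\boldsymbol{e}] \le \epsilon. \tag{4.97}$$

From (4.96), each term, $\pi'_i[\delta_i(n) - \beta(\boldsymbol{u}) + \epsilon]$, on the left side of (4.97) is non-negative, so each can be no larger than $\epsilon$. For $\pi'_i > 0$, it follows that

$$\delta_i(n) - \beta(\boldsymbol{u}) + \epsilon \le \epsilon/\pi'_i \qquad \text{for all } i \text{ and all } n \ge n_0. \tag{4.98}$$

Since $\epsilon > 0$ is arbitrary, (4.96) and (4.98) together with $\pi'_i > 0$ show that $\lim_{n\to\infty} \delta_i(n) = \beta(\boldsymbol{u})$, completing the proof of Lemma 4.5.

Since $\boldsymbol{k}'$ is a unique optimal stationary policy, we have

$$r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} w'_j > r_i^{(k_i)} + \sum_j P_{ij}^{(k_i)} w'_j$$

for all $i$ and all $k_i \ne k'_i$. Since this is a finite set of strict inequalities, there is an $\alpha > 0$ such that for all $i > m$, $k_i \ne k'_i$,

$$r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} w'_j \ge r_i^{(k_i)} + \sum_j P_{ij}^{(k_i)} w'_j + \alpha. \tag{4.99}$$

Since $v_i^*(n, \boldsymbol{w}') = ng' + w'_i$,

$$v_i^*(n+1, \boldsymbol{w}') = r_i^{(k'_i)} + \sum_j P_{ij}^{(k'_i)} v_j^*(n, \boldsymbol{w}') \tag{4.100}$$

$$\ge r_i^{(k_i(n))} + \sum_j P_{ij}^{(k_i(n))} v_j^*(n, \boldsymbol{w}') + \alpha \tag{4.101}$$

for each $i$ and $k_i(n) \ne k'_i$. Subtracting (4.101) from (4.86),

$$\delta_i(n+1) \le \sum_j P_{ij}^{(k'_i)} \delta_j(n) - \alpha \qquad \text{for } k_i(n) \ne k'_i. \tag{4.102}$$

Since $\delta_i(n) \le \delta_{\max}(n)$, (4.102) can be further bounded by $\delta_i(n+1) \le \delta_{\max}(n) - \alpha$ for $k_i(n) \ne k'_i$. Combining this with $\delta_i(n+1) = \sum_j P_{ij}^{(k'_i)} \delta_j(n)$ for $k_i(n) = k'_i$,

$$\delta_i(n+1) \le \max\Big[\delta_{\max}(n) - \alpha,\ \sum_j P_{ij}^{(k'_i)} \delta_j(n)\Big]. \tag{4.103}$$

Next, since $\boldsymbol{k}'$ is a unichain, we can renumber the transient states, $m < i \le M$, so that $\sum_{j<i} P_{ij}^{(k'_i)} > 0$ for each $i$, $m < i \le M$. Since this is a finite set of strict inequalities, there is some $\gamma > 0$ such that

$$\sum_{j<i} P_{ij}^{(k'_i)} \ge \gamma \qquad \text{for } m < i \le M. \tag{4.104}$$

The quantity $\delta_i(n)$ for each transient state $i$ is somewhat difficult to work with directly, so we define the new quantity, $\tilde{\delta}_i(n)$, which will be shown in the following lemma to upper bound $\delta_i(n)$. The definition for $\tilde{\delta}_i(n)$ is given iteratively for $n \ge 1$, $m < i \le M$ as

$$\tilde{\delta}_i(n+1) = \max\big[\tilde{\delta}_M(n) - \alpha,\ \gamma\tilde{\delta}_{i-1}(n) + (1-\gamma)\tilde{\delta}_M(n)\big]. \tag{4.105}$$

The boundary conditions for this are defined to be

$$\tilde{\delta}_i(1) = \delta_{\max}(1); \qquad m < i \le M \tag{4.106}$$

$$\tilde{\delta}_m(n) = \sup_{n' \ge n}\, \max_{i \le m}\, \delta_i(n'). \tag{4.107}$$

Lemma 4.6. Under the hypotheses of Theorem 4.13, with $\alpha$ defined by (4.99) and $\gamma$ defined by (4.104), the following three inequalities hold,

$$\tilde{\delta}_i(n) \le \tilde{\delta}_i(n-1); \qquad \text{for } n \ge 2,\ m \le i \le M \tag{4.108}$$

$$\tilde{\delta}_i(n) \le \tilde{\delta}_{i+1}(n); \qquad \text{for } n \ge 1,\ m \le i < M \tag{4.109}$$

$$\delta_j(n) \le \tilde{\delta}_i(n); \qquad \text{for } n \ge 1,\ j \le i,\ m \le i \le M. \tag{4.110}$$

Proof* of (4.108): Since the supremum in (4.107) is over a set decreasing in $n$,

$$\tilde{\delta}_m(n) \le \tilde{\delta}_m(n-1); \qquad \text{for } n \ge 1. \tag{4.111}$$

This establishes (4.108) for $i = m$. To establish (4.108) for $n = 2$, note that $\tilde{\delta}_i(1) = \delta_{\max}(1)$ for $i > m$ and

$$\tilde{\delta}_m(1) = \sup_{n' \ge 1}\, \max_{i \le m}\, \delta_i(n') \le \sup_{n' \ge 1} \delta_{\max}(n') \le \delta_{\max}(1). \tag{4.112}$$

Thus

$$\tilde{\delta}_i(2) = \max\big[\tilde{\delta}_M(1) - \alpha,\ \gamma\tilde{\delta}_{i-1}(1) + (1-\gamma)\tilde{\delta}_M(1)\big] \le \delta_{\max}(1) = \tilde{\delta}_i(1) \qquad \text{for } i > m.$$

Finally, we use induction for $n \ge 2$, $i > m$, using $n = 2$ as the basis. Assuming (4.108) for a given $n \ge 2$,

$$\begin{aligned}
\tilde{\delta}_i(n+1) &= \max\big[\tilde{\delta}_M(n) - \alpha,\ \gamma\tilde{\delta}_{i-1}(n) + (1-\gamma)\tilde{\delta}_M(n)\big] \\
&\le \max\big[\tilde{\delta}_M(n-1) - \alpha,\ \gamma\tilde{\delta}_{i-1}(n-1) + (1-\gamma)\tilde{\delta}_M(n-1)\big] = \tilde{\delta}_i(n).
\end{aligned}$$

Proof* of (4.109): Using (4.112) and the fact that $\tilde{\delta}_i(1) = \delta_{\max}(1)$ for $i > m$, (4.109) is valid for $n = 1$. Using induction on $n$ with $n = 1$ as the basis, we assume (4.109) for a given $n \ge 1$. Then for $m \le i \le M$,

$$\begin{aligned}
\tilde{\delta}_i(n+1) \le \tilde{\delta}_i(n) &\le \gamma\tilde{\delta}_i(n) + (1-\gamma)\tilde{\delta}_M(n) \\
&\le \max\big[\tilde{\delta}_M(n) - \alpha,\ \gamma\tilde{\delta}_i(n) + (1-\gamma)\tilde{\delta}_M(n)\big] = \tilde{\delta}_{i+1}(n+1).
\end{aligned}$$

Proof* of (4.110): Note that $\delta_j(n) \le \tilde{\delta}_m(n)$ for all $j \le m$ and $n \ge 1$ by the definition in (4.107). From (4.109), $\delta_j(n) \le \tilde{\delta}_i(n)$ for $j \le m \le i$. Also, for all $i > m$ and $j \le i$, $\delta_j(1) \le \delta_{\max}(1) = \tilde{\delta}_i(1)$. Thus (4.110) holds for $n = 1$. We complete the proof by using induction on $n$ for $m < j \le i$, using $n = 1$ as the basis. Assume (4.110) for a given $n \ge 1$. Then $\delta_j(n) \le \tilde{\delta}_M(n)$ for all $j$, and it then follows that $\delta_{\max}(n) \le \tilde{\delta}_M(n)$. Similarly, $\delta_j(n) \le \tilde{\delta}_{i-1}(n)$ for $j \le i - 1$. For $i > m$, we then have

$$\begin{aligned}
\delta_i(n+1) &\le \max\Big[\delta_{\max}(n) - \alpha,\ \sum_j P_{ij}^{(k'_i)} \delta_j(n)\Big] \\
&\le \max\Big[\tilde{\delta}_M(n) - \alpha,\ \sum_{j<i} P_{ij}^{(k'_i)} \tilde{\delta}_{i-1}(n) + \sum_{j \ge i} P_{ij}^{(k'_i)} \tilde{\delta}_M(n)\Big] \\
&\le \max\big[\tilde{\delta}_M(n) - \alpha,\ \ldots
\end{aligned}$$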
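The recursion (4.105) with boundary conditions (4.106) and (4.107) can be explored numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the boundary sequence for state m (the supremum in (4.107)) is simply supplied as an assumed non-increasing input no larger than δmax(1), since computing it would require the underlying decision problem. The arguments `alpha` and `gamma` stand for the constants of (4.99) and (4.104). Exact rational arithmetic is used so that the checks of (4.108) and (4.109) are not affected by rounding.

```python
from fractions import Fraction


def delta_tilde_table(delta_m, delta_max_1, num_transient, alpha, gamma):
    """Iterate (4.105) for n = 1, ..., len(delta_m).

    delta_m[n-1] is an ASSUMED value for the boundary term of (4.107);
    it should be non-increasing in n and at most delta_max_1.
    Row n-1 of the result lists delta-tilde for states m, m+1, ..., M,
    where M = m + num_transient.
    """
    # n = 1: boundary condition (4.106) for the transient states.
    rows = [[delta_m[0]] + [delta_max_1] * num_transient]
    for n in range(1, len(delta_m)):
        prev = rows[-1]
        top = prev[-1]  # delta-tilde of state M at the previous step
        row = [delta_m[n]]  # state m comes from the assumed boundary sequence
        for k in range(1, num_transient + 1):
            # (4.105): state i = m + k uses state i - 1 from the previous step.
            row.append(max(top - alpha, gamma * prev[k - 1] + (1 - gamma) * top))
        rows.append(row)
    return rows


def nonincreasing_in_n(rows):
    """Check (4.108): each state's value never increases from n - 1 to n."""
    return all(cur[k] <= prev[k]
               for prev, cur in zip(rows, rows[1:])
               for k in range(len(cur)))


def nondecreasing_in_i(rows):
    """Check (4.109): within each step, values do not decrease with the state index."""
    return all(row[k] <= row[k + 1]
               for row in rows
               for k in range(len(row) - 1))


if __name__ == "__main__":
    F = Fraction
    table = delta_tilde_table(
        delta_m=[F(1), F(1, 2), F(0), F(0)],  # assumed boundary values
        delta_max_1=F(2),
        num_transient=2,
        alpha=F(1, 2),
        gamma=F(1, 4),
    )
    for n, row in enumerate(table, start=1):
        print(n, [str(x) for x in row])
    print("(4.108) holds:", nonincreasing_in_n(table))
    print("(4.109) holds:", nondecreasing_in_i(table))
```

Changing the assumed boundary sequence so that it increases at some step, or so that it starts above δmax(1), breaks the hypotheses used in the proofs, and the two checks may then fail.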