Discrete-time stochastic processes



… ≥ ρ δ̃_{i−1}(n) + (1−ρ) δ̃_M(n) = δ̃_i(n+1),

where the final inequality follows from the definition of ρ. Finally, using (4.109) again, we have δ_j(n+1) ≤ δ̃_j(n+1) ≤ δ̃_i(n+1) for m < j ≤ i, completing the proof of Lemma 4.6.

Proof* of Theorem 4.13: From (4.110), δ̃_i(n) is non-increasing in n for i ≥ m. Also, from (4.109) and (4.97), δ̃_i(n) ≥ δ̃_m(n) ≥ β(u). Thus, lim_{n→∞} δ̃_i(n) exists for each i ≥ m. We then have

lim_{n→∞} δ̃_M(n) = max[ lim_{n→∞} δ̃_M(n) − α,  ρ lim_{n→∞} δ̃_{M−1}(n) + (1−ρ) lim_{n→∞} δ̃_M(n) ].

Since α > 0, the second term in the maximum above must achieve the maximum in the limit. Thus,

lim_{n→∞} δ̃_M(n) = lim_{n→∞} δ̃_{M−1}(n).    (4.113)

In the same way,

lim_{n→∞} δ̃_{M−1}(n) = max[ lim_{n→∞} δ̃_M(n) − α,  ρ lim_{n→∞} δ̃_{M−2}(n) + (1−ρ) lim_{n→∞} δ̃_{M−1}(n) ].

Again, the second term must achieve the maximum, and using (4.113),

lim_{n→∞} δ̃_{M−1}(n) = lim_{n→∞} δ̃_{M−2}(n).

Repeating this argument,

lim_{n→∞} δ̃_i(n) = lim_{n→∞} δ̃_{i−1}(n) for each i, m < i ≤ M.    (4.114)

Now, from (4.94), lim_{n→∞} δ_i(n) = β(u) for i ≤ m. From (4.107), then, we see that lim_{n→∞} δ̃_m(n) = β(u). Combining this with (4.114),

lim_{n→∞} δ̃_i(n) = β(u) for each i such that m ≤ i ≤ M.    (4.115)

Combining this with (4.110), we see that for any ε > 0 and any i, δ_i(n) ≤ β(u) + ε for large enough n. Combining this with (4.96) completes the proof.

4.7 Summary

This chapter has developed the basic results about finite-state Markov chains from a primarily algebraic standpoint. It was shown that the states of any finite-state chain can be partitioned into classes, where each class is either transient or recurrent, and each class is either periodic or aperiodic. If the entire chain is one recurrent class, then the Frobenius theorem, with all its corollaries, shows that λ = 1 is an eigenvalue of largest magnitude and has positive right and left eigenvectors, unique within a scale factor. The left eigenvector (scaled to be a probability vector) is the steady-state probability vector. If the chain is also aperiodic, then the eigenvalue λ = 1 is the only eigenvalue of magnitude 1, and all rows of [P]^n converge geometrically in n to the steady-state vector. This same analysis can be applied to each aperiodic recurrent class of a general Markov chain, given that the chain ever enters that class. For a periodic recurrent chain of period d, there are d − 1 other eigenvalues of magnitude 1, with all d eigenvalues uniformly placed around the unit circle in the complex plane. Exercise 4.17 shows how to interpret these eigenvectors, and shows that [P]^{nd} converges geometrically as n → ∞.

For an arbitrary finite-state Markov chain, if the initial state is transient, then the Markov chain will eventually enter a recurrent state, and the probability that this takes more than n steps approaches zero geometrically in n; Exercise 4.14 shows how to find the probability that each recurrent class is entered. Given an entry into a particular recurrent class, the results above can be used to analyze the behavior within that class.

The results about Markov chains were then extended to Markov chains with rewards. As with renewal processes, the use of reward functions provides a systematic way to approach a large class of problems ranging from first passage times to dynamic programming. The key result here is Theorem 4.9, which provides both an exact expression and an asymptotic expression for the expected aggregate reward over n stages.
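As a concrete illustration of two computations mentioned in the summary (the steady-state vector as a left eigenvector of [P] for the eigenvalue 1, and the n-stage aggregate-reward recursion behind Theorem 4.9), here is a minimal numerical sketch in Python. The three-state chain and reward vector below are invented for illustration and are not examples from the text.

```python
import numpy as np

# Hypothetical 3-state ergodic chain (rows sum to 1); not an example from the text.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])   # reward earned on each visit to state i

# Steady-state probability vector: left eigenvector of P for the eigenvalue 1,
# scaled to be a probability vector.
evals, evecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(evals - 1.0))
pi = np.real(evecs[:, idx])
pi = pi / pi.sum()
g = pi @ r                       # steady-state gain per stage

# Expected aggregate reward over n stages via v(n) = r + [P] v(n-1), with v(0) = 0.
# Per the asymptotic form discussed around Theorem 4.9, v(n) grows roughly as
# n*g plus a bounded (relative-gain) term.
n = 200
v = np.zeros(len(r))
for _ in range(n):
    v = r + P @ v

print("steady-state vector:", pi)
print("gain per stage:", g)
print("v(n) - n*g:", v - n * g)   # settles to a bounded vector as n grows
```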
Finally, the results on Markov chains with rewards were used to understand Markov decision theory. We developed the Bellman dynamic programming algorithm (a small numerical sketch appears after Exercise 4.1 below), and also investigated the optimal stationary policy. Theorem 4.13 demonstrated the relationship between the optimal dynamic policy and the optimal stationary policy. This section provided only an introduction to dynamic programming and omitted all discussion of discounting (in which future gain is considered worth less than present gain because of interest rates). We also omitted problems with infinite state spaces.

For an introduction to vectors, matrices, and linear algebra, see any introductory text on linear algebra such as Strang [20]. Gantmacher [11] has a particularly complete treatment of non-negative matrices and Perron-Frobenius theory. For further reading on Markov decision theory and dynamic programming, see Bertsekas [3]. Bellman [1] is of historic interest and quite readable.

4.8 Exercises

Exercise 4.1. a) Prove that, for a finite-state Markov chain, if P_ii > 0 for some i in a recurrent class A, then class A is aperiodic.

b) Show that every finite-state Markov chain contains at least one recurrent set of states. Hint: Construct a directed graph in which the states are nodes and an edge goes from i to j if i → j but i is not accessible from j. Show that this graph contain...
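The Bellman dynamic programming algorithm mentioned in the summary can be summarized by the recursion v*_i(n) = max_k [ r_i^(k) + Σ_j P_ij^(k) v*_j(n−1) ], maximizing over the available decisions k in each state. The sketch below runs this recursion on a made-up two-state, two-decision problem; the matrices P and rewards r are assumptions for illustration only, not data from the chapter.

```python
import numpy as np

# Hypothetical 2-state, 2-decision Markov decision problem; invented data, not from the text.
# P[k] is the transition matrix under decision k, r[k] the corresponding reward vector.
P = {0: np.array([[0.9, 0.1],
                  [0.4, 0.6]]),
     1: np.array([[0.5, 0.5],
                  [0.1, 0.9]])}
r = {0: np.array([0.0, 1.0]),
     1: np.array([0.5, 2.0])}

v = np.zeros(2)          # v*(0): final reward taken to be zero
for n in range(1, 101):
    # Bellman recursion: in each state, maximize the reward-to-go over the decisions.
    candidates = np.stack([r[k] + P[k] @ v for k in sorted(P)])  # shape (decisions, states)
    policy = candidates.argmax(axis=0)   # maximizing decision in each state at this stage
    v = candidates.max(axis=0)

# For large n the maximizing decisions typically stop changing; Theorem 4.13 relates
# this limiting behavior to the optimal stationary policy.
print("decisions chosen at the last stage:", policy)
print("v*(100):", v)
```

In well-behaved problems, running the loop for more stages increases v* by roughly a constant amount per stage (the gain), while the maximizing decisions settle down, which is the behavior Theorem 4.13 addresses.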