    γ δ̃i−1(n) + (1−γ) δ̃M(n) = δ̃i(n+1),

where the final inequality follows from the definition of γ. Finally, using (4.109) again, we have δj(n+1) ≤ δ̃j(n+1) ≤ δ̃i(n+1) for m < j ≤ i, completing the proof of Lemma 4.6.
Proof* of Theorem 4.13: From (4.110), δ̃i(n) is nonincreasing in n for i ≥ m. Also, from (4.109) and (4.97), δ̃i(n) ≥ δ̃m(n) ≥ β(u′). Thus, limn→∞ δ̃i(n) exists for each i ≥ m.
We then have

    limn→∞ δ̃M(n) = max[ limn→∞ δ̃M(n) − α,  γ limn→∞ δ̃M−1(n) + (1−γ) limn→∞ δ̃M(n) ].

Since α > 0, the first term falls strictly below limn→∞ δ̃M(n), so the second term must achieve the maximum in the
limit. Thus limn→∞ δ̃M(n) = γ limn→∞ δ̃M−1(n) + (1−γ) limn→∞ δ̃M(n); subtracting (1−γ) limn→∞ δ̃M(n) from both sides and dividing by γ > 0,

    limn→∞ δ̃M(n) = limn→∞ δ̃M−1(n).    (4.113)

In the same way,
    limn→∞ δ̃M−1(n) = max[ limn→∞ δ̃M(n) − α,  γ limn→∞ δ̃M−2(n) + (1−γ) limn→∞ δ̃M−1(n) ].
n→1 n→1 Again, the second term must achieve the maximum, and using (4.113),
˜
˜
lim δM−1 (n) = lim δM−2 (n). n→1 n→1 Repeating this argument,
    limn→∞ δ̃i(n) = limn→∞ δ̃i−1(n) for each i, m < i ≤ M.    (4.114)
Now, from (4.94), limn→∞ δ̃i(n) = β(u′) for i ≤ m. From (4.107), then, we see that limn→∞ δ̃m(n) = β(u′). Combining this with (4.114),

    limn→∞ δ̃i(n) = β(u′) for each i such that m ≤ i ≤ M.    (4.115)

Combining this with (4.110), we see that for any ε > 0, and any i, δi(n) ≤ β(u′) + ε for
large enough n. Combining this with (4.96) completes the proof.

4.7 Summary

This chapter has developed the basic results about finite-state Markov chains from a primarily algebraic standpoint. It was shown that the states of any finite-state chain can be
partitioned into classes, where each class is either transient or recurrent, and each class is
periodic or aperiodic. If the entire chain is one recurrent class, then the Frobenius theorem, with all its corollaries, shows that λ = 1 is an eigenvalue of largest magnitude and
has positive right and left eigenvectors, unique within a scale factor. The left eigenvector
(scaled to be a probability vector) is the steady-state probability vector. If the chain is also aperiodic, then the eigenvalue λ = 1 is the only eigenvalue of magnitude 1, and all rows of [P]^n converge geometrically in n to the steady-state vector. This same analysis can be
applied to each aperiodic recurrent class of a general Markov chain, given that the chain
ever enters that class.
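These facts are easy to check numerically. The sketch below is a minimal illustration, with a hypothetical 3-state ergodic matrix chosen only for the example: it computes the eigenvalues of [P], scales the left eigenvector of λ = 1 to a probability vector, and shows the rows of [P]^n approaching it.

    import numpy as np

    # Hypothetical 3-state ergodic transition matrix, chosen only for illustration.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])

    # Left eigenvectors of P are right eigenvectors of the transpose of P.
    evals, evecs = np.linalg.eig(P.T)
    i = int(np.argmax(evals.real))     # picks out the eigenvalue lambda = 1
    pi = evecs[:, i].real
    pi = pi / pi.sum()                 # scale to a probability vector

    print(np.round(evals, 4))          # lambda = 1 has the largest magnitude
    print(np.round(pi, 4))             # the steady-state vector
    print(np.round(np.linalg.matrix_power(P, 50), 4))   # every row is close to pi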
For a periodic recurrent chain of period d, there are d − 1 other eigenvalues of magnitude
1, with all d eigenvalues uniformly placed around the unit circle in the complex plane.
Exercise 4.17 shows how to interpret these eigenvectors, and shows that [P]^{nd} converges geometrically as n → ∞.
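For a concrete check of the periodic case, the following sketch uses an assumed 4-state chain of period d = 3; it shows the three eigenvalues of magnitude 1 sitting at the cube roots of unity and the convergence of [P]^{nd}.

    import numpy as np

    # Assumed period-3 chain: state 0 -> {1, 2} -> 3 -> 0.
    P = np.array([[0.0, 0.5, 0.5, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0]])

    evals = np.linalg.eigvals(P)
    print(np.round(evals, 4))          # 1 and the two complex cube roots of unity
    print(np.round(np.abs(evals), 4))  # d = 3 eigenvalues of magnitude 1

    # [P]^{nd} with d = 3, n = 10; the chain sampled every d steps has settled down.
    print(np.round(np.linalg.matrix_power(P, 30), 4))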
For an arbitrary finite-state Markov chain, if the initial state is transient, then the Markov chain will eventually enter a recurrent state, and the probability that this takes more than n steps approaches zero geometrically in n; Exercise 4.14 shows how to find the probability
that each recurrent class is entered. Given an entry into a particular recurrent class, then
the results above can be used to analyze the behavior within that class.
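One standard way to carry out a computation in the style of Exercise 4.14 is to solve a linear system over the transient states. The sketch below assumes a hypothetical 4-state chain with transient states {0, 1} and two single-state recurrent classes; it is only one possible setup, not the exercise's prescribed method.

    import numpy as np

    # Hypothetical chain: states 0, 1 transient; {2} and {3} recurrent (absorbing).
    P = np.array([[0.2, 0.3, 0.4, 0.1],
                  [0.3, 0.1, 0.2, 0.4],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])

    T = [0, 1]                         # transient states
    R = [2]                            # recurrent class of interest
    P_TT = P[np.ix_(T, T)]             # transitions among transient states
    P_TR = P[np.ix_(T, R)]             # transitions from transient states into R

    # Entry probabilities a solve a = P_TT a + P_TR 1, i.e., (I - P_TT) a = P_TR 1.
    a = np.linalg.solve(np.eye(len(T)) - P_TT, P_TR.sum(axis=1))
    print(np.round(a, 4))              # probability of entering R from each transient state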
The results about Markov chains were extended to Markov chains with rewards. As with
renewal processes, the use of reward functions provides a systematic way to approach a
large class of problems ranging from first passage times to dynamic programming. The key
result here is Theorem 4.9, which provides both an exact expression and an asymptotic
expression for the expected aggregate reward over n stages.
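The computation behind this result is the backward recursion v(n) = r + [P] v(n−1) for the expected aggregate reward vector. The sketch below iterates it for a hypothetical two-state chain (all numbers assumed) and exhibits the asymptotic behavior: n times the steady-state gain per stage, plus a constant vector.

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    r = np.array([1.0, 5.0])           # assumed reward per stage in each state

    v = np.zeros(2)                    # v(0) = 0: no final reward
    for n in range(100):
        v = r + P @ v                  # expected aggregate reward over one more stage

    # Steady-state gain per stage: g = pi . r.
    evals, evecs = np.linalg.eig(P.T)
    pi = evecs[:, int(np.argmax(evals.real))].real
    pi = pi / pi.sum()
    g = pi @ r
    print(np.round(g, 4))              # gain per stage
    print(np.round(v - 100 * g, 4))    # v(n) - n g has converged to a constant vector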
Finally, the results on Markov chains with rewards were used to understand Markov decision
theory. We developed the Bellman dynamic programming algorithm, and also investigated
the optimal stationary policy. Theorem 4.13 demonstrated the relationship between the
optimal dynamic policy and the optimal stationary policy. This section provided only an
introduction to dynamic programming and omitted all discussion of discounting (in which
future gain is considered worth less than present gain because of interest rates). We also
omitted infinite state spaces.
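As a minimal sketch of the Bellman iteration discussed above, consider a hypothetical two-state, two-decision problem (all rewards and transition matrices assumed); each stage maximizes, state by state, over the available decisions.

    import numpy as np

    # Assumed data: decision k has reward vector rewards[k] and transition matrix trans[k].
    rewards = [np.array([0.0, 1.0]), np.array([0.5, 0.8])]
    trans = [np.array([[0.9, 0.1],
                       [0.2, 0.8]]),
             np.array([[0.5, 0.5],
                       [0.6, 0.4]])]

    v = np.zeros(2)                    # terminal reward vector v*(0)
    for n in range(50):
        # Bellman update: v*_i(n) = max over k of [ r_i(k) + sum_j P_ij(k) v*_j(n-1) ]
        v = np.maximum.reduce([rewards[k] + trans[k] @ v for k in range(2)])

    # The maximizing decision in each state; for large n this settles into a stationary policy.
    policy = np.argmax([rewards[k] + trans[k] @ v for k in range(2)], axis=0)
    print(policy, np.round(v, 3))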
For an introduction to vectors, matrices, and linear algebra, see any introductory text on
linear algebra such as Strang [20]. Gantmacher [11] has a particularly complete treatment of
nonnegative matrices and Perron-Frobenius theory. For further reading on Markov decision theory and dynamic programming, see Bertsekas [3]. Bellman [1] is of historic interest and quite readable.

4.8 Exercises

Exercise 4.1. a) Prove that, for a finite-state Markov chain, if Pii > 0 for some i in a
recurrent class A, then class A is aperiodic.
b) Show that every finite-state Markov chain contains at least one recurrent set of states. Hint: Construct a directed graph in which the states are nodes and an edge goes from i to j if i → j but i is not accessible from j. Show that this graph contain...