example, if k = 2 and (a1, a2) = (0, 1), the corresponding chain is given
a) For the chain above, find the mean first passage time from state 0 to state 2.
b) For parts b to d, let (a1, a2, a3, . . . , ak) = (0, 1, 1, . . . , 1), i.e., zero followed by k − 1
ones. Draw the corresponding Markov chain for k = 4.
c) Let vi, 0 ≤ i ≤ k, be the expected first passage time from state i to state k. Note that
vk = 0. Show that v0 = 1/p0 + v1.
d) For each i, 1 ≤ i < k, show that vi = αi + vi+1 and v0 = βi + vi+1 where αi and βi
are each a product of powers of p0 and p1 . Hint: use induction, or iteration, starting with
i = 1, and establish both equalities together.
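As a numerical aid (not part of the exercise), the chain for k = 4 and string (0, 1, 1, 1) can be solved directly. The sketch below assumes, purely for illustration, that p0 = p1 = 1/2; the exercise keeps these probabilities symbolic. State i means the most recently observed symbols match the first i symbols of the string, and state 4 (full match) is absorbing.

```python
import numpy as np

# Assumed symbol probabilities for illustration; the exercise keeps them symbolic.
p0, p1 = 0.5, 0.5

# Transition matrix over states 0..4 for detecting the string 0, 1, 1, 1.
P = np.array([
    [p1, p0, 0., 0., 0.],   # state 0: a 0 starts a match, a 1 does not
    [0., p0, p1, 0., 0.],   # state 1 ("0"): another 0 stays at "0"
    [0., p0, 0., p1, 0.],   # state 2 ("01"): a 0 falls back to "0"
    [0., p0, 0., 0., p1],   # state 3 ("011"): a 1 completes the string
    [0., 0., 0., 0., 1.],   # state 4: string complete (absorbing)
])

# Mean first-passage times to state 4 solve (I - Q) v = 1, where Q restricts
# P to the transient states 0..3.
Q = P[:4, :4]
v = np.linalg.solve(np.eye(4) - Q, np.ones(4))
print(v)   # [16. 14. 12.  8.]
```

With these numbers the result is consistent with part c, since v0 = 16 = 1/p0 + v1 = 2 + 14.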
e) Let k = 3 and let (a1, a2, a3) = (1, 0, 1). Draw the corresponding Markov chain for this
string. Evaluate v0, the expected first passage time for the string 1,0,1 to occur.
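A numerical sketch for this string, again under the illustrative assumption p0 = p1 = 1/2 (the exercise keeps the probabilities symbolic):

```python
import numpy as np

# States track the longest prefix of 1, 0, 1 matched by the recent symbols;
# p0 = p1 = 1/2 is an assumption made for this sketch only.
p0, p1 = 0.5, 0.5
P = np.array([
    [p0, p1, 0., 0.],   # state 0: a 1 starts a match
    [0., p1, p0, 0.],   # state 1 ("1"): another 1 stays at "1"
    [p0, 0., 0., p1],   # state 2 ("10"): a 0 falls all the way back
    [0., 0., 0., 1.],   # state 3: string complete (absorbing)
])
Q = P[:3, :3]
v = np.linalg.solve(np.eye(3) - Q, np.ones(3))
print(v)   # [10.  8.  6.]
```

Under the same assumption, the k = 3 string (0, 1, 1) of part d gives v0 = 8 rather than 10; the difference is the self-overlap of 1,0,1 (its one-symbol suffix is also a prefix), which is the renewal-theoretic effect part f asks about.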
f) Use renewal theory to explain why the answer in part e is different from that in part d
with k = 3.

Exercise 4.25. a) Find lim_{n→∞} [P]^n for the Markov chain below. Hint: Think in terms
of the long term transition probabilities. Recall that the edges in the graph for a Markov
chain correspond to the positive transition probabilities.
b) Let π^(1) and π^(2) denote the first two rows of lim_{n→∞} [P]^n and let ν^(1) and ν^(2) denote the
first two columns of lim_{n→∞} [P]^n. Show that π^(1) and π^(2) are independent left eigenvectors
of [P], and that ν^(1) and ν^(2) are independent right eigenvectors of [P]. Find the eigenvalue
for each eigenvector.

[Figure: graph of the Markov chain; the labels recoverable from the figure are transition probabilities P31, P32, P33 and two self-loop probabilities of 1.]
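Parts a and b can be sanity-checked numerically. The sketch below assumes states 1 and 2 are absorbing and state 3 is transient, with illustrative values P31 = 0.3, P32 = 0.5, P33 = 0.2; the exercise keeps these probabilities symbolic.

```python
import numpy as np

# Illustrative chain: states 1 and 2 absorbing, state 3 transient.
# The numeric P31, P32, P33 are assumptions for this sketch.
P = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.3, 0.5, 0.2],
])

# Approximate lim_{n->oo} [P]^n by a large power; P33^200 is negligible.
P_inf = np.linalg.matrix_power(P, 200)

pi1, pi2 = P_inf[0], P_inf[1]        # first two rows
nu1, nu2 = P_inf[:, 0], P_inf[:, 1]  # first two columns

# Both pairs are eigenvectors of [P] with eigenvalue 1:
# pi [P] = pi (left) and [P] nu = nu (right).
assert np.allclose(pi1 @ P, pi1) and np.allclose(pi2 @ P, pi2)
assert np.allclose(P @ nu1, nu1) and np.allclose(P @ nu2, nu2)
print(P_inf[2])   # approx [0.375, 0.625, 0]: state 3 splits in proportion P31:P32
```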
c) Let r be an arbitrary reward vector and consider the equation
w + g^(1)ν^(1) + g^(2)ν^(2) = r + [P]w.    (4.116)

Determine what values g^(1) and g^(2) must have in order for (4.84) to have a solution. Argue
that with the additional constraints w1 = w2 = 0, (4.84) has a unique solution for w and
find that w.
d) Show that, with the w above, w′ = w + αν^(1) + βν^(2) satisfies (4.84) for all choices of
scalars α and β.
e) Assume that the reward at stage 0 is u = w. Show that v(n, w) = n(g^(1)ν^(1) + g^(2)ν^(2)) + w.
f) For an arbitrary reward u at stage 0, show that v(n, u) = n(g^(1)ν^(1) + g^(2)ν^(2)) + w +
[P]^n (u − w). Note that this verifies (4.49-4.51) for this special case.
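Parts c through f can also be checked numerically. The sketch assumes states 1 and 2 are absorbing with illustrative P31 = 0.3, P32 = 0.5, P33 = 0.2, and picks an arbitrary reward vector r and final reward u; none of these numbers come from the exercise.

```python
import numpy as np

# Assumed chain (states 1, 2 absorbing; third row illustrative) plus
# arbitrary reward and final-reward vectors chosen for the check.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.3, 0.5, 0.2]])
r = np.array([2.0, 4.0, 1.0])
u = np.array([1.0, -3.0, 7.0])

P_inf = np.linalg.matrix_power(P, 200)
pi1, pi2 = P_inf[0], P_inf[1]
nu1, nu2 = P_inf[:, 0], P_inf[:, 1]

# Gains g(i) = pi(i) . r; with absorbing states these reduce to r1 and r2.
g1, g2 = pi1 @ r, pi2 @ r

# With w1 = w2 = 0, only the third component of
#   w + g1 nu1 + g2 nu2 = r + [P] w   is nontrivial, giving w3 directly.
w3 = (r[2] - g1 * nu1[2] - g2 * nu2[2]) / (1.0 - P[2, 2])
w = np.array([0.0, 0.0, w3])
assert np.allclose(w + g1 * nu1 + g2 * nu2, r + P @ w)

# Iterate the reward recursion v(n) = r + [P] v(n-1) from v(0) = u and
# compare with v(n, u) = n(g1 nu1 + g2 nu2) + w + [P]^n (u - w).
n, v = 6, u.copy()
for _ in range(n):
    v = r + P @ v
closed = n * (g1 * nu1 + g2 * nu2) + w + np.linalg.matrix_power(P, n) @ (u - w)
assert np.allclose(v, closed)
print(g1, g2, w3)   # g1 = 2, g2 = 4, w3 ≈ -2.8125 for these assumed numbers
```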
Exercise 4.26. Generalize Exercise 4.25 to the general case of two recurrent classes and
an arbitrary set of transient states. In part (f ), you will have to assume that the recurrent
classes are ergodic. Hint: generalize the proof of Lemma 4.1 and Theorem 4.9.
Exercise 4.27. Generalize Exercise 4.26 to an arbitrary number of recurrent classes and
an arbitrary number of transient states. This verifies (4.49-4.51) in general.
Exercise 4.28. Let u and u′ be arbitrary final reward vectors with u ≤ u′.

a) Let k be an arbitrary stationary policy and prove that v^k(n, u) ≤ v^k(n, u′) for each
n ≥ 1.
b) Prove that v∗(n, u) ≤ v∗(n, u′) for each n ≥ 1. This is known as the monotonicity
theorem.

Exercise 4.29. George drives his car to the theater, which is at the end of a one-way street.
There are parking places along the side of the street and a parking garage that costs $5 at
the theater. Each parking place is independently occupied or unoccupied with probability
1/2. If George parks n parking places away from the theater, it costs him n cents (in time
and shoe leather) to walk the rest of the way. George is myopic and can only see the parking
place he is currently passing. If George has not already parked by the time he reaches the
nth place, he first decides whether or not he will park if the place is unoccupied, and then
observes the place and acts according to his decision. George can never go back and must
park in the parking garage if he has not parked before.
a) Model the above problem as a 2-state Markov decision problem. In the “driving” state,
state 2, there are two possible decisions: park if the current place is unoccupied or drive on
whether or not the current place is unoccupied.

b) Find vi∗(n, u), the minimum expected aggregate cost for n stages (i.e., immediately
before observation of the nth parking place) starting in state i = 1 or 2; it is suﬃcient
to express vi (n, u ) in times of vi (n − 1). The ﬁnal costs, in cents, at stage 0 should be
v2 (0) = 500, v1 (0) = 0. c) For what values of n is the optimal decision the decision to drive on?
d) What is the probability that George will park in the garage, assuming that he follows
the optimal policy?
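The backward recursion of part b can be sketched numerically with the exercise's numbers (500-cent garage, n-cent walk, each place free with probability 1/2); the stage horizon of 30 below is an arbitrary choice for illustration.

```python
# Backward recursion for George's problem, costs in cents.  v2[n] is the
# minimum expected cost while still driving, n places from the theater;
# v1 = 0 once parked, and v2[0] = 500 is the garage at the theater.
v2 = [500.0]
decisions = [None]
for n in range(1, 30):
    try_park = 0.5 * n + 0.5 * v2[n - 1]   # place free w.p. 1/2: walk n cents
    drive_on = v2[n - 1]                   # pass this place regardless
    v2.append(min(try_park, drive_on))
    decisions.append('park' if try_park < drive_on else 'drive')

for n in range(1, 12):
    print(n, decisions[n], round(v2[n], 4))
```

The computation chooses "park" exactly when n < v2(n − 1), which with these numbers happens for n ≤ 8; for n ≥ 9 it drives on, suggesting that a driver starting at least 9 places out reaches the garage only when places 8 through 1 are all occupied, an event of probability (1/2)^8 = 1/256.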
Exercise 4.30. Consider the dynamic programming problem below with two states and
two possible policies, denoted k and k′. The policies differ only in state 2.
[Figure: the two-state Markov decision problem for policies k and k′; the remaining labels are not recoverable.]
a) Find the steady-state gai...