Discrete-time stochastic processes
Chapter 4. Finite-State Markov Chains, Section 4.8: Exercises

In this case, the inter-renewal time $T_j$ is not a rv.

For example, if $k = 2$ and $(a_1, a_2) = (0, 1)$, the corresponding chain is given by

[Figure: Markov chain with states 0, 1, 2 for the string $(0, 1)$.]

a) For the chain above, find the mean first-passage time from state 0 to state 2.

b) For parts b to d, let $(a_1, a_2, a_3, \ldots, a_k) = (0, 1, 1, \ldots, 1)$, i.e., a zero followed by $k - 1$ ones. Draw the corresponding Markov chain for $k = 4$.

c) Let $v_i$, $0 \le i \le k$, be the expected first-passage time from state $i$ to state $k$. Note that $v_k = 0$. Show that $v_0 = 1/p_0 + v_1$.

d) For each $i$, $1 \le i < k$, show that $v_i = \alpha_i + v_{i+1}$ and $v_0 = \beta_i + v_{i+1}$, where $\alpha_i$ and $\beta_i$ are each a product of powers of $p_0$ and $p_1$. Hint: use induction, or iteration, starting with $i = 1$, and establish both equalities together.

e) Let $k = 3$ and let $(a_1, a_2, a_3) = (1, 0, 1)$. Draw the corresponding Markov chain for this string. Evaluate $v_0$, the expected first-passage time for the string 1, 0, 1 to occur.

f) Use renewal theory to explain why the answer in part e differs from that in part d with $k = 3$.

Exercise 4.25.
a) Find $\lim_{n\to\infty} [P]^n$ for the Markov chain below. Hint: think in terms of the long-term transition probabilities. Recall that the edges in the graph for a Markov chain correspond to the positive transition probabilities.

[Figure: three-state Markov chain; the transitions out of state 3 are labeled $P_{31}$, $P_{32}$, $P_{33}$.]

b) Let $\pi^{(1)}$ and $\pi^{(2)}$ denote the first two rows of $\lim_{n\to\infty} [P]^n$, and let $\nu^{(1)}$ and $\nu^{(2)}$ denote the first two columns of $\lim_{n\to\infty} [P]^n$. Show that $\pi^{(1)}$ and $\pi^{(2)}$ are independent left eigenvectors of $[P]$, and that $\nu^{(1)}$ and $\nu^{(2)}$ are independent right eigenvectors of $[P]$. Find the eigenvalue for each eigenvector.

c) Let $r$ be an arbitrary reward vector and consider the equation

$$w + g^{(1)}\nu^{(1)} + g^{(2)}\nu^{(2)} = r + [P]w. \qquad (4.116)$$

Determine what values $g^{(1)}$ and $g^{(2)}$ must have in order for (4.84) to have a solution. Argue that with the additional constraints $w_1 = w_2 = 0$, (4.84) has a unique solution for $w$, and find that $w$.
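For the string-matching exercise above (the one preceding Exercise 4.25), the expected first-passage times can be checked numerically. This is a minimal sketch, not the book's method. It assumes the symbols are iid with $\Pr\{0\} = p_0$ and $\Pr\{1\} = p_1 = 1 - p_0$, and that state $i$ is the largest $i$ for which the most recent $i$ symbols equal $a_1, \ldots, a_i$; this is the usual construction and agrees with part c. The names `next_state` and `first_passage_times` are hypothetical. The sketch solves $v_i = 1 + \sum_b p_b\, v_{\mathrm{next}(i, b)}$ with $v_k = 0$ in exact rational arithmetic.

```python
from fractions import Fraction


def next_state(pattern, i, bit):
    """State reached from state i after observing `bit`: the length of the
    longest prefix of `pattern` that is a suffix of pattern[:i] + (bit,)."""
    s = pattern[:i] + (bit,)
    for length in range(min(len(s), len(pattern)), 0, -1):
        if s[-length:] == pattern[:length]:
            return length
    return 0


def first_passage_times(pattern, p0):
    """Return [v_0, ..., v_k]: v_i is the expected number of further symbols
    until `pattern` first occurs, starting in state i (so v_k = 0).
    Assumes iid binary symbols with Pr{0} = p0 and Pr{1} = 1 - p0."""
    p0 = Fraction(p0)
    probs = {0: p0, 1: 1 - p0}
    k = len(pattern)
    # Equations for the transient states 0..k-1:
    #   v_i - sum_b p_b * v_{next(i, b)} = 1   (terms with next = k vanish)
    a = [[Fraction(0)] * k for _ in range(k)]
    rhs = [Fraction(1)] * k
    for i in range(k):
        a[i][i] += 1
        for bit, p in probs.items():
            j = next_state(pattern, i, bit)
            if j < k:
                a[i][j] -= p
    # Gauss-Jordan elimination on the augmented matrix, exact arithmetic.
    m = [row + [r] for row, r in zip(a, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)] + [Fraction(0)]


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(first_passage_times((0, 1), half)[0])     # 4
    print(first_passage_times((0, 1, 1), half)[0])  # 8
    print(first_passage_times((1, 0, 1), half)[0])  # 10
```

Exact `Fraction` arithmetic keeps identities such as $v_0 = 1/p_0 + v_1$ exact rather than approximate. With $p_0 = 1/2$ the sketch gives 8 for $(0, 1, 1)$ and 10 for $(1, 0, 1)$; that gap is what part f asks you to explain.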
d) Show that, with the $w$ above, $w' = w + \alpha\nu^{(1)} + \beta\nu^{(2)}$ satisfies (4.84) for all choices of scalars $\alpha$ and $\beta$.

e) Assume that the reward at stage 0 is $u = w$. Show that $v(n, w) = n\left(g^{(1)}\nu^{(1)} + g^{(2)}\nu^{(2)}\right) + w$.

f) For an arbitrary reward $u$ at stage 0, show that $v(n, u) = n\left(g^{(1)}\nu^{(1)} + g^{(2)}\nu^{(2)}\right) + w + [P]^n(u - w)$. Note that this verifies (4.49-4.51) for this special case.

Exercise 4.26. Generalize Exercise 4.25 to the general case of two recurrent classes and an arbitrary set of transient states. In part (f), you will have to assume that the recurrent classes are ergodic. Hint: generalize the proof of Lemma 4.1 and Theorem 4.9.

Exercise 4.27. Generalize Exercise 4.26 to an arbitrary number of recurrent classes and an arbitrary number of transient states. This verifies (4.49-4.51) in general.

Exercise 4.28. Let $u$ and $u'$ be arbitrary final reward vectors with $u \le u'$.
a) Let $k$ be an arbitrary stationary policy and prove that $v^k(n, u) \le v^k(n, u')$ for each $n \ge 1$.
b) Prove that $v^*(n, u) \le v^*(n, u')$ for each $n \ge 1$. This is known as the monotonicity theorem.

Exercise 4.29. George drives his car to the theater, which is at the end of a one-way street. There are parking places along the side of the street and a parking garage at the theater that costs 5 dollars. Each parking place is independently occupied or unoccupied with probability 1/2. If George parks $n$ parking places away from the theater, it costs him $n$ cents (in time and shoe leather) to walk the rest of the way. George is myopic and can only see the parking place he is currently passing. If George has not already parked by the time he reaches the $n$th place, he first decides whether or not he will park if the place is unoccupied, and then observes the place and acts according to his decision. George can never go back and must park in the parking garage if he has not parked before.
a) Model the above problem as a two-state Markov decision problem.
In the “driving” state, state 2, there are two possible decisions: park if the current place is unoccupied, or drive on whether or not the current place is unoccupied.
b) Find $v_i^*(n, u)$, the minimum expected aggregate cost for $n$ stages (i.e., immediately before observation of the $n$th parking place) starting in state $i = 1$ or $2$; it is sufficient to express $v_i^*(n, u)$ in terms of $v_i^*(n - 1)$. The final costs, in cents, at stage 0 should be $v_2(0) = 500$ and $v_1(0) = 0$.
c) For what values of $n$ is the optimal decision the decision to drive on?
d) What is the probability that George will park in the garage, assuming that he follows the optimal policy?

Exercise 4.30. Consider the dynamic programming problem below with two states and two possible policies, denoted $k$ and $k'$. The policies differ only in state 2.

[Figure: two-state chains for policies $k$ and $k'$. In both, state 1 has reward $r_1 = 0$ and two outgoing transitions of probability 1/2. Under $k$, state 2 has reward $r_2^{(k)} = 5$ and outgoing transitions of probability 7/8 and 1/8; under $k'$, it has reward $r_2^{(k')} = 6$ and outgoing transitions of probability 3/4 and 1/4.]

a) Find the steady-state gai...
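For Exercise 4.29, parts b to d, the backward recursion can be sketched numerically. This assumes one reading of the model in part a, not a quoted solution: state 1 (parked) is trapping with no further cost, and in state 2 (driving) at stage $n$ the two decisions give

$$v_2^*(n) = \min\left\{\tfrac{1}{2}\,n + \tfrac{1}{2}\,v_2^*(n-1),\; v_2^*(n-1)\right\}, \qquad v_1^*(n) = 0,$$

with $v_2^*(0) = 500$ cents. The names `parking_dp`, `garage_probability`, `GARAGE_COST`, and `P_FREE` are hypothetical, and ties are broken toward driving on.

```python
from fractions import Fraction

GARAGE_COST = Fraction(500)  # v_2(0): garage cost in cents
P_FREE = Fraction(1, 2)      # each place is unoccupied with probability 1/2


def parking_dp(n_max):
    """Backward recursion for the driving state (state 2).

    Returns (v2, decision): v2[n] is the minimum expected cost in cents with
    n places still ahead, and decision[n] is "park" (park if unoccupied) or
    "drive" (drive on). The parked state (state 1) costs nothing further.
    """
    v2 = [GARAGE_COST]
    decision = [None]  # no decision is made at stage 0
    for n in range(1, n_max + 1):
        prev = v2[n - 1]
        # Park if free: pay n cents with prob P_FREE, otherwise keep driving.
        park_cost = P_FREE * n + (1 - P_FREE) * prev
        if park_cost < prev:
            v2.append(park_cost)
            decision.append("park")
        else:  # tie or worse: drive on
            v2.append(prev)
            decision.append("drive")
    return v2, decision


def garage_probability(decision):
    """Probability of ending in the garage under `decision`: every place at
    which George would park if free turns out to be occupied."""
    prob = Fraction(1)
    for d in decision[1:]:
        if d == "park":
            prob *= 1 - P_FREE
    return prob


if __name__ == "__main__":
    v2, decision = parking_dp(20)
    drive_on = [n for n in range(1, 21) if decision[n] == "drive"]
    print(drive_on[0], drive_on[-1])            # 9 20
    print(v2[8], garage_probability(decision))  # 2293/256 1/256
```

Under these assumptions, and if the street has at least eight places, the sketch reports that parking if free is optimal for $n \le 8$ and driving on for $n \ge 9$, so George ends up in the garage only if places 8 through 1 are all occupied, with probability $(1/2)^8 = 1/256$.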