Substituting into (4) gives:

$$\pi_W + (1-p)\,\pi_W + (1-p)(1-r)\,\pi_W + \dots + (1-p)(1-r)^{L-1}\,\pi_W = 1$$

Thus:

$$\pi_W \left(1 + (1-p) + (1-p)(1-r) + \dots + (1-p)(1-r)^{L-1}\right) = 1$$

and we conclude that:

$$\pi_W \left(1 + (1-p)\left(1 + (1-r) + \dots + (1-r)^{L-1}\right)\right) = 1$$

Using the identity $\sum_{i=0}^{N} a^i = \frac{1 - a^{N+1}}{1 - a}$, we have:

$$\pi_W \left(1 + (1-p)\,\frac{1 - (1-r)^L}{r}\right) = 1$$

Define:

$$C = 1 + (1-p)\,\frac{1 - (1-r)^L}{r}$$

so that:

$$\pi_W = \frac{1}{C}$$

Thus, from (6), we can describe $\pi_{B_i}$ as:

$$\pi_{B_i} = \frac{(1-p)(1-r)^{i-1}}{C}$$

(c) According to the policy described in this problem, we have the following cost function.

$$C(i) = \begin{cases} 0 & \text{if } i = W \\ c_1 & \text{if } i = B_j,\ j \neq L \\ c_2 & \text{if } i = B_L \end{cases}$$

(d) We’ll use the result from the law of large numbers for Markov chains with reward functions. Recall from lecture that

$$\frac{1}{N}\sum_{h=0}^{N-1} f(X_h) \longrightarrow \sum_i \pi_i f(i) \quad \text{as } N \to \infty$$

Using our cost function from above, we have

$$\frac{1}{N}\sum_{h=0}^{N-1} C(X_h) \longrightarrow \sum_i \pi_i C(i) \quad \text{as } N \to \infty$$
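As a sanity check on the closed forms above, the following Python sketch evaluates $\pi_W$ and $\pi_{B_i}$, confirms that they sum to one, and computes the limiting average cost $\sum_i \pi_i C(i)$. The function names and the numerical values of $p$, $r$, $L$, $c_1$ and $c_2$ are illustrative assumptions, not part of the problem statement.

```python
def stationary_distribution(p, r, L):
    """Closed-form stationary distribution over the states W, B_1, ..., B_L."""
    # Normalizing constant C = 1 + (1 - p) * (1 - (1 - r)^L) / r
    norm = 1 + (1 - p) * (1 - (1 - r) ** L) / r
    pi_w = 1 / norm
    # pi_{B_i} = (1 - p) * (1 - r)^(i - 1) / C for i = 1, ..., L
    pi_b = [(1 - p) * (1 - r) ** (i - 1) / norm for i in range(1, L + 1)]
    return pi_w, pi_b


def average_cost(p, r, L, c1, c2):
    """Limiting average cost: sum over states of pi_i * C(i)."""
    _, pi_b = stationary_distribution(p, r, L)
    # W costs 0, B_1..B_{L-1} cost c1 each, B_L costs c2.
    return c1 * sum(pi_b[:-1]) + c2 * pi_b[-1]


if __name__ == "__main__":
    # Illustrative parameter values (assumed, not from the problem).
    p, r, L = 0.3, 0.4, 5
    pi_w, pi_b = stationary_distribution(p, r, L)
    print("total probability:", pi_w + sum(pi_b))
    print("average cost:", average_cost(p, r, L, c1=1.0, c2=10.0))
```

The total probability should come out equal to 1 up to floating-point rounding, which mirrors the normalization step used to solve for $\pi_W$.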
(e) Let’s define $f(c_1, c_2, L)$ to be the average cost function that we derived above.
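To get a feel for how this average cost depends on $L$, the sketch below tabulates it for a range of $L$ values. It uses the stationary probabilities $\pi_{B_i} = (1-p)(1-r)^{i-1}/C$ together with the geometric sum for the states that cost $c_1$. The function name `avg_cost` and all parameter values are assumptions chosen only for illustration.

```python
def avg_cost(c1, c2, L, p, r):
    """Average cost f(c1, c2, L) for fixed p and r (with 0 < r < 1)."""
    x = (1 - r) ** (L - 1)                      # weight of the last state B_L
    norm = 1 + (1 - p) * (1 - (1 - r) ** L) / r  # normalizing constant C
    # Sum of (1 - r)^(i - 1) for i = 1..L-1 equals (1 - x) / r.
    return (1 - p) / norm * (c1 * (1 - x) / r + c2 * x)


if __name__ == "__main__":
    # Illustrative values (assumed): compare a large and a small c2.
    p, r, c1 = 0.5, 0.5, 1.0
    for c2 in (10.0, 0.5):
        values = [avg_cost(c1, c2, L, p, r) for L in range(1, 9)]
        print(f"c2 = {c2}:", [round(v, 4) for v in values])
```

With these assumed values the cost moves in a single direction as $L$ grows: downward for the larger $c_2$ and upward for the smaller one. Other parameter choices may behave differently, so the output is only a numerical illustration, not a proof.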
Combining these,

$$f(c_1, c_2, L) = \frac{(1-p)\,r}{r + (1-p)\left(1 - (1-r)^L\right)} \cdot \frac{c_1 - c_1 (1-r)^{L-1} + r c_2 (1-r)^{L-1}}{r} = \frac{(1-p)\,c_1 + (1-p)(r c_2 - c_1)(1-r)^{L-1}}{(r + 1 - p) - (1-p)(1-r)(1-r)^{L-1}}$$

Now, since $p$, $r$, $c_1$ and $c_2$ are all constants, we can simplify this expression by making the following substitutions:

$$a = (1-p)\,c_1, \quad b = (1-p)(r c_2 - c_1), \quad c = r + 1 - p, \quad d = -(1-p)(1-r), \quad x = (1-r)^{L-1}$$

Now, we have

$$f(x) = \frac{a + bx}{c + dx}$$

Using the quotient rule to differentiate with respect to $x$, we have

$$f'(x) = \frac{b(c + dx) - d(a + bx)}{(c + dx)^2} = \frac{bc - ad}{(c + dx)^2}$$

A property of an expression of the form of $f(x)$ is that its derivative always has the same sign: the numerator $bc - ad$ does not depend on $x$, and the denominator is a square. Note that an increase in $L$ will cause a decrease in $x$, since $x = (1-r)^{L-1}$ and $(1-r) < 1$. Thus, we must find conditions that would make the derivative negative in $x$ (which would imply that $f(c_1, c_2, L)$ is increasing in $L$ so the optimal policy would be to