6.7 Stochastic dynamic programming

Dynamic programming techniques can also be applied to settings in which there is a probabilistic element to the problem specification. As a first “toy” example, consider the following betting game. Suppose that you have 4 chips, and you wish to end up with 8 chips in order to win. In each round, you may bet any number of the chips that you have, and you either lose them or win back double your bet. That is, if you start the round with s chips and bet k, you end up with either s - k or s - k + 2k = s + k chips. Each bet is won with probability .6 and lost with probability .4. So, one simple strategy is to bet all of your chips at once: you bet all 4 chips, and either win the game (with probability .6) or lose it (with probability .4).

The outcomes of different rounds of bets are independent events. This means that the probability of a particular sequence of outcomes over several rounds is equal to the product of the probabilities of those outcomes round by round. The goal is to design a strategy that maximizes your chance of winning. (The simple one gets us .6, but is that the best possible?) To see how one might do better, consider the approach indicated in the following figure:

[Figure: a three-round strategy. Round 1: from s = 4, bet 2, moving to s = 6 or s = 2. Round 2: from s = 6 or s = 2, bet 2, moving to s = 8, s = 4, or s = 0. Round 3: from s = 4, bet 4, moving to s = 8 or s = 0.]

We next compute the probability of winning with this strategy. The probability that we will win after two rounds of betting is (.6)(.6) = .36, whereas the probability of being broke after two rounds is (.4)(.4) = .16. This means that the probability of having 4 chips after 2 rounds of betting is equal to 1 - .16 - .36 = .48. Thus, the probability of having 4 chips after two rounds and then winning in the 3rd round is equal to (.48)(.6) = .288. So, in total, the probability of winning with this strategy is equal to .36 + .288 = .648 (better than the .6 of the simple one). Furthermore, it is clear that we can improve the probability by playing longer and longer games.
Observe that when we got back to 4 chips after round 2, we played the simple strategy in round 3. But we could further improve things by instead playing the entire 3-round strategy at this point. (The combined strategy would then take 5 rounds.) And then we could further improve things by using this 5-round strategy to make a 7-round one, and so forth.

But now, suppose we ask the question: what is the best 3-round strategy? Can we determine that? This is exactly suited for dynamic programming. We can follow the same 8-step program to construct a dynamic programming formulation here too. Let N denote the number of rounds in which we will bet.

What are the stages of decision-making? Each stage corresponds to a round of betting, and we decide in that round how many chips to bet.

What are the states for each stage? We can make the simplifying assumption that the best strategy will never allow us to go above 8 chips, or below 0, as a possible outcome (since any strategy that does this can be converted to another strategy, with at least as large a success probability, in which we always have between 0 and 8 chips). After understanding this, then it is clear...
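The stages and states described above suggest a backward recursion over the number of rounds remaining. The text is cut off before the recursion itself is stated, so the following is only a minimal sketch under assumptions: the value of holding s chips with n rounds left is taken to be the maximum over bets k of .6 times the value at s + k plus .4 times the value at s - k, with bets limited so that the chip count stays between 0 and the target, and with a bet of 0 allowed. The function name and parameters are hypothetical.

```python
from functools import lru_cache


def best_win_probability(start=4, target=8, rounds=3, p_win=0.6):
    """Best achievable win probability within `rounds` rounds (assumed recursion)."""

    @lru_cache(maxsize=None)
    def value(s, n):
        # s = chips in hand (the state), n = rounds remaining (the stage).
        if s >= target:
            return 1.0  # already won
        if s == 0 or n == 0:
            return 0.0  # broke, or out of rounds short of the target
        # Assumption: bets are capped so we never exceed the target,
        # following the simplifying assumption in the text; k = 0 is allowed.
        return max(
            p_win * value(s + k, n - 1) + (1 - p_win) * value(s - k, n - 1)
            for k in range(0, min(s, target - s) + 1)
        )

    return value(start, rounds)


for n in (1, 3, 5, 7):
    print(n, best_win_probability(rounds=n))
```

Under these assumptions, the value for N = 3 comes out to about .648, which suggests that the three-round strategy from the figure cannot be beaten within three rounds. Memoizing on the pair (s, n) is what makes this dynamic programming rather than brute-force enumeration of strategies.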
This note was uploaded on 04/06/2008 for the course ORIE 321 taught by Professor Shmoys/lewis during the Spring '07 term at Cornell.