6.7 Stochastic dynamic programming
Dynamic programming techniques can also be applied to settings in which
there is a probabilistic element to the problem specification. As a first “toy”
example, consider the following betting game. Suppose that you have 4 chips,
and you wish to end up with 8 chips in order to win. In each round, you may
bet any number of the chips that you have, and you either lose them or win
back double your bet. That is, if you start with $s$ chips and bet $k$, you end
up with either $s - k$ chips or $s - k + 2k = s + k$ chips. So, one simple
strategy is to bet all of your chips at once: you bet all 4 chips, and either win
(with probability .6) or lose (with probability .4). The outcomes of different
rounds of betting are independent events, so the probability of a given
sequence of outcomes over several rounds is the product of the per-round
probabilities.
The goal is to design a
strategy that maximizes your chance of winning. (The simple one gets us .6,
but is that the best possible?)
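The rules above can be checked with a small simulation. The following is a minimal Python sketch, not part of the original notes. It assumes that each bet is won independently with probability .6, that play stops as soon as we are broke or hold 8 chips, and that a strategy is encoded as a function `strategy(r, chips)` returning the bet for round `r` (counted from 0). The names `play`, `estimate`, and `all_in` are hypothetical.

```python
import random

P_WIN = 0.6   # probability of winning a single bet, as in the text
START = 4     # starting chips
TARGET = 8    # chips needed to win


def play(strategy, rounds, rng):
    """Play one game and return True if it ends with at least TARGET chips.

    strategy(r, chips) returns the number of chips to bet in round r (0-based).
    Assumption: play stops early once we are broke or have reached TARGET.
    """
    chips = START
    for r in range(rounds):
        if chips == 0 or chips >= TARGET:
            break
        bet = min(strategy(r, chips), chips)  # cannot bet more than we hold
        if rng.random() < P_WIN:
            chips += bet  # win: 2*bet comes back, a net gain of bet
        else:
            chips -= bet  # lose the chips that were bet
    return chips >= TARGET


def estimate(strategy, rounds, trials=100_000, seed=0):
    """Monte Carlo estimate of the win probability of `strategy`."""
    rng = random.Random(seed)
    wins = sum(play(strategy, rounds, rng) for _ in range(trials))
    return wins / trials


def all_in(r, chips):
    """The simple strategy: bet every chip."""
    return chips


print(estimate(all_in, rounds=1))
```

The estimate for the all-in strategy should land close to .6; any other round-by-round rule can be passed to `estimate` in the same way.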
To see how one might do better, consider the approach indicated in the
following figure:
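[Figure: a 3-round strategy starting from $s = 4$. Round 1: bet 2, moving to $s = 2$ or $s = 6$. Round 2: bet 2, moving from $s = 2$ to $s = 0$ or $s = 4$, and from $s = 6$ to $s = 4$ or $s = 8$. Round 3: from $s = 4$, bet 4, moving to $s = 0$ or $s = 8$.]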
We next compute the probability of winning with this strategy. The probability
that we will win after two rounds of betting is $(.6)(.6) = .36$, whereas
the probability of being broke after two rounds is $(.4)(.4) = .16$. This means
that the probability of having 4 chips after two rounds of betting is
$1 - .16 - .36 = .48$. Thus, the probability of having 4 chips after two rounds
and then winning in the third is $(.48)(.6) = .288$. So, in total, the
probability of winning with this strategy is $.36 + .288 = .648$, better than
the $.6$ of the simple one.
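The arithmetic above can be reproduced by enumerating the win and lose branches of the strategy exactly. This is a minimal sketch, assuming the bet depends only on the round (2, 2, then 4, as in the figure), that each bet is won with probability .6, and that play stops once we hold 8 chips or are broke. `win_probability` and `FIGURE_BETS` are hypothetical names; exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

P_WIN = Fraction(6, 10)  # probability of winning a single bet
TARGET = 8               # chips needed to win

# Hypothetical encoding of the figure's strategy: round index -> chips to bet.
FIGURE_BETS = {0: 2, 1: 2, 2: 4}


def win_probability(chips, r, bets, rounds):
    """Exact probability of reaching TARGET from `chips` at round r (0-based)."""
    if chips >= TARGET:
        return Fraction(1)
    if chips == 0 or r == rounds:
        return Fraction(0)
    k = min(bets[r], chips)  # cannot bet more than we hold
    # Independent rounds: weight the win branch by .6 and the lose branch by .4.
    return (P_WIN * win_probability(chips + k, r + 1, bets, rounds)
            + (1 - P_WIN) * win_probability(chips - k, r + 1, bets, rounds))


p = win_probability(4, 0, FIGURE_BETS, 3)
print(p, float(p))  # prints: 81/125 0.648
```

Each call splits into a win branch and a lose branch, which is the product rule for independent rounds used in the text.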
Furthermore, we can improve the probability by playing longer and longer
games. Observe that when we got back to 4 chips after round 2, we played the
simple strategy in round 3. We could instead play the entire 3-round strategy
at this point, which would improve things further. (The combined strategy
would then take 5 rounds.) We could then use this 5-round strategy to build a
7-round one in the same way, and so forth.
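The nesting argument gives a simple recursion. Write $P_n$ for the win probability of the $n$-round strategy built this way, with $P_1 = .6$ for the simple strategy. The first two rounds win outright with probability .36 and return to 4 chips with probability .48, after which the $(n-2)$-round strategy is replayed, so $P_n = .36 + .48\,P_{n-2}$. A minimal sketch (the function name is hypothetical):

```python
def nested_win_probabilities(max_rounds):
    """Win probabilities of the nested 1-, 3-, 5-, ... round strategies.

    Rounds 1 and 2 bet 2 chips each; if we are back at 4 chips after
    round 2 (probability .48), the previous strategy is replayed.
    """
    probs = {1: 0.6}  # the simple all-in strategy
    n = 1
    while n + 2 <= max_rounds:
        # win outright in two rounds (.36), or return to 4 chips (.48) and replay
        probs[n + 2] = 0.36 + 0.48 * probs[n]
        n += 2
    return probs


print(nested_win_probabilities(7))
```

For example, $P_3 = .36 + .48(.6) = .648$ and $P_5 = .36 + .48(.648) = .67104$. Since $P_n - 9/13 = .48\,(P_{n-2} - 9/13)$, these values increase toward the fixed point $.36/.52 = 9/13 \approx .692$ but never exceed it.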
But now suppose we ask: what is the best 3-round strategy? Can we determine
it? This question is exactly suited to dynamic programming.
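The following sketches where such a formulation can lead; it is not necessarily the notes' own formulation. Take a stage to be a round and a state to be the current number of chips, kept between 0 and 8 by the simplifying assumption discussed for the states. Write $f_n(s)$ for the best probability of winning with $n$ rounds left and $s$ chips (notation introduced here for illustration), with $f_0(8) = 1$ and $f_0(s) = 0$ for $s < 8$. The backward recursion is then

$$f_n(s) = \max_{0 \le k \le \min(s,\, 8-s)} \big[\, .6\, f_{n-1}(s+k) + .4\, f_{n-1}(s-k) \,\big].$$

A minimal Python version follows. It assumes a bet of 0 is allowed (which simply skips a round); `best` is a hypothetical name.

```python
from functools import lru_cache

P_WIN = 0.6  # probability of winning a single bet
TARGET = 8   # chips needed to win


@lru_cache(maxsize=None)
def best(rounds_left, chips):
    """Return (f_n(s), best bet now) for n = rounds_left and s = chips.

    Bets are restricted to 0 <= k <= min(chips, TARGET - chips), so the
    chip count always stays between 0 and TARGET.
    """
    if chips == TARGET:
        return 1.0, 0  # already won: nothing left to do
    if rounds_left == 0 or chips == 0:
        return 0.0, 0  # out of rounds (or broke) without reaching TARGET
    best_p, best_k = -1.0, 0
    for k in range(min(chips, TARGET - chips) + 1):
        # win probability if we bet k now and play optimally afterwards
        p = (P_WIN * best(rounds_left - 1, chips + k)[0]
             + (1 - P_WIN) * best(rounds_left - 1, chips - k)[0])
        if p > best_p:
            best_p, best_k = p, k
    return best_p, best_k


for n in (1, 2, 3):
    p, k = best(n, 4)
    print(f"{n} round(s) left, 4 chips: win probability {p:.4f}, bet {k}")
```

Under these assumptions, `best(3, 4)` comes out at about .648 with a first bet of 2, so no 3-round strategy that stays between 0 and 8 chips does better than the one in the figure.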
We can follow the same 8-step program to construct a dynamic programming
formulation here too. Let $N$ denote the number of rounds in which we will
bet.
What are the stages of decision-making? Each stage corresponds to a round of
betting, in which we decide how many chips to bet.
What are the states for each stage? We can make the simplifying assumption
that the best strategy never allows an outcome above 8 chips or below 0 (any
strategy that does can be converted into another strategy, with at least as
large a success probability, in which we always have between 0 and 8 chips).
After understanding this, it is clear
Spring '07
SHMOYS/LEWIS