b) Based on this reduction, show that there is an optimal policy that amounts to ranking the queues and, at each time, serving the highest-ranked queue that has customers.

c) Derive a simple formula for ranking the queues in a way that leads to an optimal policy.

Coin Tossing

Suppose we are faced with 10 possibly biased coins. Each coin’s bias (probability of heads) is either 2/3 or 1/3. Our prior probabilities over the possible biases of each coin are independent and uniform. At each time, we choose one coin to flip and receive $1 if the coin lands heads (and nothing if it lands tails). Each coin can be flipped at most 100 times. Our objective is to maximize expected discounted revenue, with a discount factor of 0.99. Compute an optimal strategy that selects the next coin to flip given observations to date....
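One possible way to approach the Coin Tossing computation is through Gittins indices, sketched below in Python. The sketch treats each coin as a bandit arm whose state is (heads minus tails, flips used), since under the uniform prior the posterior probability of bias 2/3 depends only on heads minus tails. It finds each index by bisecting on a per-flip charge until flipping just breaks even. The function names (`prob_heads`, `continuation_value`, `gittins_index`, `choose_coin`) and the bisection bounds are illustrative assumptions, not part of the original problem statement.

```python
from functools import lru_cache

BETA = 0.99      # discount factor, from the problem statement
MAX_FLIPS = 100  # per-coin flip limit, from the problem statement


def prob_heads(d):
    """Predictive P(heads) for a coin whose heads-minus-tails count is d."""
    p_high = 1.0 / (1.0 + 2.0 ** (-d))  # posterior P(bias = 2/3), uniform prior
    return p_high * (2.0 / 3.0) + (1.0 - p_high) * (1.0 / 3.0)


def continuation_value(d, n, charge):
    """Net value of flipping once now, then flipping or stopping optimally,
    when every flip costs `charge` (n = flips already used on this coin)."""
    cache = {}

    def stop_or_go(d, n):
        # An exhausted coin earns nothing; otherwise stop (0) or keep flipping.
        if n >= MAX_FLIPS:
            return 0.0
        if (d, n) not in cache:
            cache[(d, n)] = max(0.0, go(d, n))
        return cache[(d, n)]

    def go(d, n):
        q = prob_heads(d)
        future = q * stop_or_go(d + 1, n + 1) + (1.0 - q) * stop_or_go(d - 1, n + 1)
        return q - charge + BETA * future

    return go(d, n)


@lru_cache(maxsize=None)
def gittins_index(d, n, iterations=50):
    """Per-flip charge at which flipping the coin in state (d, n) breaks even."""
    if n >= MAX_FLIPS:
        return 0.0  # assumption: an exhausted coin is ranked last
    # P(heads) always lies strictly between 1/3 and 2/3, so the index does too.
    lo, hi = 1.0 / 3.0, 2.0 / 3.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if continuation_value(d, n, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choose_coin(counts):
    """counts: list of (heads, tails) per coin; returns the position to flip."""
    return max(range(len(counts)),
               key=lambda i: gittins_index(counts[i][0] - counts[i][1],
                                           counts[i][0] + counts[i][1]))
```

Calling `choose_coin` with the current (heads, tails) count of every coin returns the position of the coin with the largest index. With ten fresh coins all indices tie, and `max` simply returns the first position.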
- Spring '10
- Probability theory, Multi-armed bandit, Peter Whittle, Prof. Ben Van Roy