b) Based on this reduction, show that there is an optimal policy that amounts to ranking the queues and, at each time, serving the highest-ranked queue that has customers.

c) Derive a simple formula for ranking the queues in a way that leads to an optimal policy.

Coin Tossing

Suppose we are faced with 10 possibly biased coins. Each coin's bias (probability of heads) is either 2/3 or 1/3. Our prior probabilities over the possible biases of each coin are independent and uniform. At each time, we choose one coin to flip and receive $1 if the coin lands heads (and nothing if it lands tails). Each coin can be flipped at most 100 times. Our objective is to maximize expected discounted revenue, with a discount factor of 0.99. Compute an optimal strategy that selects the next coin to flip given observations to date.
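One possible way to approach the coin-tossing computation is sketched below. It is not given in the problem statement; it assumes the problem is treated as a multiarmed bandit solved by a Gittins-style index rule, with an exhausted coin (100 flips used) treated as having index 0. Under a uniform prior over the two biases, the posterior depends only on d = heads minus tails, so a coin's state is taken to be (n, d) with n the number of flips so far. The index is found by bisection on a hypothetical per-flip charge at which flipping once and then stopping optimally breaks even. All function names (p_heads, continuation_value, gittins_index, choose_coin) are illustrative.

```python
from functools import lru_cache

BETA = 0.99      # discount factor from the problem statement
MAX_FLIPS = 100  # each coin can be flipped at most 100 times


def p_heads(d):
    """Predictive probability of heads given d = (#heads - #tails) so far."""
    # With a uniform prior over biases {1/3, 2/3}:
    # P(bias = 2/3 | data) = 2^h / (2^h + 2^t) = 1 / (1 + 2^(-d)).
    p_good = 1.0 / (1.0 + 2.0 ** (-d))
    return (1.0 + p_good) / 3.0  # p_good * 2/3 + (1 - p_good) * 1/3


def continuation_value(n, d, charge):
    """Value of flipping once at state (n, d), then stopping optimally,
    when every flip costs `charge`. Assumes n < MAX_FLIPS."""
    flips_left = MAX_FLIPS - n
    # values[j]: optimal-stopping value after k further flips, j of them tails.
    values = [0.0] * (flips_left + 1)  # no flips remain at the last stage
    for k in range(flips_left - 1, -1, -1):
        new_values = []
        for j in range(k + 1):
            q = p_heads(d + k - 2 * j)
            cont = q - charge + BETA * (q * values[j] + (1.0 - q) * values[j + 1])
            # After the first flip we may stop (value 0); the first flip is forced.
            new_values.append(cont if k == 0 else max(0.0, cont))
        values = new_values
    return values[0]


@lru_cache(maxsize=None)
def gittins_index(n, d, tol=1e-9):
    """Break-even per-flip charge for a coin in state (n, d), by bisection."""
    if n >= MAX_FLIPS:
        return 0.0  # assumption: an exhausted coin is never preferred
    lo, hi = 1.0 / 3.0, 2.0 / 3.0  # the per-flip expected reward lies in this range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuation_value(n, d, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choose_coin(states):
    """Return the position of the coin with the largest index.
    `states` is a list of (n, d) pairs, one per coin."""
    return max(range(len(states)), key=lambda i: gittins_index(*states[i]))
```

Under these assumptions, a strategy would track (n, d) for each of the 10 coins and call choose_coin before every flip. Computing the index on demand with caching avoids tabulating all states up front, at the cost of a bisection per newly visited state.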
 Spring '10
 Bertsekas
 Probability theory, $1, Multiarmed bandit, Peter Whittle, Prof. Ben Van Roy
