EE 292E Analysis and Control of Markov Chains
Spring 2008
Prof. Ben Van Roy
May 29, 2008

Homework Assignment 8: Due June 2

Priority Assignment

Consider a variation of the queueing priority assignment problem in the last homework. In particular, there are $n$ queues and 1 server. Queue $i$ initially has $x_i$ customers waiting, and no new customers enter the system during operation. The server can be assigned to one queue at any time. The service time for a customer in queue $i$ is exponentially distributed with mean $\tau_i$. Keeping a customer waiting in queue $i$ costs $g_i$ per unit time. When a customer in queue $i$ is served, he leaves the system with probability $p_{i0}$ or joins queue $j$ with probability $p_{ij}$ (for each $i$, $\sum_{j=0}^{n} p_{ij} = 1$). Future costs are discounted at a rate of $\beta > 0$. The objective is to minimize the expected discounted integral of future costs.

a) Show how this problem can be reduced to a multi-armed bandit problem.
b) Based on this reduction, show that there is an optimal policy that amounts to ranking the queues and at each time serving the highest-ranked queue that has customers.
c) Derive a simple formula for ranking the queues in a way that leads to an optimal policy.

Coin Tossing

Suppose we are faced with 10 possibly biased coins. Each coin's bias (probability of heads) is either 2/3 or 1/3. Our prior probabilities over the possible biases of each coin are independent and uniform. At each time, we choose one coin to flip and receive $1 if the coin lands heads (and nothing if it lands tails). Each coin can be flipped at most 100 times. Our objective is to maximize expected discounted revenue, with a discount factor of 0.99. Compute an optimal strategy that selects the next coin to flip given observations to date.
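
One possible way to approach the coin tossing computation is sketched below in Python. It assumes that each coin is treated as an arm of a multi-armed bandit whose state is its (heads, tails) count, and that each arm's Gittins index is found by bisection on a per-flip charge in an optimal stopping problem (the calibration formulation). This choice of method is an assumption, not something the assignment prescribes. The function names (posterior_high, prob_heads, continuation_value, gittins_index, choose_coin) are hypothetical, and an exhausted coin is given index 0 by convention.

```python
GAMMA = 0.99      # discount factor from the problem statement
MAX_FLIPS = 100   # per-coin flip limit from the problem statement


def posterior_high(heads, tails):
    """P(bias = 2/3 | counts) under the uniform prior; depends only on heads - tails."""
    return 1.0 / (1.0 + 2.0 ** (tails - heads))


def prob_heads(heads, tails):
    """Predictive probability that the next flip of this coin lands heads."""
    q = posterior_high(heads, tails)
    return q * (2.0 / 3.0) + (1.0 - q) * (1.0 / 3.0)


def continuation_value(heads, tails, charge, max_flips=MAX_FLIPS, gamma=GAMMA):
    """Value of flipping once from (heads, tails) and then stopping optimally,
    when every flip costs `charge`. Computed by backward induction over the
    states reachable from (heads, tails)."""
    start = heads + tails
    # values[k] is the stopping value after k additional heads at the current
    # level; at level max_flips the coin is exhausted, so every value is 0.
    values = [0.0] * (max_flips - start + 1)
    for level in range(max_flips - 1, start - 1, -1):
        extra = level - start  # flips made since the start state
        new_values = []
        for k in range(extra + 1):
            p = prob_heads(heads + k, tails + extra - k)
            cont = p - charge + gamma * (p * values[k + 1] + (1.0 - p) * values[k])
            # Stopping (value 0) is allowed everywhere except at the start state.
            new_values.append(cont if level == start else max(0.0, cont))
        values = new_values
    return values[0]


def gittins_index(heads, tails, max_flips=MAX_FLIPS, gamma=GAMMA, tol=1e-10):
    """Per-flip charge at which flipping this coin is exactly break-even."""
    if heads + tails >= max_flips:
        return 0.0  # assumed convention: an exhausted coin earns nothing more
    lo, hi = 0.0, 1.0  # the continuation value is positive at 0, negative at 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuation_value(heads, tails, mid, max_flips, gamma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choose_coin(counts, max_flips=MAX_FLIPS, gamma=GAMMA):
    """Position of the coin with the largest index among coins that can still be flipped.
    `counts` is a list of (heads, tails) pairs, one per coin."""
    candidates = [i for i, (h, t) in enumerate(counts) if h + t < max_flips]
    if not candidates:
        raise ValueError("every coin has reached its flip limit")
    return max(candidates, key=lambda i: gittins_index(*counts[i], max_flips, gamma))
```

For example, choose_coin([(0, 0)] * 10) picks a coin at the start, and the counts for that coin are then updated with the observed outcome before the next call. Indices are computed on demand here to keep the sketch short; tabulating them once for every (heads, tails) pair would avoid repeated work.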
