Probabilistic Dynamic Programming

Recall from our study of deterministic dynamic programming that many recursions were of the following form:

$$f_t(\text{current state}) = \min_{\text{all feasible decisions}} (\text{or } \max)\ \{\text{costs during current stage} + f_{t+1}(\text{new state})\}$$

For all the examples in Chapter 18, a specification of the current state and current decision was enough to tell us with certainty the new state and the costs during the current stage. In many practical problems, these factors may not be known with certainty, even if the current state and decision are known. For example, in the inventory model of Section 18.3, we assumed that each period's demand was known at the beginning of the problem. In most situations, it would be more realistic to assume that period t's demand is a random variable whose value is not known until after period t's production decision is made. Even if we know the current period's state (beginning inventory level) and decision (production during the current period), the next period's state and the current period's cost will be random variables whose values are not known until the value of period t's demand is known. The Chapter 18 discussion simply does not apply to this problem.

In this chapter, we explain how to use dynamic programming to solve problems in which the current period's cost or the next period's state is random. We call these problems probabilistic dynamic programming problems (or PDPs). In a PDP, the decision maker's goal is usually to minimize expected (or expected discounted) cost incurred or to maximize expected (or expected discounted) reward earned over a given time horizon. Chapter 19 concludes with a brief study of Markov decision processes. A Markov decision process is just a probabilistic dynamic programming problem in which the decision maker faces an infinite horizon.
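As a rough illustration of how the deterministic recursion changes once demand is random, the expected-cost version can be sketched as follows. The notation is assumed here for illustration and is not taken from the text: $i$ is the current state, $a$ a feasible decision from the set $A_t(i)$, $D_t$ the random demand in period $t$, $c_t$ the stage cost, and $s_t$ the rule that gives the next state.

```latex
% Sketch only: i, a, A_t(i), D_t, c_t, and s_t are illustrative symbols
% introduced here, not notation from the chapter.
f_t(i) \;=\; \min_{a \in A_t(i)} \;
  \mathbb{E}_{D_t}\!\left[\, c_t(i, a, D_t) \;+\; f_{t+1}\bigl(s_t(i, a, D_t)\bigr) \right]
```

When the next state is certain given $i$ and $a$, the second term no longer depends on $D_t$ and only the stage cost needs to be averaged; when the goal is to maximize expected reward, $\min$ becomes $\max$.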
19.1 When Current Stage Costs Are Uncertain, but the Next Period's State Is Certain

For problems in this section, the next period's state is known with certainty, but the reward earned during the current stage is not known with certainty (given the current state and decision).

EXAMPLE 1 Milk Distribution

For a price of $1/gallon, the Safeco Supermarket chain has purchased 6 gallons of milk from a local dairy. Each gallon of milk is sold in the chain's three stores for $2/gallon. The dairy must buy back for 50¢/gallon any milk that is left at the end of the day. Unfortunately for Safeco, demand for each of the chain's three stores is uncertain. Past data indicate that the daily demand at each store is as shown in Table 1. Safeco wants to allocate the 6 gallons of milk to the three stores so as to maximize the expected net daily profit (revenues less costs) earned from milk. Use dynamic programming to determine how Safeco should allocate the 6 gallons of milk among the three stores....
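The allocation problem in Example 1 can be sketched in Python as a small backward recursion over stores. Table 1 is not reproduced in this excerpt, so the demand distributions in `DEMAND` below are hypothetical placeholders, and the resulting allocation should not be read as the answer to the example. The function names `expected_revenue` and `best` are also illustrative. The sketch assumes that all 6 gallons must be allocated and that the $6 purchase cost is already sunk, so maximizing expected revenue also maximizes expected profit.

```python
from functools import lru_cache

SELL_PRICE = 2.00      # $/gallon received for milk sold in a store
BUYBACK_PRICE = 0.50   # $/gallon received from the dairy for unsold milk
PURCHASE_COST = 1.00   # $/gallon already paid to the dairy
TOTAL_GALLONS = 6

# HYPOTHETICAL demand distributions (gallons -> probability), one per store.
# These stand in for Table 1, which is not available in this excerpt.
DEMAND = (
    {1: 0.5, 2: 0.3, 3: 0.2},
    {1: 0.4, 2: 0.4, 3: 0.2},
    {1: 0.3, 2: 0.3, 3: 0.4},
)


def expected_revenue(store, gallons):
    """Expected sales plus buy-back revenue from giving `gallons` to `store`."""
    return sum(
        prob * (SELL_PRICE * min(gallons, d) + BUYBACK_PRICE * max(gallons - d, 0))
        for d, prob in DEMAND[store].items()
    )


@lru_cache(maxsize=None)
def best(store, remaining):
    """Return (max expected revenue, allocation) for stores store, store+1, ...

    `remaining` is the state: gallons not yet assigned. The next state,
    remaining - g, is certain once the decision g is made; only the
    stage reward is random, so it enters through its expectation.
    """
    if store == len(DEMAND) - 1:
        # The last store receives whatever is left.
        return expected_revenue(store, remaining), (remaining,)
    candidates = []
    for g in range(remaining + 1):
        future_value, future_alloc = best(store + 1, remaining - g)
        candidates.append((expected_revenue(store, g) + future_value,
                           (g,) + future_alloc))
    return max(candidates)


revenue, allocation = best(0, TOTAL_GALLONS)
profit = revenue - PURCHASE_COST * TOTAL_GALLONS
print("allocation:", allocation)
print("expected revenue:", round(revenue, 2), "expected profit:", round(profit, 2))
```

Replacing the entries of `DEMAND` with the probabilities from Table 1 would apply the same recursion to the actual example.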
This note was uploaded on 11/02/2010 for the course ORIE 1101 taught by Professor Trotter during the Fall '10 term at Cornell University (Engineering School).
