ess receives
a lump sum reward of 1. Let the time interval between jumps be 1 unit of time. The discount
factor is α and T ∼ exp(α).
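As a numerical sanity check on this setup, the following is a minimal sketch and not part of the original analysis. It assumes that jumps occur at times 1, 2, 3, ..., that T is exponential with rate α, and that the number of jumps up to time T is therefore floor(T). Under these assumptions floor(T) is geometric on {0, 1, 2, ...} with P(floor(T) ≥ n) = e^{−αn}, so its mean is e^{−α}/(1 − e^{−α}) and its variance is e^{−α}/(1 − e^{−α})^2. The function names, the value α = 0.5, and the sample size are illustrative choices only.

```python
import math
import random


def discounted_sum(alpha, n_terms=2000):
    """Truncated deterministic sum of exp(-alpha * n) for n = 1..n_terms."""
    return sum(math.exp(-alpha * n) for n in range(1, n_terms + 1))


def simulate_jump_counts(alpha, n_samples, seed=0):
    """Draw T ~ exp(alpha) (rate alpha) and count unit-spaced jumps up to T.

    Assumption: jumps occur at times 1, 2, 3, ..., so the count is floor(T).
    """
    rng = random.Random(seed)
    return [math.floor(rng.expovariate(alpha)) for _ in range(n_samples)]


def mean_and_variance(xs):
    """Sample mean and (population-style) sample variance of a list."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


if __name__ == "__main__":
    alpha = 0.5  # illustrative value, not taken from the source
    q = math.exp(-alpha)

    counts = simulate_jump_counts(alpha, n_samples=200_000)
    m, v = mean_and_variance(counts)

    print("discounted sum      :", discounted_sum(alpha))
    print("closed-form mean    :", q / (1 - q))
    print("simulated mean      :", m)
    print("closed-form variance:", q / (1 - q) ** 2)
    print("simulated variance  :", v)
```

The deterministic discounted sum has no variability, while the simulated jump count fluctuates around the same mean; the two closed-form lines give the values that the simulated estimates should approach as the sample size grows.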
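The total discounted rewards under the two definitions are respectively
\[
J_1 = \sum_{n=1}^{\infty} e^{-\alpha n} = \frac{e^{-\alpha}}{1 - e^{-\alpha}}
\qquad\text{and}\qquad
J_2 = \sum_{n=1}^{N(T)} 1 = N(T).
\]
Note that $J_1$ is a deterministic number and $J_2$ is a random variable depending on $T$. Thus, $\operatorname{var}(J_1) = 0 < \operatorname{var}(J_2)$. In fact, direct calculation shows that
\[
\operatorname{var}(J_2) = \frac{e^{-\alpha}}{(1 - e^{-\alpha})^2}.
\]

Acknowledgement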
This research was supported by NSF grants CMMI-0600538 and CMMI-0928490.