There are two natural ways to define the total discounted rewards. One way is to interpret discounting as the coefficient $e^{-\alpha t}$ in front of the reward rate $r_t$ (and $e^{-\alpha T_n}$ in front of the lump reward $R_n$). In this case, the total discounted rewards are defined as
$$J_1 = \int_0^\infty e^{-\alpha t} r_t\,dt + \sum_{n=1}^{\infty} e^{-\alpha T_n} R_n,$$
where $\alpha > 0$ is the discount rate.
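As an illustration, $J_1$ can be evaluated on a single sample path. The sketch below assumes something the text does not state. It assumes the path has finitely many jumps $T_1 < \dots < T_m$ and that the reward rate $r_t$ is constant between consecutive jump times. Under that assumption, each piece of the integral has the closed form $r\,(e^{-\alpha a} - e^{-\alpha b})/\alpha$. The helper name `discounted_reward_j1` and its argument layout are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def discounted_reward_j1(
    alpha: float,
    jump_times: Sequence[float],
    rates: Sequence[float],
    lump_rewards: Sequence[float],
) -> float:
    """Evaluate J1 = int_0^inf e^{-alpha t} r_t dt + sum_n e^{-alpha T_n} R_n
    on one sample path (hypothetical helper, illustrative only).

    Assumptions: 0 < T_1 < ... < T_m are the jump times; rates[k] is the
    constant reward rate on [T_k, T_{k+1}) with T_0 = 0 and T_{m+1} = inf;
    lump_rewards[n-1] is the lump reward R_n collected at T_n.
    """
    if alpha <= 0:
        raise ValueError("the discount rate alpha must be positive")
    if len(rates) != len(jump_times) + 1 or len(lump_rewards) != len(jump_times):
        raise ValueError("need m jump times, m + 1 rates and m lump rewards")

    boundaries = [0.0, *jump_times]
    total = 0.0
    for k, rate in enumerate(rates):
        start_discount = math.exp(-alpha * boundaries[k])
        # The last interval [T_m, inf) contributes rate * e^{-alpha T_m} / alpha.
        if k + 1 < len(boundaries):
            end_discount = math.exp(-alpha * boundaries[k + 1])
        else:
            end_discount = 0.0
        total += rate * (start_discount - end_discount) / alpha

    # Discounted lump rewards e^{-alpha T_n} R_n.
    total += sum(math.exp(-alpha * t) * r for t, r in zip(jump_times, lump_rewards))
    return total
```

Consider a path with no jumps and a constant rate $r_t \equiv 1$. In that case `discounted_reward_j1(0.5, [], [1.0], [])` returns `2.0`, which is $1/\alpha$.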
Another way is to define the total discounted rewards as the total rewards until a stopping time $T$ that has an exponential distribution with rate $\alpha$. Let $T$ be independent of $\mathcal{F}_\infty$ and let $P\{T > t\} = e^{-\alpha t}$. Then the total discounted reward can be defined as
$$J_2 = \int_0^T r_t\,dt + \sum_{n=1}^{N(T)} R_n,$$
where
$$N(t) = \sup\{n : T_n \le t\}, \qquad t \ge 0.$$
It is well known that
$$E[J_1] = E[J_2] \qquad (2.1)$$
if at least one side of this equation is well defined. A random variable has a well-defined expectation if either the expectation of its positive part is finite or the expectation of its negative part is finite.
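Equality (2.1) can be sanity-checked by simulation. The sketch below assumes a hypothetical model that is not part of the text:

- jumps occur at the epochs of a Poisson process with rate `lam`;
- the reward rate is a constant `r`;
- the lump rewards $R_n$ are i.i.d. exponential with mean `mu`, independent of the jump times.

For this model both sides of (2.1) equal $(r + \lambda\mu)/\alpha$. Here $J_1$ is computed with its sum truncated at `horizon`. $J_2$ is computed by drawing an independent $T$ with $P\{T > t\} = e^{-\alpha t}$.

```python
import math
import random


def estimate_both_sides(alpha=0.5, lam=2.0, r=1.0, mu=3.0,
                        n_paths=10_000, horizon=40.0, seed=1):
    """Monte Carlo estimates of E[J1] and E[J2] under an assumed model:
    Poisson(lam) jump times, constant reward rate r, and i.i.d.
    exponential lump rewards with mean mu (all hypothetical choices)."""
    rng = random.Random(seed)
    sum_j1 = sum_j2 = 0.0
    for _ in range(n_paths):
        # Jump times T_1 < T_2 < ... of a Poisson process, cut off at `horizon`
        # (e^{-alpha * horizon} is negligible for the default parameters).
        jumps = []
        t = rng.expovariate(lam)
        while t <= horizon:
            jumps.append(t)
            t += rng.expovariate(lam)
        rewards = [rng.expovariate(1.0 / mu) for _ in jumps]

        # J1: with a constant rate, int_0^inf e^{-alpha t} r dt = r / alpha exactly.
        j1 = r / alpha + sum(math.exp(-alpha * tn) * rn for tn, rn in zip(jumps, rewards))

        # J2: undiscounted reward up to an independent T with P{T > t} = e^{-alpha t}.
        stop = rng.expovariate(alpha)
        j2 = r * stop + sum(rn for tn, rn in zip(jumps, rewards) if tn <= stop)

        sum_j1 += j1
        sum_j2 += j2
    return sum_j1 / n_paths, sum_j2 / n_paths
```

With the default parameters, `estimate_both_sides()` returns two estimates that should both be close to $(1 + 2 \cdot 3)/0.5 = 14$. The estimate of $E[J_2]$ is the noisier of the two, because $J_2$ also depends on the random $T$ and so has the larger variance.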
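Indeed,
$$
\begin{aligned}
E\Big[\sum_{n=1}^{N(T)} R_n\Big] &= E\Big[\sum_{n=1}^{\infty} R_n \mathbf{1}\{T \ge T_n\}\Big] \\
&= E\Big[\sum_{n=1}^{\infty} E\big[R_n \mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n\, E\big[\mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n\, P\{T \ge T_n \mid \mathcal{F}_{T_n}\}\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n e^{-\alpha T_n}\Big]
\end{aligned}
$$
...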
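The key step in this derivation is $P\{T \ge T_n \mid \mathcal{F}_{T_n}\} = e^{-\alpha T_n}$, which holds because $T$ is independent of the path. The sketch below checks this step numerically on one fixed, made-up path. It averages $\sum_n R_n \mathbf{1}\{T \ge T_n\}$ over many independent draws of $T$ and compares the result with $\sum_n R_n e^{-\alpha T_n}$. The helper name `check_lump_term` is hypothetical, and the jump times and rewards are arbitrary illustrative values.

```python
import math
import random
from typing import Sequence


def check_lump_term(alpha: float, jump_times: Sequence[float],
                    lump_rewards: Sequence[float],
                    n_samples: int = 100_000, seed: int = 0):
    """For one fixed path, return (Monte Carlo average over T of
    sum_n R_n 1{T >= T_n}, exact value sum_n R_n e^{-alpha T_n}),
    where T is exponential with rate alpha and independent of the path."""
    rng = random.Random(seed)
    exact = sum(rn * math.exp(-alpha * tn) for tn, rn in zip(jump_times, lump_rewards))
    total = 0.0
    for _ in range(n_samples):
        stop = rng.expovariate(alpha)
        # Only the lump rewards collected by time T count toward J2.
        total += sum(rn for tn, rn in zip(jump_times, lump_rewards) if stop >= tn)
    return total / n_samples, exact


mc, exact = check_lump_term(0.5, [0.5, 1.5, 3.0], [2.0, -1.0, 4.0])
```

In this check, the path is held fixed and only $T$ is averaged over. Because $T$ is independent of the path, this mirrors the conditioning in the derivation, so the two numbers agree up to Monte Carlo error. Negative rewards are allowed, provided the expectations involved are well defined.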
 Fall '08
 Feinberg,E
