euclid.jap.1261670699

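… natural ways to define the total discounted rewards. One way is to interpret discounting as the coefficient in front of the reward rate. In this case, the total discounted rewards are defined as

$$
J_1 = \int_0^\infty e^{-\alpha t} r_t\,dt + \sum_{n=1}^{\infty} e^{-\alpha T_n} R_n,
$$

where $\alpha > 0$ is the discount rate. Another way is to define the total discounted rewards as the total rewards until a stopping time $T$ that has an exponential distribution with rate $\alpha$. Let $T$ be independent of $\mathcal{F}_\infty$ and let $\mathbb{P}\{T > t\} = e^{-\alpha t}$. Then the total discounted reward can be defined as

$$
J_2 = \int_0^T r_t\,dt + \sum_{n=1}^{N(T)} R_n,
$$

where $N(t) = \sup\{n : T_n \le t\}$, $t \ge 0$. It is well known that

$$
\mathbb{E}[J_1] = \mathbb{E}[J_2], \qquad (2.1)
$$

if at least one side of this equation is well defined (a random variable has a well-defined expectation if either the expectation of its positive part is finite or the expectation of its negative part is finite). Indeed,

$$
\begin{aligned}
\mathbb{E}\Big[\sum_{n=1}^{N(T)} R_n\Big]
&= \mathbb{E}\Big[\sum_{n=1}^{\infty} R_n \mathbf{1}\{T \ge T_n\}\Big] \\
&= \mathbb{E}\Big[\sum_{n=1}^{\infty} \mathbb{E}\big[R_n \mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= \mathbb{E}\Big[\sum_{n=1}^{\infty} R_n\, \mathbb{E}\big[\mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= \mathbb{E}\Big[\sum_{n=1}^{\infty} R_n\, \mathbb{P}\{T \ge T_n \mid \mathcal{F}_{T_n}\}\Big] \\
&= \mathbb{E}\Big[\sum_{n=1}^{\infty} R_n e^{-\alpha T_n} \ldots
\end{aligned}
$$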
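The excerpt breaks off before the reward-rate term of (2.1) is treated. A parallel argument for that term, sketched here under the added assumptions that $r_t$ is determined by $\mathcal{F}_\infty$ and that Fubini's theorem applies (for example, $r_t$ bounded or nonnegative), uses only the independence of $T$ and $\mathcal{F}_\infty$; it is not necessarily the authors' own proof:

$$
\begin{aligned}
\mathbb{E}\Big[\int_0^T r_t\,dt\Big]
&= \mathbb{E}\Big[\int_0^\infty \mathbf{1}\{T > t\}\, r_t\,dt\Big] \\
&= \mathbb{E}\Big[\int_0^\infty \mathbb{E}\big[\mathbf{1}\{T > t\} \mid \mathcal{F}_\infty\big]\, r_t\,dt\Big] \\
&= \mathbb{E}\Big[\int_0^\infty e^{-\alpha t}\, r_t\,dt\Big].
\end{aligned}
$$

For a constant rate $r_t \equiv r$, both sides reduce to $r\,\mathbb{E}[T] = r/\alpha$.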
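As a numerical sanity check of (2.1), here is a Monte Carlo sketch for a hypothetical toy model that is not taken from the paper: the $T_n$ are the arrival times of a Poisson process with rate `lam`, the $R_n$ are i.i.d. Exponential(1), and the reward rate is $r_t = N(t)$. For this model both expectations equal $(\mathbb{E}[R_1] + 1/\alpha)\,\lambda/\alpha$ (using $\mathbb{E}\sum_n e^{-\alpha T_n} = \lambda/\alpha$), so the two estimates can be compared with a closed form. The function names `simulate_path` and `estimate` and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_path(lam, alpha, horizon, rng):
    """Return (J1, J2) for one sample path of a hypothetical toy model.

    Assumptions (illustrative, not from the paper): T_1 < T_2 < ... are the
    arrival times of a Poisson process with rate `lam`, the lump rewards R_n
    are i.i.d. Exponential(1), the reward rate is r_t = N(t), and the
    discounting time T ~ Exponential(alpha) is drawn independently of the path.
    """
    T = rng.expovariate(alpha)
    end = max(horizon, T)  # simulate far enough for both J1 and J2
    arrivals, rewards = [], []
    t = rng.expovariate(lam)
    while t <= end:
        arrivals.append(t)
        rewards.append(rng.expovariate(1.0))
        t += rng.expovariate(lam)

    # J1 = int_0^inf e^{-alpha t} N(t) dt + sum_n e^{-alpha T_n} R_n.
    # Each arrival raises N(t) by 1 from T_n on, contributing e^{-alpha T_n}/alpha
    # to the integral; arrivals after `end` are ignored (a tiny truncation error).
    j1 = sum(math.exp(-alpha * tn) * (rn + 1.0 / alpha)
             for tn, rn in zip(arrivals, rewards))

    # J2 = int_0^T N(t) dt + sum_{n <= N(T)} R_n, where the integral equals
    # the sum over arrivals T_n <= T of (T - T_n).
    j2 = sum((T - tn) + rn for tn, rn in zip(arrivals, rewards) if tn <= T)
    return j1, j2


def estimate(lam=1.0, alpha=1.0, horizon=25.0, n_paths=20_000, seed=1):
    """Monte Carlo estimates of (E[J1], E[J2]) from n_paths independent paths."""
    rng = random.Random(seed)
    total1 = total2 = 0.0
    for _ in range(n_paths):
        j1, j2 = simulate_path(lam, alpha, horizon, rng)
        total1 += j1
        total2 += j2
    return total1 / n_paths, total2 / n_paths


if __name__ == "__main__":
    lam, alpha = 1.0, 1.0
    e1, e2 = estimate(lam, alpha)
    # Closed form for this toy model, using E[R_n] = 1.
    exact = (1.0 + 1.0 / alpha) * lam / alpha
    print(f"E[J1] ~ {e1:.3f}   E[J2] ~ {e2:.3f}   exact {exact:.3f}")
```

Both estimates reuse the same simulated arrivals on each path; only $J_2$ looks at $T$. Truncating $J_1$ at `horizon` drops a tail of order $e^{-\alpha \cdot \text{horizon}}$, which is negligible for the values chosen here.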