euclid.jap.1261670699

There are two natural ways to define the total discounted rewards. One way is to interpret discounting as a coefficient multiplying the reward rate. In this case, the total discounted rewards are defined as

$$
J_1 = \int_0^\infty e^{-\alpha t} r_t \, dt + \sum_{n=1}^{\infty} e^{-\alpha T_n} R_n,
$$

where $\alpha > 0$ is the discount rate.

Another way is to define the total discounted rewards as the total rewards accumulated until a stopping time $T$ that has an exponential distribution with rate $\alpha$. Let $T$ be independent of $\mathcal{F}_\infty$ with $P\{T > t\} = e^{-\alpha t}$. Then the total discounted rewards can be defined as

$$
J_2 = \int_0^T r_t \, dt + \sum_{n=1}^{N(T)} R_n,
$$

where $N(t) = \sup\{n : T_n \le t\}$, $t \ge 0$.

It is well known that

$$
E[J_1] = E[J_2] \tag{2.1}
$$

if at least one side of this equation is well defined (a random variable has a well-defined expectation if the expectation of either its positive part or its negative part is finite). Indeed,

$$
\begin{aligned}
E\Big[\sum_{n=1}^{N(T)} R_n\Big] &= E\Big[\sum_{n=1}^{\infty} R_n \mathbf{1}\{T \ge T_n\}\Big] \\
&= E\Big[\sum_{n=1}^{\infty} E\big[R_n \mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n \, E\big[\mathbf{1}\{T \ge T_n\} \mid \mathcal{F}_{T_n}\big]\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n \, P\{T \ge T_n \mid \mathcal{F}_{T_n}\}\Big] \\
&= E\Big[\sum_{n=1}^{\infty} R_n e^{-\alpha T_n} \ldots
\end{aligned}
$$
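Identity (2.1) can be checked numerically. The following is a minimal Monte Carlo sketch under an assumed toy model that does not come from the source: the $T_n$ are the jump times of a Poisson process with rate $\lambda$, the $R_n$ are i.i.d. exponential rewards with mean $\mu_R$ drawn independently of the jump times, and the reward rate is $r_t = c\,N(t)$. For this model both sides of (2.1) equal $c\lambda/\alpha^2 + \mu_R\lambda/\alpha$, because $\sum_{n \ge 1} E[e^{-\alpha T_n}] = \lambda/\alpha$, $E[N(T)] = \lambda/\alpha$ and $E\big[\int_0^T N(t)\,dt\big] = \lambda/\alpha^2$. The function name `discounted_reward_estimates` and all parameter values are hypothetical.

```python
import numpy as np


def discounted_reward_estimates(alpha, lam, c, mean_R,
                                n_paths=20_000, max_jumps=120, seed=0):
    """Monte Carlo estimates of E[J1] and E[J2] under an ASSUMED toy model:
    T_n = jump times of a Poisson process with rate `lam`,
    R_n = i.i.d. exponential lump rewards with mean `mean_R`,
    r_t = c * N(t), a reward rate adapted to the jump process.
    """
    rng = np.random.default_rng(seed)

    # Jump times T_1 < T_2 < ... < T_max_jumps, one row per simulated path.
    gaps = rng.exponential(1.0 / lam, size=(n_paths, max_jumps))
    times = np.cumsum(gaps, axis=1)
    # Lump rewards R_n, drawn independently of the jump times.
    rewards = rng.exponential(mean_R, size=(n_paths, max_jumps))

    # J1: discount every reward. Since N(t) = sum_n 1{T_n <= t},
    # int_0^inf e^{-alpha t} c N(t) dt = sum_n c e^{-alpha T_n} / alpha.
    # Jumps beyond max_jumps are dropped; their weight e^{-alpha T_n} is tiny here.
    discount = np.exp(-alpha * times)
    j1 = np.sum(discount * (c / alpha + rewards), axis=1)

    # J2: undiscounted rewards up to an independent horizon T ~ Exp(alpha).
    T = rng.exponential(1.0 / alpha, size=(n_paths, 1))
    before_T = times <= T  # 1{T_n <= T}, i.e. n <= N(T)
    # int_0^T c N(t) dt = sum over T_n <= T of c (T - T_n).
    j2 = np.sum(np.where(before_T, c * (T - times) + rewards, 0.0), axis=1)

    return j1.mean(), j2.mean()


alpha, lam, c, mean_R = 1.0, 2.0, 1.0, 3.0
est1, est2 = discounted_reward_estimates(alpha, lam, c, mean_R)
# Closed form for this toy model: c*lam/alpha**2 + mean_R*lam/alpha = 8.0
exact = c * lam / alpha**2 + mean_R * lam / alpha
print(f"E[J1] ~ {est1:.3f}   E[J2] ~ {est2:.3f}   exact = {exact:.3f}")
```

Both estimates should agree with the closed form up to Monte Carlo error. The path integrals are evaluated exactly (a sum of exponentials for $J_1$, a sum of $T - T_n$ terms for $J_2$), so no time discretization is involved; the only truncation is at `max_jumps` jumps per path, which is harmless for these parameter values.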