…have to be connected to unit k. This makes the algorithm non-local.

ECE 517: Reinforcement Learning in AI

RTRL - Derivation of $\partial y_k(t)/\partial w_{ij}$

From the previous equations we have

$$\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U \cup I} w_{kl} \frac{\partial z_l(t)}{\partial w_{ij}} + \delta_{ik}\, z_j(t) \right]$$

where $\delta_{ik}$ is the Kronecker delta:

$$\delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}$$

Since input signals do not depend on the weights in the network, $\partial z_l(t)/\partial w_{ij} = 0$ for $l \in I$ and $\partial z_l(t)/\partial w_{ij} = \partial y_l(t)/\partial w_{ij}$ for $l \in U$. The top equation therefore becomes

$$\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl} \frac{\partial y_l(t)}{\partial w_{ij}} + \delta_{ik}\, z_j(t) \right]$$

RTRL - Derivation of $\partial y_k(t)/\partial w_{ij}$ (cont.)

Because we assume that the starting state (at $t = t_0$, here $t_0 = 0$) is independent of the weights, we have

$$\frac{\partial y_k(t_0)}{\partial w_{ij}} = 0$$

We therefore need to define the sensitivity values

$$p_{ij}^k(t) = \frac{\partial y_k(t)}{\partial w_{ij}}$$

for every time step and all appropriate $i$, $j$ and $k$. We start with the initial condition

$$p_{ij}^k(t_0) = 0$$

RTRL - Final formalism

… and compute at each time step

$$p_{ij}^k(t+1) = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl}\, p_{ij}^l(t) + \delta_{ik}\, z_j(t) \right]$$

The algorithm then consists of computing the sensitivity values and then using them to compute the weight changes

$$\Delta w_{ij}(t) = \mu \sum_{k \in U} e_k(t)\, p_{ij}^k(t)$$

and the overall correction to be applied to $w_{ij}$ is given by

$$\Delta w_{ij} = \sum_{t = t_0 + 1}^{t_1} \Delta w_{ij}(t)$$

Summary

Recurrent neural networks have the potential to represent spatio-temporal information. They are, however, much more complicated to work with (train) in terms of computation, storage and time.

RTRL needs no stored history, but:
- it requires O(N^4) computations per time step
- it requires O(N^3) storage
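The final formalism can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the course's reference implementation: every unit uses $f = \tanh$, the initial state is $y(t_0) = 0$, $z(t)$ is ordered as the external inputs followed by the unit outputs, and $e_k(t) = d_k(t) - y_k(t)$ only for units that have a target (zero otherwise). The names `rtrl`, `xs`, `targets` and `mask` are hypothetical. The weight changes are accumulated over the whole sequence without being applied, so with $\mu = 1$ the result should equal minus the gradient of $J = \tfrac{1}{2}\sum_t \sum_k e_k(t)^2$, which can be checked by finite differences.

```python
import numpy as np


def rtrl(W, xs, targets, mask, mu=0.1):
    """Accumulate RTRL weight changes over one sequence (illustrative sketch).

    W       : (N, n_in + N) weights; column j < n_in is input j, column n_in + l is unit l
    xs      : (T, n_in) external inputs x(t) for t = t0, ..., t1 - 1
    targets : (T, N) desired outputs; row t holds d(t + 1)
    mask    : (N,) bool, True for units that have a target (others get e_k = 0)
    Returns (total weight change, summed squared-error loss J).
    """
    N = W.shape[0]
    n_in = W.shape[1] - N
    W_rec = W[:, n_in:]                  # w_kl for l in U (recurrent columns)
    y = np.zeros(N)                      # assumed y(t0) = 0, independent of W
    p = np.zeros((N, N, n_in + N))       # p[k, i, j] = dy_k / dw_ij, p(t0) = 0
    dW = np.zeros_like(W)
    loss = 0.0
    for t in range(xs.shape[0]):
        z = np.concatenate([xs[t], y])   # z(t): inputs followed by unit outputs
        net = W @ z                      # net_k(t)
        fprime = 1.0 - np.tanh(net) ** 2  # f'_k(net_k(t)) for f = tanh
        # sum over l in U only: input sensitivities are zero
        p_new = np.einsum("kl,lij->kij", W_rec, p)
        p_new[np.arange(N), np.arange(N), :] += z  # delta_ik * z_j(t) term
        p = fprime[:, None, None] * p_new          # p(t + 1)
        y = np.tanh(net)                           # y(t + 1)
        e = np.where(mask, targets[t] - y, 0.0)    # e_k(t + 1)
        loss += 0.5 * np.sum(e ** 2)
        dW += mu * np.einsum("k,kij->ij", e, p)    # delta w_ij(t + 1)
    return dW, loss
```

The array `p` holds N · N · (n_in + N) sensitivities, which is where the O(N^3) storage comes from, and the `einsum` that sums over l for every (k, i, j) is the O(N^4) cost paid at each time step.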
