# Reinforcement Learning in AI: RTRL and the derivation of $\partial y_k(t) / \partial w_{ij}$


ECE 517: Reinforcement Learning in AI

… have to be connected to unit $k$. This makes the algorithm non-local.

## RTRL - Derivation of $\partial y_k(t) / \partial w_{ij}$

From the previous equations we have

$$
\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U \cup I} w_{kl} \frac{\partial z_l(t)}{\partial w_{ij}} + \delta_{ik}\, z_j(t) \right]
$$

where $\delta_{ik}$ is the Kronecker delta

$$
\delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}
$$

Since input signals do not depend on the weights in the network, the terms with $l \in I$ vanish. For $l \in U$ we have $z_l(t) = y_l(t)$, so the top equation becomes

$$
\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl} \frac{\partial y_l(t)}{\partial w_{ij}} + \delta_{ik}\, z_j(t) \right]
$$

## RTRL - Derivation of $\partial y_k(t) / \partial w_{ij}$ (cont.)

Because we assume that our starting state ($t = 0$) is independent of the weights, we have

$$
\frac{\partial y_k(t_0)}{\partial w_{ij}} = 0
$$

We therefore need to define the sensitivity values

$$
p_{ij}^{k}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}
$$

for every time step and all appropriate $i$, $j$ and $k$. We start with the initial condition

$$
p_{ij}^{k}(0) = 0
$$

## RTRL - Final formalism

… and compute at each time step

$$
p_{ij}^{k}(t+1) = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl}\, p_{ij}^{l}(t) + \delta_{ik}\, z_j(t) \right]
$$

The algorithm then consists of computing the sensitivity values and then using them to compute the weight changes

$$
\Delta w_{ij}(t) = \mu \sum_{k \in U} e_k(t)\, p_{ij}^{k}(t)
$$

and the overall correction to be applied to $w_{ij}$ is given by

$$
\Delta w_{ij} = \sum_{t = t_0 + 1}^{t_1} \Delta w_{ij}(t)
$$

## Summary

- Recurrent neural networks have the potential to represent spatio-temporal information.
- They are much more complicated to work with (train) …
  - Computation
  - Storage
  - Time
- RTRL: no need for storing history, but …
  - Requires $O(N^4)$ computations
  - Requires $O(N^3)$ storage
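The sensitivity recursion and weight-change rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the activation is taken to be `tanh`, the vector $z(t)$ is laid out as the unit outputs followed by the external inputs, and the error $e_k(t)$ is assumed to be supplied by the caller. The names `rtrl_step`, `weight_changes` and `run` are hypothetical.

```python
import math


def rtrl_step(W, y, x, p):
    """Advance the network and the sensitivities by one time step.

    W[k][l]    : weight into unit k from z_l (n rows, n + m columns)
    y          : outputs of the n units at time t
    x          : the m external inputs at time t
    p[k][i][j] : sensitivity dy_k(t)/dw_ij
    Returns (y(t+1), p(t+1)).
    """
    n = len(y)
    z = list(y) + list(x)  # assumed layout: units first, then inputs
    net = [sum(W[k][l] * z[l] for l in range(len(z))) for k in range(n)]
    y_next = [math.tanh(v) for v in net]         # assumed activation f = tanh
    f_prime = [1.0 - v * v for v in y_next]      # tanh'(net) = 1 - tanh(net)^2

    p_next = [[[0.0] * len(z) for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(len(z)):
                # Sum over units only: inputs do not depend on the weights.
                recurrent = sum(W[k][l] * p[l][i][j] for l in range(n))
                kronecker = z[j] if i == k else 0.0
                p_next[k][i][j] = f_prime[k] * (recurrent + kronecker)
    return y_next, p_next


def weight_changes(p, errors, mu):
    """Delta w_ij(t) = mu * sum_k e_k(t) * p[k][i][j]."""
    n = len(p)
    cols = len(p[0][0])
    return [[mu * sum(errors[k] * p[k][i][j] for k in range(n))
             for j in range(cols)] for i in range(n)]


def run(W, xs):
    """Run from y(0) = 0 and p(0) = 0 over the input sequence xs."""
    n = len(W)
    cols = len(W[0])
    y = [0.0] * n
    p = [[[0.0] * cols for _ in range(n)] for _ in range(n)]
    for x in xs:
        y, p = rtrl_step(W, y, x, p)
    return y, p
```

In this sketch the sensitivity array holds one entry per $(k, i, j)$ triple, and each entry needs a sum over the units at every step. That is where the cubic storage and quartic computation figures in the summary come from when the number of inputs is comparable to the number of units.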