[...] have to be connected to unit k.
This makes the algorithm non-local.
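ECE 517: Reinforcement Learning in AI

RTRL - Derivation of $\partial y_k(t) / \partial w_{ij}$

From the previous equations we have

$$
\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U \cup I} w_{kl} \frac{\partial z_l(t)}{\partial w_{ij}} + \delta_{ik} z_j(t) \right]
$$

where $\delta_{ik}$ is the Kronecker delta

$$
\delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}
$$

Since input signals do not depend on the weights in the network, the terms with $l \in I$ vanish, and $z_l(t) = y_l(t)$ for $l \in U$, so the top equation becomes

$$
\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl} \frac{\partial y_l(t)}{\partial w_{ij}} + \delta_{ik} z_j(t) \right]
$$

RTRL - Derivation of $\partial y_k(t) / \partial w_{ij}$ (cont.)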
Because we assume that our starting state (t = 0) is independent of the weights, we have

$$
\frac{\partial y_k(t_0)}{\partial w_{ij}} = 0
$$

We therefore need to define the sensitivity values

$$
p^k_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}
$$

for every time step and all appropriate i, j and k. We start with the initial condition

$$
p^k_{ij}(0) = 0
$$
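A minimal sketch of this bookkeeping in Python with NumPy, assuming a network with `n_units` units and `n_inputs` external inputs. The function name `init_sensitivities`, both parameter names, and the axis ordering are illustrative assumptions, not taken from the slides. It only shows that one sensitivity is kept per (k, i, j) triple and that all of them start at zero.

```python
import numpy as np

def init_sensitivities(n_units: int, n_inputs: int) -> np.ndarray:
    """Return the array p[k, i, j] = dy_k / dw_ij at t = 0 (all zeros).

    Axis 0 indexes the unit k whose output is differentiated.
    Axis 1 indexes the receiving unit i of weight w_ij.
    Axis 2 indexes the source j of weight w_ij, assumed here to list
    the units first and the external inputs second.
    """
    return np.zeros((n_units, n_units, n_units + n_inputs))

p = init_sensitivities(n_units=3, n_inputs=2)
print(p.shape)  # (3, 3, 5)
```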
RTRL – final formalism
… and compute at each time step

$$
p^k_{ij}(t+1) = f_k'(\mathrm{net}_k(t)) \left[ \sum_{l \in U} w_{kl}\, p^l_{ij}(t) + \delta_{ik} z_j(t) \right]
$$

(the sum runs over $l \in U$ only, because the input signals have zero sensitivity to the weights).

The algorithm then consists of computing the sensitivity values and then using them to compute the weight changes

$$
\Delta w_{ij}(t) = \mu \sum_{k \in U} e_k(t)\, p^k_{ij}(t)
$$

and the overall correction to be applied to $w_{ij}$ is given by

$$
\Delta w_{ij} = \sum_{t = t_0 + 1}^{t_1} \Delta w_{ij}(t)
$$
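The recursion and the weight update can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the course's implementation. It assumes f_k = tanh for every unit, a weight matrix `W` of shape (n, n + m) whose columns list the n units first and the m inputs second, weights held fixed while the sensitivities are propagated, and an error vector `e` supplied by the caller. The function names `rtrl_step` and `weight_change` and the zero target in the usage loop are hypothetical.

```python
import numpy as np

def rtrl_step(W, y, x, p):
    """Advance outputs y and sensitivities p[k, i, j] by one time step."""
    n = y.size
    z = np.concatenate([y, x])          # z(t): unit outputs, then inputs
    net = W @ z                         # net_k(t) = sum_l w_kl z_l(t)
    y_next = np.tanh(net)               # y_k(t+1) = f_k(net_k(t)), f = tanh (assumed)
    fprime = 1.0 - y_next ** 2          # derivative of tanh at net_k(t)
    # Recurrent term: sum over l in U of w_kl * p^l_ij(t)
    inner = np.einsum("kl,lij->kij", W[:, :n], p)
    # Kronecker term: add z_j(t) only where i == k
    idx = np.arange(n)
    inner[idx, idx, :] += z
    p_next = fprime[:, None, None] * inner
    return y_next, p_next

def weight_change(p, e, mu):
    """delta_w_ij(t) = mu * sum_k e_k(t) * p^k_ij(t)."""
    return mu * np.einsum("k,kij->ij", e, p)

# Usage: accumulate the overall correction over a short input sequence.
rng = np.random.default_rng(0)
n, m = 3, 2
W = rng.normal(scale=0.5, size=(n, n + m))
y, p = np.zeros(n), np.zeros((n, n, n + m))
dW_total = np.zeros_like(W)
for x in rng.normal(size=(4, m)):
    y, p = rtrl_step(W, y, x, p)
    e = np.zeros(n) - y                 # hypothetical target of zero per unit
    dW_total += weight_change(p, e, mu=0.1)
print(dW_total.shape)  # (3, 5)
```

Writing the sums with `einsum` keeps the index names k, l, i, j aligned with the formulas, which makes it easier to check the code against the derivation.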
Summary
Recurrent neural networks have the potential to represent spatio-temporal information
They are much more complicated to work with (train) …
- Computation
- Storage
- Time
RTRL: no need for storing history, but…
- Requires $O(N^4)$ computations
- Requires $O(N^3)$ storage
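As a rough illustration of where these orders come from, the sketch below counts the stored sensitivities and the multiply-adds in one sensitivity update. It assumes a fully connected network and ignores constant factors and the forward pass. The function name `rtrl_costs` and its parameters are hypothetical.

```python
def rtrl_costs(n_units: int, n_inputs: int = 0) -> tuple:
    """Return (stored sensitivities, multiply-adds per sensitivity update)."""
    n_weights = n_units * (n_units + n_inputs)  # one weight w_ij per (i, j)
    storage = n_units * n_weights               # one p^k_ij per (k, i, j)
    mult_adds = storage * n_units               # each p^k_ij sums over l in U
    return storage, mult_adds

for n in (10, 20, 40):
    print(n, rtrl_costs(n))
# 10 (1000, 10000)
# 20 (8000, 160000)
# 40 (64000, 2560000)
```

Doubling the number of units multiplies the storage count by 8 and the multiply-add count by 16, which matches cubic and quartic growth.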
