RTRL (cont.)
Units compute their activations in the now familiar way, by first computing the weighted sum of their inputs:

$$\mathrm{net}_k(t) = \sum_{l \in U \cup I} w_{kl}\, z_l(t)$$

where the only new element in the formula is the introduction of the temporal index $t$.
Units then apply a non-linear function to their net input:

$$y_k(t+1) = f_k\big(\mathrm{net}_k(t)\big)$$
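A minimal sketch of one forward time step, computing $\mathrm{net}_k(t)$ and then $y_k(t+1)$. The layout is an assumption, not taken from the slides: $z(t)$ is stored as the external inputs $x(t)$ (indices in $I$) followed by the unit outputs $y(t)$ (indices in $U$), `W[k][l]` is the weight $w_{kl}$ from $z_l$ to unit $k$, and every $f_k$ is taken to be `tanh`. The function name `rtrl_step` is hypothetical.

```python
import math

def rtrl_step(W, x_t, y_t, f=math.tanh):
    """One time step of a fully recurrent network (illustrative sketch).

    Assumed layout: z(t) = x(t) followed by y(t), and W[k][l] = w_kl.
    The same activation f is used for every unit (an assumption).
    """
    z_t = list(x_t) + list(y_t)  # z_l(t) for l in I ∪ U
    # net_k(t) = sum over l of w_kl * z_l(t)
    net = [sum(w_kl * z_l for w_kl, z_l in zip(W_k, z_t)) for W_k in W]
    # y_k(t+1) = f_k(net_k(t))
    y_next = [f(n) for n in net]
    return net, y_next
```

For example, with one external input and two units, `rtrl_step([[0.5, 1.0, 0.0], [-1.0, 0.0, 2.0]], [1.0], [0.0, 0.0])` gives net inputs `[0.5, -1.0]` and new outputs `[tanh(0.5), tanh(-1.0)]`.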
Usually, both hidden and output units have non-linear activation functions.
Note that external input at time $t$ does not influence the output of any unit until time $t+1$. The network is thus a discrete dynamical system.
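Iterating the update over a sequence makes the one-step delay visible. This sketch assumes the same hypothetical layout as before ($z(t)$ = inputs then unit outputs, `tanh` for every unit); `run_network` is an illustrative name. `ys[t]` holds $y(t)$, so input $x(t)$ can first affect `ys[t+1]`, never `ys[t]`.

```python
import math

def run_network(W, xs, y0, f=math.tanh):
    """Iterate y(t+1) = f(net(t)) over an input sequence (sketch).

    Assumed layout: z(t) = x(t) followed by y(t); W[k][l] = w_kl.
    Returns [y(0), y(1), ..., y(T)].
    """
    ys = [list(y0)]
    for x_t in xs:
        z_t = list(x_t) + ys[-1]  # z(t) built from x(t) and y(t)
        net = [sum(w * z for w, z in zip(W_k, z_t)) for W_k in W]
        ys.append([f(n) for n in net])  # y(t+1) depends on x(t)
    return ys
```

With a single unit driven only by its input (`W = [[1.0, 0.0]]`), feeding `xs = [[2.0], [0.0]]` from `y0 = [0.0]` leaves `ys[0]` at the initial state, puts `tanh(2.0)` in `ys[1]`, and returns to `0.0` in `ys[2]` once the input is removed.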
ECE 517: Reinforcement Learning in AI
RTRL (cont.)
Some of the units in $U$ are output units, for which a target is defined.
A target may not be defined at every single time step.
For example, if we are presenting a string to the network to be classified as either grammatical or ungrammatical, we may provide a target only for the last symbol in the string.
In defining an error over the outputs, therefore, we need to make the error time-dependent too.
Let $T(t)$ be the set of indices $k \in U$ for which there exists a target value $d_k(t)$ at time $t$, so that the error is

$$e_k(t) = \begin{cases} d_k(t) - y_k(t) & \text{if } k \in T(t) \\ 0 & \text{otherwise} \end{cases}$$
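A small sketch of the time-dependent error. Representing the targets at time $t$ as a dictionary from unit index $k$ to $d_k(t)$ is an assumption made here for illustration: its keys play the role of $T(t)$, and `output_error` is a hypothetical name. Units without a target at that step contribute zero error.

```python
def output_error(y_t, targets_t):
    """e_k(t) = d_k(t) - y_k(t) if k in T(t), else 0 (sketch).

    y_t: list of unit outputs y_k(t), indexed by k.
    targets_t: dict {k: d_k(t)}; its keys stand in for T(t)
    (a hypothetical representation, not from the slides).
    """
    return [targets_t[k] - y_k if k in targets_t else 0.0
            for k, y_k in enumerate(y_t)]
```

For the grammaticality example, every step before the last symbol would use an empty dictionary, so `output_error([0.2, 0.5], {})` gives `[0.0, 0.0]`, while a target on unit 2 at the final step, `output_error([0.2, 0.5, 0.75], {2: 1.0})`, gives `[0.0, 0.0, 0.25]`.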

View
Full
Document