RTRL (cont.)
The error function for a single time step is defined as

    E(τ) = (1/2) ∑_{k∈U} [e_k(τ)]²
The error function we wish to minimize is the sum of this error over all past steps of the network:

    E_total(t0, t1) = ∑_{τ=t0+1}^{t1} E(τ)

Since the total error is the sum of the error over all previous steps and the error at this time step, the gradient of the total error is likewise the sum of the gradient for this time step and the gradient for previous steps.
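The two sums above can be checked with a small numeric sketch (toy values; `e[tau][k]` stands in for e_k(τ), and the unit set U is just the list indices):

```python
def step_error(e_tau):
    """E(tau) = 1/2 * sum over k in U of e_k(tau)^2."""
    return 0.5 * sum(ek ** 2 for ek in e_tau)

def total_error(e, t0, t1):
    """E_total(t0, t1) = sum of E(tau) for tau = t0+1 .. t1."""
    return sum(step_error(e[tau]) for tau in range(t0 + 1, t1 + 1))

# Toy trajectory: two output units, time steps 0..3 (step t0 = 0 excluded).
e = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
print(total_error(e, 0, 3))  # 1.0 + 2.0 + 4.5 = 7.5
```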
ECE 517: Reinforcement Learning in AI 18 RTRL
RTRL (cont.)
Hence, the gradient can be expressed recursively as

    ∇_W E_total(t0, t+1) = ∇_W E_total(t0, t) + ∇_W E(t+1)
As a time series is presented to the network, we can accumulate the values of the gradient, or equivalently, of the weight changes. We thus keep track of the value

    ∆w_ij(t) = −µ ∂E(t)/∂w_ij

After the network has been presented with the entire series, we alter each weight by

    ∑_{t=t0+1}^{t1} ∆w_ij(t)
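A minimal sketch of this accumulate-then-update scheme for a single weight, with a hypothetical sequence of per-step gradient values standing in for ∂E(t)/∂w_ij (in RTRL these would come from the sensitivity computation):

```python
MU = 0.1  # learning rate mu

def train_on_series(w, grads):
    """Accumulate dw(t) = -mu * dE(t)/dw over the series, then update once."""
    dw = 0.0
    for g in grads:      # one gradient value per time step t = t0+1 .. t1
        dw += -MU * g
    return w + dw        # weight altered only after the entire series

# Toy example: per-step gradients for one weight over a 4-step series.
w_new = train_on_series(1.0, [0.5, -0.25, 0.0, 0.75])
print(w_new)  # 1.0 - 0.1 * (0.5 - 0.25 + 0.0 + 0.75) = 0.9
```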
RTRL (cont.)
We therefore need an algorithm that computes ∂E(t)/∂w_ij at each time step. By the chain rule,

    ∂E(t)/∂w_ij = ∑_{k∈U} [∂E(t)/∂y_k(t)] · [∂y_k(t)/∂w_ij] = ∑_{k∈U} e_k(t) ∂y_k(t)/∂w_ij
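With the sensitivities ∂y_k(t)/∂w_ij assumed already available, the chain-rule sum is just an inner product over the output units; a toy sketch (names `e_t`, `p_t` are my own):

```python
def grad_wij(e_t, p_t):
    """dE(t)/dw_ij = sum over k in U of e_k(t) * dy_k(t)/dw_ij."""
    return sum(ek * pk for ek, pk in zip(e_t, p_t))

# Toy values for two output units: errors e_k(t) and sensitivities p_k.
print(grad_wij([0.5, -1.0], [0.2, 0.1]))  # 0.5*0.2 + (-1.0)*0.1 = 0.0
```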
Since we know e_k(t) at all times (the difference between our targets and outputs), we only need to find a way to compute the second factor.
It is important to understand what the latter expresses …
It is essentially a measure of the sensitivity of the activation of unit k at time t to a small change in the value of w_ij.
It takes into account the effect of such a change in the weight over the entire network trajectory from t0 to t.
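In the standard RTRL formulation (Williams & Zipser), this trajectory-wide sensitivity p_k = ∂y_k(t)/∂w_ij is carried forward one step at a time, which is what makes the whole-trajectory effect tractable. A minimal sketch of that recursion for tanh units, with all names (`p`, `w`, `y_next`, `z_j`) my own:

```python
def step_sensitivities(p, w, y_next, z_j, i):
    """One RTRL sensitivity step for the weight w_ij:
    p_k(t+1) = f'(s_k(t+1)) * (sum_l w[k][l] * p_l(t) + delta_ki * z_j(t)),
    where f = tanh, so f'(s_k) = 1 - y_k(t+1)^2, and z_j(t) is the
    activation feeding through w_ij."""
    p_next = []
    for k in range(len(p)):
        s = sum(w[k][l] * p[l] for l in range(len(p)))
        if k == i:
            s += z_j                   # delta_ki picks out the target unit i
        fprime = 1.0 - y_next[k] ** 2  # tanh'(s_k) = 1 - tanh(s_k)^2
        p_next.append(fprime * s)
    return p_next

# At t0 the sensitivities are zero: a weight change has had no effect yet.
p0 = [0.0, 0.0]
p1 = step_sensitivities(p0, [[0.1, 0.2], [0.3, 0.4]], [0.5, 0.0], 1.0, 0)
print(p1)  # [(1 - 0.25) * 1.0, (1 - 0.0) * 0.0] = [0.75, 0.0]
```

Carrying `p` forward alongside the network state is exactly how a change in w_ij at any earlier step keeps influencing y_k(t).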
Note that w_ij does not h...
