A guide to recurrent neural networks and backpropagation
Mikael Bodén*
[email protected]
School of Information Science, Computer and Electrical Engineering
Halmstad University.
November 13, 2001
Abstract
This paper provides guidance to some of the concepts surrounding recurrent neural networks. Contrary to feedforward networks, recurrent networks are sensitive to, and can be adapted to exploit, past inputs. Backpropagation learning is described for feedforward networks, adapted to suit our (probabilistic) modeling needs, and extended to cover recurrent networks.
The aim of this brief paper is to set the scene for applying and understanding recurrent neural networks.
1 Introduction
It is well known that conventional feedforward neural networks can be used to approximate any spatially finite function given a (potentially very large) set of hidden nodes. That is, for functions which have a fixed input space there is always a way of encoding these functions as neural networks. For a two-layered network, the mapping consists of two steps,
y(t) = G(F(x(t))). (1)
We can use automatic learning techniques such as backpropagation to find the weights of the network (G and F) if sufficient samples from the function are available.
Recurrent neural networks are fundamentally different from feedforward architectures
in the sense that they not only operate on an input space but also on an internal
state
space – a trace of what already has been processed by the network. This is equivalent
to an Iterated Function System (IFS; see (Barnsley, 1993) for a general introduction to
IFSs; (Kolen, 1994) for a neural network perspective) or a Dynamical System (DS; see
e.g. (Devaney, 1989) for a general introduction to dynamical systems; (Tino et al., 1998;
Casey, 1996) for neural network perspectives). The state space enables the representation
(and learning) of temporally/sequentially extended dependencies over unspecified (and
potentially infinite) intervals according to
y(t) = G(s(t)), (2)
s(t) = F(s(t − 1), x(t)). (3)
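Equations (2) and (3) can be sketched as a single discrete-time update that is iterated over an input sequence. Again this is only an illustrative sketch: the tanh/sigmoid activations, the additive combination of state and input, the zero initial state, and all sizes are assumptions, not choices made by the paper.

```python
import numpy as np

def rnn_step(s_prev, x, W_s, W_x, W_out):
    """One discrete time step of equations (2)-(3):
    s(t) = F(s(t-1), x(t)) and y(t) = G(s(t))."""
    s = np.tanh(W_s @ s_prev + W_x @ x)     # F: next state from old state and input
    y = 1.0 / (1.0 + np.exp(-(W_out @ s)))  # G: readout from the state
    return s, y

# Hypothetical sizes: 4 state nodes, 3 inputs, 2 outputs.
rng = np.random.default_rng(1)
W_s = rng.normal(size=(4, 4))
W_x = rng.normal(size=(4, 3))
W_out = rng.normal(size=(2, 4))

s = np.zeros(4)  # initial state: nothing processed yet
for x in rng.normal(size=(5, 3)):  # a short input sequence
    s, y = rnn_step(s, x, W_s, W_x, W_out)
```

Because the state s is fed back into F at every step, the output at time t can depend on inputs arbitrarily far in the past, which is exactly the iterated-function-system view described above.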
* This document was mainly written while the author was at the Department of Computer Science, University of Skövde.
To limit the scope of this paper and simplify mathematical matters we will assume that the network operates in discrete time steps (it is perfectly possible to use continuous time instead). It turns out that if we further assume that weights are at least rational and continuous output functions are used, networks are capable of representing any Turing Machine (again assuming that any number of hidden nodes are available). This is important since we then know that all that can be computed can be processed equally well with a discrete time recurrent neural network. It has even been suggested that if real weights are used (the neural network is completely analog) we get super-Turing Machine capabilities (Siegelmann, 1999).