A guide to recurrent neural networks and backpropagation

Mikael Bodén*
mikael.boden@ide.hh.se
School of Information Science, Computer and Electrical Engineering
Halmstad University

November 13, 2001

Abstract

This paper provides guidance to some of the concepts surrounding recurrent neural networks. In contrast to feedforward networks, recurrent networks can be sensitive, and be adapted, to past inputs. Backpropagation learning is described for feedforward networks, adapted to suit our (probabilistic) modeling needs, and extended to cover recurrent networks. The aim of this brief paper is to set the scene for applying and understanding recurrent neural networks.

1 Introduction

It is well known that conventional feedforward neural networks can be used to approximate any spatially finite function given a (potentially very large) set of hidden nodes. That is, for functions which have a fixed input space there is always a way of encoding these functions as neural networks. For a two-layered network, the mapping consists of two steps,

$$ y(t) = G\big(F(x(t))\big). \tag{1} $$

We can use automatic learning techniques such as backpropagation to find the weights of the network ($G$ and $F$) if sufficient samples from the function are available.

Recurrent neural networks are fundamentally different from feedforward architectures in the sense that they operate not only on an input space but also on an internal state space, a trace of what has already been processed by the network. This is equivalent to an Iterated Function System (IFS; see (Barnsley, 1993) for a general introduction to IFSs; (Kolen, 1994) for a neural network perspective) or a Dynamical System (DS; see e.g. (Devaney, 1989) for a general introduction to dynamical systems; (Tino et al., 1998; Casey, 1996) for neural network perspectives).
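To make Eq. (1) concrete, here is a minimal sketch (not code from the paper) of a two-layered network in plain Python, where $F$ maps the input to the hidden layer and $G$ maps the hidden layer to the output. It assumes logistic (sigmoid) units in both layers and, for the single backpropagation step, a squared-error objective; the paper adapts backpropagation to a probabilistic setting, which is not reproduced here. All names (`layer`, `forward`, `backprop_step`, `W_h`, `W_o`, ...) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # W[j][i] is the weight from unit i into unit j


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid math.exp overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layer(W: Matrix, b: Vector, inputs: Vector) -> Vector:
    """Fully connected logistic layer: out_j = sigmoid(sum_i W[j][i] * in_i + b[j])."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + bj)
            for row, bj in zip(W, b)]


def forward(x: Vector, W_h: Matrix, b_h: Vector,
            W_o: Matrix, b_o: Vector) -> Tuple[Vector, Vector]:
    """Eq. (1): y = G(F(x)). Returns hidden activations h = F(x) and output y = G(h)."""
    h = layer(W_h, b_h, x)  # F: input -> hidden
    y = layer(W_o, b_o, h)  # G: hidden -> output
    return h, y


def sq_error(y: Vector, d: Vector) -> float:
    """Assumed objective: E = 0.5 * sum_k (y_k - d_k)^2."""
    return 0.5 * sum((yk - dk) ** 2 for yk, dk in zip(y, d))


def backprop_step(x: Vector, d: Vector, W_h: Matrix, b_h: Vector,
                  W_o: Matrix, b_o: Vector, lr: float = 0.1) -> None:
    """One gradient-descent step on sq_error for a single pattern; updates weights in place."""
    h, y = forward(x, W_h, b_h, W_o, b_o)
    # Output deltas: dE/dnet_k = (y_k - d_k) * y_k * (1 - y_k)
    delta_o = [(yk - dk) * yk * (1.0 - yk) for yk, dk in zip(y, d)]
    # Hidden deltas: output deltas propagated back through W_o (before W_o is changed)
    delta_h = [hj * (1.0 - hj) * sum(W_o[k][j] * delta_o[k] for k in range(len(delta_o)))
               for j, hj in enumerate(h)]
    for k, dk in enumerate(delta_o):
        for j, hj in enumerate(h):
            W_o[k][j] -= lr * dk * hj
        b_o[k] -= lr * dk
    for j, dj in enumerate(delta_h):
        for i, xi in enumerate(x):
            W_h[j][i] -= lr * dj * xi
        b_h[j] -= lr * dj


if __name__ == "__main__":
    W_h, b_h = [[0.5, -0.3], [0.2, 0.8]], [0.0, 0.1]
    W_o, b_o = [[0.7, -0.4]], [0.05]
    x, d = [1.0, 0.0], [1.0]
    print("error before:", sq_error(forward(x, W_h, b_h, W_o, b_o)[1], d))
    backprop_step(x, d, W_h, b_h, W_o, b_o)
    print("error after: ", sq_error(forward(x, W_h, b_h, W_o, b_o)[1], d))
```

With a small learning rate, one step should lower the error on the training pattern; repeating such steps over many samples is what "finding the weights of $G$ and $F$" amounts to in this simplified setting.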
The state space enables the representation (and learning) of temporally/sequentially extended dependencies over unspecified (and potentially infinite) intervals according to

$$ y(t) = G\big(s(t)\big) \tag{2} $$
$$ s(t) = F\big(s(t-1),\, x(t)\big). \tag{3} $$

* This document was mainly written while the author was at the Department of Computer Science, University of Skövde.

To limit the scope of this paper and simplify mathematical matters, we will assume that the network operates in discrete time steps (it is perfectly possible to use continuous time instead). It turns out that if we further assume that the weights are at least rational and that continuous output functions are used, networks are capable of representing any Turing Machine (again assuming that any number of hidden nodes are available). This is important since we then know that anything that can be computed can be processed equally well with a discrete-time recurrent neural network. It has even been suggested that if real weights are used (the neural network is completely analog) we get super-Turing Machine capabilities (Siegelmann, 1999)...
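Eqs. (2) and (3) can be sketched in discrete time as below. This is an illustration under our own assumptions, not the paper's code: $F$ is taken to be a single logistic layer over the previous state and the current input (an Elman-style simple recurrent network), and $G$ a logistic output layer. The names `step`, `output`, `run`, `W_s`, `W_x`, `W_y` are hypothetical, and training (e.g. backpropagation through time) is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # W[j][i] is the weight from unit i into unit j


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid math.exp overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(W: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product W v."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def step(s_prev: Vector, x: Vector, W_s: Matrix, W_x: Matrix, b_s: Vector) -> Vector:
    """Eq. (3), assumed form: s(t) = sigmoid(W_s s(t-1) + W_x x(t) + b_s)."""
    rec = matvec(W_s, s_prev)  # contribution of the previous state
    inp = matvec(W_x, x)       # contribution of the current input
    return [sigmoid(r + i + b) for r, i, b in zip(rec, inp, b_s)]


def output(s: Vector, W_y: Matrix, b_y: Vector) -> Vector:
    """Eq. (2), assumed form: y(t) = sigmoid(W_y s(t) + b_y)."""
    return [sigmoid(z + b) for z, b in zip(matvec(W_y, s), b_y)]


def run(xs: List[Vector], s0: Vector, W_s: Matrix, W_x: Matrix, b_s: Vector,
        W_y: Matrix, b_y: Vector) -> List[Vector]:
    """Process a sequence one discrete time step at a time; returns y(1), ..., y(T)."""
    s, ys = s0, []
    for x in xs:
        s = step(s, x, W_s, W_x, b_s)
        ys.append(output(s, W_y, b_y))
    return ys


if __name__ == "__main__":
    W_s = [[0.9, 0.0], [0.0, 0.9]]
    W_x, b_s = [[2.0], [-1.0]], [0.0, 0.0]
    W_y, b_y = [[1.0, 1.0]], [0.0]
    s0 = [0.0, 0.0]
    # The two sequences differ only in their first input.
    print(run([[1.0], [0.0], [0.0]], s0, W_s, W_x, b_s, W_y, b_y)[-1])
    print(run([[0.0], [0.0], [0.0]], s0, W_s, W_x, b_s, W_y, b_y)[-1])
```

Because the state carries a trace of earlier inputs, the final outputs for the two sequences differ even though their last inputs are identical. Setting `W_s` to all zeros removes that trace: the state then depends only on the current input, and the network reduces to the feedforward mapping of Eq. (1) applied at each time step.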