This preview shows pages 1–3. Sign up to view the full content.
This preview has intentionally blurred sections. Sign up to view the full version.View Full Document
Unformatted text preview: Sec. 6.2] Supervised networks for classification 93 non-linear receptive fields in attribute space linear output weights Fig. 6.3: A Radial Basis Function Network. Online vs. Batch Note that both the error (6.6) and the gradient (6.14, 6.15) are a sum over examples. These could be estimated by randomly selecting a subset of examples for inclusion in the sum. In the extreme, a single example might be used for each gradient estimate. This is a Stochastic Gradient method. If a similar strategy is used without random selection, but with the data taken in the order it comes, the method is an Online one. If a sum over all training data is performed for each gradient calculation, then the method is a variety. Online and Stochastic Gradient methods offer a considerable speed advantage if the approximation is serviceable. For problems with large amounts of training data they are highly favoured. However, these approximations cannot be used directly in the conjugate gradient method, because it is built on procedures and theorems which assume that is a given function of which can be evaluated precisely so that meaningful comparisons can be made at nearby arguments. Therefore the stochastic gradient and Online methods tend to be used with simple step-size and momentum methods. There is some work on finding a compromise method (Møller, 1993). 6.2.3 Radial Basis Function networks The radial basis function network consists of a layer of units performing linear or non-linear functions of the attributes, followed by a layer of weighted connections to nodes whose outputs have the same form as the target vectors. It has a structure like an MLP with one hidden layer, except that each node of the the hidden layer computes an arbitrary function of the inputs (with Gaussians being the most popular), and the transfer function of each output node is the trivial identity function. Instead of “synaptic strengths” the hidden layer has parameters appropriate for whatever functions are being used; for example, Gaussian widths and positions. This network offers a number of advantages over the multi layer perceptron under certain conditions, although the two models are computationally equivalent. These advantages include a linear training rule once the locations in attribute space of the non-linear functions have been determined, and an underlying model involving localised functions in the attribute space, rather than the long-range functions occurring in perceptron-based models. The linear learning rule avoids problems associated with local minima; in particular it provides enhanced ability to make statments about the accuracy of 94 Neural networks [Ch. 6 the probabilistic interpretation of the outputs in Section 6.2.2....
View Full Document
This note was uploaded on 08/10/2011 for the course IT 331 taught by Professor Nevermind during the Spring '11 term at King Abdulaziz University.
- Spring '11