Brief introduction

In order to obtain a better fit to the training data, we often want to increase the complexity of our RBF network. By construction, the only way to change the complexity of an RBF network is to add or remove basis functions: a larger number of basis functions yields a more complex network. In theory, if we add enough basis functions, the RBF network can fit any training set exactly; however, this does not mean the model will generalize well. Therefore, to avoid overfitting (see Notes below), we only want to increase the number of basis functions up to a certain point, i.e. its optimal level.

For model selection, what we usually do is estimate the training error. Working through the training error, we will see that it can in fact be decomposed, and that one component of the decomposition is the Mean Squared Error (MSE). In the later notes, we will find that our final goal is to obtain a good estimate of the MSE; to find an optimal model for our data, we then select the model with the smallest MSE.

Now, let us introduce some notation that we will use in the analysis:

- \hat{f}: the prediction model estimated by an RBF network from the training data
- f: the real model (not null); ideally, we want \hat{f} to be close to f
- err: the training error
- Err: the testing error
- MSE: the Mean Squared Error

Notes

Being more complex isn't always a good thing: sometimes overfitting (http://en.wikipedia.org/wiki/Overfitting) causes the model to lose its generality. For example, in the graph on the left-hand side, the data points are sampled from the model y = f(x) + \epsilon, where f is a linear function (shown by the blue line) and \epsilon is additive Gaussian noise drawn from N(0, \sigma^2). The red curve displayed in the graph shows the over-fitted model.
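The effect described above can be sketched numerically: fitting an RBF network by least squares with an increasing number of Gaussian basis functions, the training error keeps shrinking as complexity grows. This is a minimal illustration, not the course's own code; the target function, basis width, and center placement are all assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: y = f(x) + noise (f chosen for illustration only).
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

def rbf_design(x, centers, width):
    """Gaussian basis functions evaluated at each x for each center."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

def fit_rbf(x, y, n_basis, width=0.1):
    """Least-squares fit of an RBF network with n_basis evenly spaced centers."""
    centers = np.linspace(0, 1, n_basis)
    Phi = rbf_design(x, centers, width)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ w

# Training error shrinks as the number of basis functions grows;
# with 40 basis functions (one per point) the fit nearly interpolates.
errors = {m: np.mean((y - fit_rbf(x, y, m)) ** 2) for m in (2, 5, 10, 40)}
```

The monotone drop in training error is exactly why training error alone cannot pick the optimal number of basis functions, motivating the MSE-based model selection discussed above.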
Clearly, this over-fitted model only fits the training data, and is useless for prediction when new data points are introduced.
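The Notes' example can be reproduced in a few lines: data drawn from a linear model plus Gaussian noise, fitted once with the correct degree and once with an over-complex model. The specific slope, intercept, and noise level are assumptions for the sketch (polynomials stand in for the red curve of the figure).

```python
import numpy as np

rng = np.random.default_rng(1)

# y = f(x) + epsilon, with f linear and epsilon ~ N(0, sigma^2).
# The slope, intercept, and sigma below are illustrative choices.
f = lambda x: 2 * x + 1
x_train = np.linspace(0, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.3, x_train.shape)
x_test = np.linspace(0.01, 0.99, 200)
y_test = f(x_test) + rng.normal(0, 0.3, x_test.shape)

def poly_mse(deg):
    """Fit a degree-`deg` polynomial to the training set and
    return (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, deg)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return tr, te

tr1, te1 = poly_mse(1)  # degree 1 matches the true linear model
tr9, te9 = poly_mse(9)  # degree 9 interpolates all 10 noisy points
```

The over-complex fit achieves a lower training error than the correct model, yet a higher test error, which is the loss of generality the Notes warn about.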