These so-called normal equations always have at least one solution and, when A is invertible, exactly one. Furthermore, the matrix A is positive-definite, which implies that in the invertible case the solution found is a global minimum. This algorithm is known as the least squares method. The computation time for setting up the matrix A grows as Θ(N·n²) and the time for solving the system of equations as O(n³). This method can be extended very simply to incorporate multiple output neurons because, as already mentioned, for two-layer feedforward networks the output neurons are independent of each other.

9.4.2 Application to the Appendicitis Data

As an application we now determine a linear score for the appendicitis diagnosis. From the LEXMED project data, which is familiar from Sect. 8.4.5, we use the least squares method to determine a linear mapping from the symptoms to the continuous class variable AppScore with values in the interval [0, 1] and obtain the linear combination

AppScore = 0.00085·Age − 0.125·Sex + 0.025·P1Q + 0.035·P2Q
         − 0.021·P3Q − 0.025·P4Q + 0.12·TensLoc + 0.031·TensGlo
         + 0.13·Losl + 0.081·Conv + 0.0034·RectS + 0.0027·TAxi
         + 0.0031·TRec + 0.000021·Leuko − 0.11·Diab − 1.83.

This function returns continuous values for AppScore, although the actual binary class variable App only takes on the values 0 and 1. Thus we have to decide on a threshold value, as with the perceptron. The classification error of the score as a function of the threshold is shown in Fig. 9.14 on page 266 for the training data and the test data. We clearly see that both curves are nearly the same and have their minimum at Θ = 0.5. From the small difference between the two curves we see that overfitting is not a problem for this method, because the model generalizes to the test data very well.
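The normal-equations procedure described above can be sketched in a few lines of NumPy. The data here is purely illustrative (random inputs and weights, not the LEXMED data); the point is the structure of the computation: building A = XᵀX in Θ(N·n²) time and solving the resulting n×n system in O(n³) time.

```python
import numpy as np

# Hypothetical training data: N examples with n input features X
# and continuous targets y (illustrative values, not the LEXMED data).
rng = np.random.default_rng(0)
N, n = 100, 5
X = rng.random((N, n))
true_w = np.array([0.5, -0.2, 0.1, 0.3, -0.4])
y = X @ true_w

# Set up the normal equations A w = b with A = X^T X (cost Theta(N * n^2))
# and b = X^T y, then solve the n x n system (cost O(n^3)).
# A is positive (semi-)definite, so when A is invertible the solution
# is the unique global minimum of the squared error.
A = X.T @ X
b = X.T @ y
w = np.linalg.solve(A, b)

# np.linalg.lstsq minimizes the same squared error and also copes
# with a singular A.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_ls))  # True
```

Since the targets here are generated without noise, the solver recovers the generating weights exactly; on real data such as the appendicitis symptoms, the result is the error-minimizing linear score instead.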

Also shown in the figure is the result for the nonlinear, three-layer RProp network (Sect. 9.5), with a somewhat lower error for threshold values between 0.2 and 0.9. For practical application of the derived score and the correct determination of the threshold Θ, it is important not only to look at the error, but also to differentiate by type of error (namely false positive and false negative), as is done in the LEXMED application in Fig. 7.10 on page 156. In the ROC curve shown there, the score calculated here is also plotted. We see that the simple linear model is clearly inferior to the LEXMED system. Evidently, linear approximations are not powerful enough for many complex applications.

9.4.3 The Delta Rule

Least squares is, like the perceptron and decision tree learning, a so-called batch learning algorithm, as opposed to incremental learning. In batch learning, all training data must be learned in one run. If new training data is added, it cannot simply be learned in addition to what is already there; the whole learning process must be repeated with the enlarged data set. This problem is solved by incremental learning algorithms, which can adapt the learned model to each additional new example. In the algorithms we will look at in the following discussion, we will
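The contrast between batch least squares and incremental learning can be sketched with the delta rule for a single linear neuron, which adjusts the weights a little after each example instead of solving one large system. The learning rate, data, and number of epochs below are illustrative assumptions, not values from the text.

```python
import numpy as np

# Incremental (online) delta-rule learning for one linear neuron:
# after each example (x, y), nudge the weights against the error,
#   w <- w + eta * (y - w.x) * x,
# so new examples can be incorporated without retraining from scratch.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
true_w = np.array([0.4, -0.3, 0.2])
y = X @ true_w  # noiseless illustrative targets

w = np.zeros(3)
eta = 0.1  # learning rate (a hypothetical choice)
for epoch in range(50):
    for x_i, y_i in zip(X, y):
        error = y_i - w @ x_i
        w += eta * error * x_i

print(np.allclose(w, true_w, atol=1e-3))  # True
```

With consistent, noiseless targets the updates drive the weights toward the same solution that batch least squares computes in one step; the trade-off is many cheap updates instead of one expensive solve.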
