11 Conclusions

D. Michie (1), D. J. Spiegelhalter (2) and C. C. Taylor (3)
(1) University of Strathclyde, (2) MRC Biostatistics Unit, Cambridge and (3) University of Leeds

11.1 INTRODUCTION

In this chapter we try to draw together the evidence of the comparative trials and subsequent analyses, comment on the experiences of the users of the algorithms, and suggest topics and areas which need further work. We begin with some comments on each of the methods. It should be noted here that our comments are often directed towards a specific implementation of a method rather than the method per se. In some instances the slowness or otherwise poor performance of an algorithm is due at least in part to the lack of sophistication of the program. In addition to the potential weakness of the programmer, there is the potential inexperience of the user. To give an example, the trials of [...] reported in previous chapters were based on a version programmed in LISP. A version is now available in the C language which cuts the CPU time by a factor of 10.

In terms of error rates, observed differences in goodness of result can arise from

1. different suitabilities of the basic methods for given datasets
2. different sophistications of default procedures for parameter settings
3. different sophistication of the program user in selection of options and tuning of parameters
4. occurrence and effectiveness of pre-processing of the data by the user

The stronger a program is in respect of item 2, the better buffered it is against shortcomings in item 3. Alternatively, if there are no options to select or parameters to tune, then item 3 is not important.

We give a general view of the ease-of-use and the suitable applications of the algorithms. Some of the properties are subject to different interpretations. For example, in general a decision tree is considered to be less easy to understand than decision rules. However, both are much easier to understand than a regression formula which contains only coefficients, and some algorithms do not give any easily summarised rule at all (for example, k-NN).

Address for correspondence: Department of Statistics, University of Leeds, Leeds LS2 9JT, U.K.
The remaining sections discuss more general issues that have been raised in the trials, such as time and memory requirements, the use of cost matrices and general warnings on the interpretation of our results.

11.1.1 User’s guide to programs

Here we tabulate some measures to summarise each algorithm. Some are subjective quantities based on the user’s perception of the programs used in StatLog, and may not hold for other implementations of the method. For example, many of the classical statistical algorithms can handle missing values, whereas those used in this project could not. These programs would therefore need a “front-end” to replace missing values before running the algorithm. Similarly, all of these programs should be able to incorporate costs into their classification
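
As an illustration of the missing-value "front-end" mentioned above, the following is a minimal sketch in Python. The text does not say which replacement strategy was used in StatLog; replacing each missing entry by the mean of the observed values in its column is an assumption made here purely for illustration, and the function name impute_column_means and the toy data are hypothetical.

```python
from statistics import mean


def impute_column_means(rows, missing=None):
    """Return a copy of `rows` with each missing entry replaced by its column mean.

    Assumption (not from the text): missing entries are marked with `missing`
    (None by default) and every column is numeric.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        # Use only the values that were actually observed in column j.
        observed = [row[j] for row in rows if row[j] is not missing]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_means.append(mean(observed))
    # Build new rows so the original data is left untouched.
    return [
        [col_means[j] if value is missing else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy dataset: two attributes, one missing entry in each column.
data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
completed = impute_column_means(data)
```

The completed data can then be passed to a classifier that cannot handle missing values itself. Mean replacement is only one possible choice; the appropriate strategy depends on the dataset and the algorithm.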