Machine Learning, Neural and Statistical Classification (Part 8)

Sec. 9.2] Credit data 133

Table 9.2: Results for the Credit management dataset (2 classes, 7 attributes, (train, test) = (15 000, 5 000) observations).

                Max.         Time (sec.)          Error Rate
Algorithm     Storage      Train       Test     Train    Test   Rank
Discrim            68       32.2        3.8     0.031   0.033     13
Quadisc            71       67.2       12.5     0.051   0.050     21
Logdisc           889      165.6       14.2     0.031   0.030      8
SMART             412    27930.0        5.4     0.021   0.020      1
ALLOC80           220    22069.7         *      0.033   0.031     10
k-NN              108   124187.0      968.0     0.028   0.088     22
CASTLE             48      370.1       81.4     0.051   0.047     19
CART               FD         FD         FD        FD      FD
IndCART          1656      423.1      415.7     0.010   0.025      6
NewID             104     3035.0        2.0     0.000   0.033     13
AC2              7250     5418.0     3607.0     0.000   0.030      8
Baytree          1368       53.1        3.3     0.002   0.028      7
NaiveBay          956       24.3        2.8     0.041   0.043     16
CN2              2100     2638.0        9.5     0.000   0.032     12
C4.5              620      171.0      158.0     0.014   0.022      3
ITrule            377     4470.0        1.9     0.041   0.046     18
Cal5              167      553.0        7.2     0.018   0.023      4
Kohonen           715         *          *      0.037   0.043     16
DIPOL92           218     2340.0       57.8     0.020   0.020      1
Backprop          148     5950.0        3.0     0.020   0.023      4
RBF               253      435.0       26.0     0.033   0.031     10
LVQ               476     2127.0       52.9     0.024   0.040     15
Default             *         *          *      0.051   0.047     19

of 5.8% on the supplied data but only 2.35% on the dataset with proper class proportions, whereas linear discriminants obtained an error rate of 5.4% on the supplied data and 2.35% on the modified proportions. (The supplier of the credit management dataset also quotes error rates of around 5-6% for neural nets and decision trees when trained on the 50-50 dataset.) Note that the effective bias here is in favour of the non-statistical algorithms, as statistical algorithms can cope, to a greater or lesser extent, with prior class proportions that differ from the training proportions. In this dataset the classes were chosen by an expert on the basis of the given attributes (see below), and it is hoped to replace the expert by an algorithmic rule in the future. All attribute values are numeric.
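The point about statistical algorithms coping with shifted class proportions can be made concrete: if a classifier outputs posterior probabilities under the training priors, those posteriors can be rescaled by the ratio of deployment to training priors and renormalised. The sketch below is illustrative only (the function name and the 90-10 deployment priors are invented for the example, not taken from the StatLog trials):

```python
import numpy as np

def reweight_posteriors(posteriors, train_priors, deploy_priors):
    """Rescale classifier posteriors when the deployment class
    proportions differ from the proportions seen in training.

    posteriors    : (n_samples, n_classes) p(c | x) under training priors
    train_priors  : (n_classes,) class proportions in the training set
    deploy_priors : (n_classes,) class proportions expected in deployment
    """
    w = np.asarray(deploy_priors) / np.asarray(train_priors)
    adjusted = np.asarray(posteriors) * w              # p(c|x) * pi_new / pi_old
    return adjusted / adjusted.sum(axis=1, keepdims=True)  # renormalise rows

# Hypothetical example: a model trained on a 50-50 sample, deployed
# where class 0 actually occurs 90% of the time.
post = np.array([[0.6, 0.4],
                 [0.3, 0.7]])
adj = reweight_posteriors(post, [0.5, 0.5], [0.9, 0.1])
```

A non-statistical learner that bakes the 50-50 proportions into its decision rule has no such correction available, which is why the evaluation setup here favours it.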
The dataset providers supplied performance figures for algorithms that had been applied to data drawn from the same source. Note that the figures given in Table 9.1 were achieved using the original dataset, with equal numbers of examples of both classes. The best results (in terms of error rate) were achieved by SMART, DIPOL92 and the tree algorithms C4.5 and Cal5. SMART is very time consuming to run; however, with credit-type datasets small improvements in accuracy can save vast amounts of money, so
this has to be considered when sacrificing accuracy for time. k-NN did badly because of irrelevant attributes; with a variable selection procedure it obtained an error rate of 3.1%. CASTLE, Kohonen, ITrule and Quadisc performed poorly (the result for Quadisc equalling the default rule). CASTLE uses only attribute 7 to generate the rule, concluding that this is the only attribute relevant to the classification. Kohonen works best for datasets with equal class distributions, which is not the case for the dataset as preprocessed here. At the cost of significantly increasing the CPU time, the performance might be improved by using a
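The k-NN result illustrates why irrelevant attributes hurt distance-based methods: noise dimensions dominate the Euclidean distance and swamp the informative ones. A variable selection procedure of the kind mentioned above can be sketched as greedy forward selection driven by leave-one-out error. This is a minimal illustration on synthetic data, not the actual procedure used in the StatLog trials; all names and the data-generating setup are invented for the example:

```python
import numpy as np

def knn_loo_error(X, y, k=3):
    """Leave-one-out error of a k-NN classifier (Euclidean distance)."""
    n = len(y)
    errors = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the point itself
        nn = np.argsort(d)[:k]                 # indices of the k nearest
        pred = np.bincount(y[nn]).argmax()     # majority vote
        errors += pred != y[i]
    return errors / n

def forward_select(X, y, k=3):
    """Greedily add the attribute that most reduces leave-one-out
    k-NN error; stop as soon as no remaining attribute helps."""
    remaining = list(range(X.shape[1]))
    chosen, best_err = [], 1.0
    while remaining:
        err, j = min((knn_loo_error(X[:, chosen + [j]], y, k), j)
                     for j in remaining)
        if err >= best_err:
            break
        best_err, chosen = err, chosen + [j]
        remaining.remove(j)
    return chosen, best_err

# Synthetic data: attribute 0 carries the class signal, 1 and 2 are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.3 * rng.standard_normal(200),   # informative
                     rng.standard_normal(200),             # noise
                     rng.standard_normal(200)])            # noise
chosen, err = forward_select(X, y)
```

On data like this the procedure keeps the informative attribute and discards the noise, mirroring the improvement from 8.8% to 3.1% test error reported for k-NN above.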