Sec. 10.5] Performance related to measures: theoretical 193

10.5.2 Absolute performance: quadratic discriminants

In theory, quadratic discrimination is the best procedure to use when the data are normally distributed, especially so if the covariances differ. Because it makes very specific distributional assumptions, and so is very efficient for normal distributions, it is inadvisable to use quadratic discrimination for non-normal distributions (a common situation with parametric procedures: they are not robust to departures from their assumptions). Because it uses many more parameters, it is also inadvisable to use quadratic discrimination when the sample sizes are small. We now relate these facts to our measures for the datasets.

The ideal dataset for quadratic discrimination would be a very large, normally distributed dataset with widely differing covariance matrices. In terms of the measures, ideally we want skewness = 0, kurtosis = 3, and an SD ratio much greater than unity. The most normal dataset in our study is the KL digits dataset: skewness = 0.18 (small), kurtosis = 2.92 (near 3), and, most importantly, SD ratio = 1.97 (much greater than unity). This dataset is nearest the ideal, so it is predictable that quadratic discrimination will achieve a low error rate. In fact, quadratic discriminants achieve an error rate of 2.5%, bettered only by k-NN with an error rate of 2.0% and by ALLOC80 with an error rate of 2.4%.

At the other extreme, the least normal dataset is probably the shuttle dataset, with skewness = 4.4 (very large), kurtosis = 160.3 (nowhere near 3), and, to make matters worse, an SD ratio of 1.12 (not much greater than unity). We can therefore predict that this is the least appropriate dataset for quadratic discrimination, and it is no surprise that quadratic discriminants achieve an error rate of 6.72%, the worst of all our results for the shuttle dataset.
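The measures and the quadratic rule itself are easy to reproduce on synthetic data. The sketch below (not the StatLog software; the data, the crude per-feature SD-ratio proxy, and all variable names are illustrative assumptions) computes skewness and Pearson kurtosis for a two-class normal sample with differing covariances, then classifies with a quadratic discriminant, i.e. Gaussian class-conditionals with separate means and covariance matrices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two synthetic normal classes with clearly different covariances --
# the ideal setting for quadratic discrimination described above.
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=500)
X1 = rng.multivariate_normal([2, 2], [[4.0, -1.0], [-1.0, 3.0]], size=500)
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

# Normality measures analogous to those in the text:
# skewness ~ 0 and (Pearson) kurtosis ~ 3 for normal data.
print("skewness:", stats.skew(X, axis=0))
print("kurtosis:", stats.kurtosis(X, axis=0, fisher=False))

# Crude stand-in for the SD ratio: geometric mean of per-feature
# standard-deviation ratios between the classes. Values well above
# unity indicate differing covariances, favouring quadratic rules.
sd_ratio = np.exp(np.mean(np.log(X1.std(axis=0) / X0.std(axis=0))))
print("SD ratio (proxy):", sd_ratio)

# Quadratic discriminant: separate Gaussian fitted per class;
# with equal priors, classify by the larger class log-density.
def fit_gaussian(Xc):
    return Xc.mean(axis=0), np.cov(Xc, rowvar=False)

mu0, cov0 = fit_gaussian(X0)
mu1, cov1 = fit_gaussian(X1)
logp0 = stats.multivariate_normal(mu0, cov0).logpdf(X)
logp1 = stats.multivariate_normal(mu1, cov1).logpdf(X)
pred = (logp1 > logp0).astype(int)
print("training error rate:", np.mean(pred != y))
```

With both classes genuinely normal and covariances differing, the quadratic rule exploits the curvature of the boundary that a linear discriminant would miss, mirroring the KL digits result above.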
The decision tree methods get error rates smaller than this by a factor of 100! The important proviso should always be borne in mind that there must be enough data to estimate all parameters accurately.

10.5.3 Relative performance: Logdisc vs. DIPOL92

Another fruitful way of looking at the behaviour of algorithms is by making paired comparisons between closely related algorithms. This extremely useful device is best illustrated by comparing logistic discrimination (Logdisc) and DIPOL92. From their construction, we can see that DIPOL92 and logistic discrimination have exactly the same formal decision procedure in one special case, namely two-class problems in which there is no clustering (i.e. both classes are "pure"). Where the two differ, then, will be in multi-class problems (such as the digits or letters datasets) or in two-class problems in which the classes are impure (such as the Belgian Power dataset). With this in mind, it is of interest to compare the performance of DIPOL92
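The special case just described, a two-class problem with "pure" single-cluster classes, is where logistic discrimination alone tells the whole story. DIPOL92 is not widely available, but the logistic half of the pair can be sketched minimally (an illustrative gradient-descent fit on assumed synthetic data, not the book's implementation): the log-odds are modelled as a linear function of the features, giving a single linear decision boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# A two-class problem with "pure" (single-cluster) classes -- the
# special case where, per the text, DIPOL92 and logistic
# discrimination share the same formal decision procedure.
X0 = rng.normal([0.0, 0.0], 1.0, size=(300, 2))
X1 = rng.normal([2.5, 2.5], 1.0, size=(300, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

# Logistic discrimination: linear model of the log-odds, fitted
# here by plain gradient descent on the log-loss.
Xb = np.hstack([np.ones((len(X), 1)), X])   # add intercept column
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted P(class 1)
    w -= 0.1 * Xb.T @ (p - y) / len(y)      # log-loss gradient step

pred = (Xb @ w > 0).astype(int)             # linear decision boundary
print("logistic discrimination error rate:", np.mean(pred != y))
```

On a multi-class or clustered problem, DIPOL92's piecewise-linear construction could depart from this single boundary, which is exactly where the paired comparison in the text becomes informative.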