Sec. 4.3] Density estimation 33

This criterion makes the smoothness data dependent, leads to an algorithm for an arbitrary dimensionality of the data, and possesses consistency properties as discussed by Aitchison & Aitken (1976).

An extension of the above model is to make the kernel width λ dependent on the k-th nearest neighbour distance to each sample point x_j, so that we have a λ_j for each sample point. This gives rise to the so-called variable kernel model. An extensive description of this model was first given by Breiman et al. (1977). This method gives promising results, especially when lognormal or skewed distributions are estimated. The kernel width λ_j is thus proportional to the k-th nearest neighbour distance of x_j, denoted by d_jk, i.e. λ_j = α d_jk. We take for d_jk the Euclidean distance measured after standardisation of all variables. The proportionality factor α is (inversely) dependent on k. The smoothing value is now determined by two parameters, α and k; α can be thought of as an overall smoothing parameter, while k defines the variation in smoothness of the estimated density over the different regions. If, for example, k = 1, the smoothness will vary locally, while for larger k values the smoothness tends to be constant over large regions, roughly approximating the fixed kernel model. We use a Normal distribution for the kernel component, giving the density estimate

    f(x) = (1/n) Σ_{j=1}^{n} (2π λ_j²)^(−p/2) exp( −‖x − x_j‖² / (2 λ_j²) ),   λ_j = α d_jk,

where p is the dimensionality of the data.

To optimise for α and k, the jackknife modification of the maximum likelihood method can again be applied. However, for the variable kernel this leads to a more difficult two-dimensional optimisation problem of the likelihood function, with one continuous parameter (α) and one discrete parameter (k). Silverman (1986, Sections 2.6 and 5.3) studies the advantages and disadvantages of this approach. He also proposes another method to estimate the smoothing parameters in a variable kernel model (see Silverman, 1986 and McLachlan, 1992 for details). The algorithm we mainly used in our trials to classify by density estimation is ALLOC80 by Hermans et al.
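As a concrete illustration, the variable kernel estimate and the jackknife (leave-one-out) likelihood used to choose α and k can be sketched as follows. This is a minimal NumPy sketch of the method described above, not the ALLOC80 implementation; the sample data and the (α, k) grid are illustrative assumptions.

```python
# Minimal sketch of the variable kernel density estimate: each sample x_j
# gets its own width lambda_j = alpha * d_jk, where d_jk is the Euclidean
# distance from x_j to its k-th nearest neighbour among the other samples.
import numpy as np

def kth_nn_distance(X, k):
    """Distance from each row of X to its k-th nearest neighbour in X."""
    diff = X[:, None, :] - X[None, :, :]       # (n, n, p) pairwise differences
    d = np.sqrt((diff ** 2).sum(axis=2))       # (n, n) Euclidean distances
    d_sorted = np.sort(d, axis=1)              # column 0 is the zero self-distance
    return d_sorted[:, k]                      # k-th neighbour, self excluded

def variable_kernel_density(x, X, alpha, k):
    """Evaluate the variable kernel estimate at the query points x."""
    n, p = X.shape
    lam = alpha * kth_nn_distance(X, k)        # one width lambda_j per sample
    sq = ((x[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    norm = (2 * np.pi * lam ** 2) ** (-p / 2)  # Normal normalising constants
    return (norm * np.exp(-sq / (2 * lam ** 2))).mean(axis=1)

def loo_log_likelihood(X, alpha, k):
    """Jackknife (leave-one-out) log-likelihood for choosing alpha and k."""
    n, p = X.shape
    lam = alpha * kth_nn_distance(X, k)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = (2 * np.pi * lam ** 2) ** (-p / 2) * np.exp(-sq / (2 * lam ** 2))
    np.fill_diagonal(K, 0.0)                   # leave each point out of its own estimate
    return np.log(K.sum(axis=1) / (n - 1)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                  # illustrative 1-D sample
grid = np.linspace(-3, 3, 7)[:, None]
f = variable_kernel_density(grid, X, alpha=0.5, k=10)

# Crude grid search over the two smoothing parameters (illustrative grids):
best = max(((a, kk, loo_log_likelihood(X, a, kk))
            for a in (0.3, 0.5, 0.8, 1.2) for kk in (5, 10, 20)),
           key=lambda t: t[2])
print(f, best)
```

The continuous parameter α would normally be optimised more carefully (e.g. by a line search for each candidate k); the grid here only shows the shape of the two-dimensional problem.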
(1982) (see Appendix B for source).

4.2.1 Example

We illustrate the kernel classifier with some simulated data, which comprise 200 observations from a standard Normal distribution (class 1, say) and 100 (in total) values from an equal mixture of two Normal distributions (class 2). The resulting estimates can then be used as a basis for classifying future observations to one or the other class. Various scenarios are given in Figure 4.1, where a black segment indicates that observations will be allocated to class 2, and otherwise to class 1. In this example we have used equal priors for the two classes (although they are not equally represented), and hence allocations are based on maximum estimated likelihood. It is clear that the rule will depend on the smoothing parameters, and can result in very disconnected sets. In higher dimensions these segments will become regions with potentially very nonlinear boundaries, possibly disconnected, depending on the smoothing parameters used. For comparison we also draw the population probability densities, and the "true" decision regions, in Figure 4.1 (top), which are still disconnected.
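The allocation rule of the example can be sketched as follows: with equal priors, a point is assigned to the class whose estimated density is larger. This sketch uses a fixed-width kernel for simplicity, and the class-2 mixture components N(−2, 1) and N(2, 1) are assumptions for illustration, not taken from the text.

```python
# Illustrative two-class kernel classifier with equal priors, echoing the
# simulated example above. The class-2 components N(-2, 1) and N(2, 1)
# are assumed for illustration; the text does not specify them.
import numpy as np

def kde(x, X, lam):
    """Fixed-width Normal kernel density estimate at points x (1-D data)."""
    sq = (x[:, None] - X[None, :]) ** 2
    return np.exp(-sq / (2 * lam ** 2)).mean(axis=1) / (lam * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=200)                   # class 1: standard Normal
X2 = np.concatenate([rng.normal(-2.0, 1.0, size=50),  # class 2: equal mixture
                     rng.normal(2.0, 1.0, size=50)])

x = np.linspace(-4, 4, 9)
f1, f2 = kde(x, X1, lam=0.5), kde(x, X2, lam=0.5)
allocate = np.where(f2 > f1, 2, 1)  # equal priors: maximum estimated likelihood
print(list(zip(x, allocate)))
```

As in Figure 4.1, the resulting class-2 region is disconnected: points near the origin go to class 1, while the two flanking segments go to class 2.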