# Machine Learning, Neural and Statistical Classification, Part 7


Sec. 7.3] Characterisation of datasets 113

number of binary attributes, and if this is so, the skewness and kurtosis are directly related to each other. However, the statistical measures in this section are generally defined only for continuous attributes. Although it is possible to extend their definitions to include discrete and even categorical attributes, the most natural measures for such data are the information theoretic measures discussed in Section 7.3.3.

### Test statistic for homogeneity of covariances

The covariance matrices are fundamental in the theory of linear and quadratic discrimination detailed in Sections 3.2 and 3.3, and the key to understanding when to apply one and not the other lies in the homogeneity or otherwise of the covariances. One measure of the lack of homogeneity of covariances is the geometric mean ratio of the standard deviations of the populations of individual classes to the standard deviations of the sample, and is given by SD_ratio (see below). This quantity is related to a test of the hypothesis that all populations have a common covariance structure, i.e. to the hypothesis

$$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_q,$$

which can be tested via Box's $M$ test statistic:

$$M = \gamma \sum_{i=1}^{q} (n_i - 1) \log \left| S_i^{-1} S \right|,$$

where

$$\gamma = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(q-1)} \left\{ \sum_{i=1}^{q} \frac{1}{n_i - 1} - \frac{1}{n - q} \right\},$$

and $S_i$ and $S$ are the unbiased estimators of the $i$th sample covariance matrix and the pooled covariance matrix respectively. Here $q$ is the number of classes, $p$ the number of attributes, $n_i$ the number of examples in class $i$, and $n$ the total number of examples. This statistic has an asymptotic $\chi^2_{p(p+1)(q-1)/2}$ distribution, and the approximation is good if each $n_i$ exceeds 20, and if $p$ and $q$ are both much smaller than every $n_i$.

In datasets reported in this volume these criteria are not always met, but the $M$ statistic can still be computed, and used as a characteristic of the data. The $M$ statistic can be re-expressed as the geometric mean ratio of standard deviations of the individual populations to the pooled standard deviations, via the expression

$$\mathrm{SD\_ratio} = \exp \left\{ \frac{M}{p \sum_{i=1}^{q} (n_i - 1)} \right\}.$$

The SD_ratio is strictly greater than unity if the covariances differ, and is equal to unity if and only if the $M$ statistic is zero, i.e. all individual covariance matrices are equal to the pooled covariance matrix.
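As an illustration, the computation might be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes NumPy and the standard form of Box's statistic, M = γ Σ (n_i − 1) log|S_i⁻¹ S| with γ = 1 − (2p² + 3p − 1) / (6(p + 1)(q − 1)) · (Σ 1/(n_i − 1) − 1/(n − q)), together with SD_ratio = exp(M / (p Σ (n_i − 1))). The function name `box_m_and_sd_ratio` and the arguments `X` (an n × p attribute matrix) and `y` (class labels) are hypothetical names chosen for this example.

```python
import numpy as np


def box_m_and_sd_ratio(X, y):
    """Return (M, SD_ratio) for an n x p attribute matrix X and class labels y.

    Assumes at least two classes and non-singular class covariance matrices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    classes = np.unique(y)
    q = len(classes)

    pooled = np.zeros((p, p))
    log_dets = []
    dofs = []
    for c in classes:
        Xc = X[y == c]
        dof = Xc.shape[0] - 1                           # n_i - 1
        S_i = np.atleast_2d(np.cov(Xc, rowvar=False))   # unbiased estimate
        pooled += dof * S_i
        log_dets.append(np.linalg.slogdet(S_i)[1])      # log|S_i|
        dofs.append(dof)

    dofs = np.array(dofs, dtype=float)
    log_dets = np.array(log_dets)
    pooled /= (n - q)                                   # pooled covariance S
    log_det_pooled = np.linalg.slogdet(pooled)[1]       # log|S|

    gamma = 1.0 - (2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (q - 1)) * (
        np.sum(1.0 / dofs) - 1.0 / (n - q)
    )
    # log|S_i^{-1} S| = log|S| - log|S_i|
    M = gamma * np.sum(dofs * (log_det_pooled - log_dets))
    sd_ratio = np.exp(M / (p * np.sum(dofs)))
    return float(M), float(sd_ratio)
```

Working with log-determinants via `np.linalg.slogdet` avoids forming S_i⁻¹ S explicitly and is less prone to overflow than computing determinants directly. When two classes differ only in their means, the function returns M close to zero and SD_ratio close to one, consistent with the property stated above.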
In every dataset that we looked at, the $M$ statistic is significantly different from zero, in which case the SD_ratio is significantly greater than unity.

### Mean absolute correlation coefficient, corr.abs

The set of correlations between all pairs of attributes gives some indication of the interdependence of the attributes, and a measure of that interdependence may be calculated as follows. The correlations between all pairs of attributes are calculated for each class separately. The absolute values of these correlations are averaged over all pairs of attributes and over all classes, giving the measure corr.abs, which is a measure of interdependence between attributes.
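The averaging described above might be sketched as follows, assuming NumPy. The function name `corr_abs` and the arguments `X` and `y` are hypothetical, and giving each class equal weight in the final average is an assumption of this sketch; the text does not say whether classes are weighted by size.

```python
import numpy as np


def corr_abs(X, y):
    """Mean absolute within-class correlation over all attribute pairs.

    X is an n x p attribute matrix with p >= 2; y holds the class labels.
    Each class contributes equally to the average (an assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = X.shape[1]
    pairs = np.triu_indices(p, k=1)      # index pairs (j, k) with j < k

    per_class = []
    for c in np.unique(y):
        # Correlation matrix of the attributes within class c.
        R = np.corrcoef(X[y == c], rowvar=False)
        per_class.append(np.abs(R[pairs]).mean())
    return float(np.mean(per_class))
```

Because every class has the same number of attribute pairs, averaging the per-class means is the same as averaging over all pairs and all classes at once. Attributes that are constant within a class give undefined correlations and would need separate handling.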

114 Methods for comparison [Ch. 7

If corr.abs is near unity, there is much redundant information in the attributes, and some procedures, such as logistic discriminants, may have technical problems associated with this. Also, CASTLE, for example, may be misled substantially by fitting relationships to the attributes, instead of concentrating on getting right the relationship between the classes