Backward Elimination

PCA, the Fisher criterion and the t-test construct or rank features independently of the learning machine that does the actual classification and are therefore called filter methods (Blum & Langley, 1997). Another approach is to use the learning machine itself for feature selection. These techniques are called wrapper methods and try to identify the features that are important for the generalisation capability of the machine. Here we propose to use the features that are important for achieving a low training error as a simple approximation.

In the case of an SVM with linear kernel these features are easily identified by looking at the normal vector w of the separating hyperplane. The smaller the angle between a feature basis vector and the normal vector, the more important the feature is for the separation. Features orthogonal to the normal vector obviously have no influence on the discrimination at all. This means the feature ranking is simply given by the squared components w_k^2 of the normal vector. Of course this ranking is not very realistic, because the SVM solution on the full feature set is far from optimal, as we demonstrated in the last subsections. A simple heuristic is to assume that the feature with the smallest w_k^2 is really unimportant for the solution and can safely be removed from the feature set. The SVM is then retrained on the reduced feature set and the procedure is repeated until the feature set is empty. Such a successive feature removal is called backward elimination (Blum & Langley, 1997).

The resulting CpG ranking on our data set is shown in Fig. 2d and differs considerably from the Fisher and t-test rankings. It seems backward elimination is able to remove redundant features. However, as shown in Tab. 1 and Fig. 3, the generalisation results are not better than for the Fisher criterion.
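The backward elimination procedure described above can be sketched in a few lines. This is a minimal illustration using scikit-learn, not the authors' implementation; the function name and the toy data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def backward_elimination(X, y, C=1.0):
    """Rank features by repeatedly retraining a linear-kernel SVM and
    removing the feature with the smallest squared weight w_k^2."""
    remaining = list(range(X.shape[1]))
    removed = []                       # filled least-important first
    while remaining:
        clf = SVC(kernel="linear", C=C).fit(X[:, remaining], y)
        w = clf.coef_.ravel()          # normal vector of the hyperplane
        worst = int(np.argmin(w ** 2)) # feature contributing least
        removed.append(remaining.pop(worst))
    return removed[::-1]               # most important feature first

# Toy data (not the methylation data): only feature 0 separates classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[:, 0] += 3.0 * y
ranking = backward_elimination(X, y)
```

Note that, unlike a one-shot ranking by w_k^2 on the full feature set, the SVM is retrained after every removal, so the ranking of the remaining features can change as redundant features drop out.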
Furthermore, backward elimination seems to be more dimension dependent and it is computationally more expensive. It follows that, at least for this data set, the simple Fisher criterion is the preferable feature selection technique.

Exhaustive Search

A canonical way to construct a wrapper method for feature selection is to evaluate the generalisation performance of the learning machine on every possible feature subset. Cross-validation on the training set can be used to estimate the generalisation of the machine on a given feature set. What makes this exhaustive search of the feature space practically useless is the enormous number of sum_{n=0}^{k} (k choose n) = 2^k different feature combinations, and there are numerous heuristics to search the feature space more efficiently (e.g. backward elimination) (Blum & Langley, 1997). Here we only want to demonstrate that there are no higher-order correlations between features and class labels in our data set. In order to do this we exhaustively searched the space of all two-feature combinations. For every one of the (81 choose 2) = 3240 two-CpG combinations we computed the leave-one-out cross-validation error of an SVM with quadratic kernel on the training set. From all CpG pairs with minimum leave-one-out error we selected the one...
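The exhaustive pair search can be sketched as follows. Again this is an illustrative scikit-learn version, not the authors' code; the helper name, the toy data, and the choice coef0=1 for the quadratic kernel are assumptions.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def best_feature_pair(X, y, C=1.0):
    """Score every two-feature subset by the leave-one-out CV error
    of an SVM with quadratic (degree-2 polynomial) kernel."""
    best_pair, best_err = None, np.inf
    for pair in combinations(range(X.shape[1]), 2):
        # coef0=1 gives an inhomogeneous quadratic kernel (assumption).
        model = SVC(kernel="poly", degree=2, coef0=1.0, C=C)
        acc = cross_val_score(model, X[:, pair], y, cv=LeaveOneOut()).mean()
        err = 1.0 - acc
        if err < best_err:             # ties keep the first pair found
            best_pair, best_err = pair, err
    return best_pair, best_err

# Toy data (not the 81-CpG data): features 0 and 1 carry the class signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 4))
X[:, 0] += 2.5 * y
X[:, 1] += 2.5 * y
pair, err = best_feature_pair(X, y)
```

With k features this loop trains k(k-1)/2 SVMs per cross-validation fold, which is why the search is only feasible for small subset sizes such as pairs.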