featureSelectionDNAMethyCancerClassification_01bioinfo



... with the smallest radius margin ratio. This pair was considered to be the optimal feature combination and was used to evaluate the generalisation performance of the SVM on the test set. At 6%, the average test error of the exhaustive search method was the same as that of the Fisher criterion for two features and a quadratic kernel. For five features the exhaustive computation is already infeasible. In the large majority of cross-validation runs the CpGs selected by exhaustive search and by the Fisher criterion were identical. In some cases suboptimal CpGs were chosen by the exhaustive search method. These results clearly demonstrate that there are no second-order combinations of two features in our data set that are important for ALL/AML discrimination. We expect that higher-order combinations of more than two features cannot be detected reliably with such a limited sample size. Therefore the Fisher criterion should be able to extract all classification-relevant information from our data set.

CONCLUSIONS

To achieve reliable predictions on the basis of small training sets, the selection of relevant features is necessary, even for advanced learning algorithms such as the support vector machine. For classification tasks where the class information is directly correlated with single CpG dinucleotide markers, the simple Fisher criterion is a powerful and efficient feature selection strategy. For more complex problems it will be necessary to derive feature selection algorithms that can remove or combine redundant features and handle higher-order feature dependencies.

Taken together, our results clearly demonstrate that microarray-based methylation analysis combined with supervised learning techniques can reliably predict known tumour classes. Classification results were comparable to those obtained with mRNA expression data, and our results suggest that methylation analysis should be applied to other kinds of tissue.
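As an illustration of the feature selection comparison reported above, the following is a minimal sketch of Fisher-criterion ranking. It assumes the usual two-class form of the criterion, J(k) = (mean_A(k) - mean_B(k))^2 / (var_A(k) + var_B(k)), with population variances; this excerpt does not state the exact formula, so that form is an assumption. The function names (`fisher_scores`, `top_k_features`, `n_subsets`) are hypothetical and chosen only for this example. The helper `n_subsets` shows why exhaustive search grows quickly with the subset size.

```python
from math import comb
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Return one Fisher score per feature for a two-class problem.

    samples: list of equal-length feature vectors (e.g. methylation values)
    labels:  list of class labels with exactly two distinct values
    Assumed form: (mean_A - mean_B)^2 / (var_A + var_B), population variance.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    scores = []
    for k in range(len(samples[0])):
        a = [x[k] for x, y in zip(samples, labels) if y == classes[0]]
        b = [x[k] for x, y in zip(samples, labels) if y == classes[1]]
        diff = (mean(a) - mean(b)) ** 2
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            scores.append(diff / denom)
        elif diff > 0:
            # Zero within-class variance but separated means
            scores.append(float("inf"))
        else:
            scores.append(0.0)
    return scores


def top_k_features(samples, labels, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def n_subsets(n_features, k):
    """Number of feature subsets an exhaustive search of size k must evaluate."""
    return comb(n_features, k)


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the study)
    toy_samples = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.1],
                   [0.9, 0.5, 0.8], [0.8, 0.6, 0.2]]
    toy_labels = [0, 0, 1, 1]
    print(top_k_features(toy_samples, toy_labels, 2))
    # Subset counts for a hypothetical panel of 100 features
    print(n_subsets(100, 2), n_subsets(100, 5))
```

Fisher ranking scores each feature once, whereas an exhaustive search has to train and evaluate a classifier for every subset, which is consistent with the observation above that the computation becomes infeasible for five features.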
Well-documented tissue samples with patient history can be obtained only as archived specimens. This strongly limits the amount and number of tissues available for expression analysis (Bowtell, 1999). The methylation approach has the potential to overcome this fundamental limitation: because the stable DNA itself is the object of study, material can be extracted from archived samples. This enables the examination of methylation patterns in large numbers of archived specimens with comprehensive clinical records and removes one of the major limitations for the discovery of complex biological processes by statistical means.

REFERENCES

Adorján, P., Distler, J., Lipscher, E., Model, F., Müller, J., Pelet, C., Braun, A., Florl, A., Gütig, D., Grabs, G., Howe, A., Kursar, M., Lesche, R., Leu, E., Lewin, A., Maier, S., Müller, V., Otto, T., Scholz, C., Schulz, W., Seifert, H., Schwope, I., Ziebarth, H., Berlin, K., Piepenbrock, C. & Olek, A. (2001). Tumour class prediction and discovery by microarray-based DNA methylation analysis. Submitted.

Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M. & Yakhini, Z. (2001). Tiss...