with the smallest radius margin ratio. This pair
was considered to be the optimal feature combination and
was used to evaluate the generalisation performance of the
SVM on the test set.
The average test error of the exhaustive search method
was 6%, the same as that of the Fisher criterion,
in the case of two features and a quadratic kernel. For five features the exhaustive computation is already infeasible.
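
To illustrate why exhaustive search stops being practical so quickly, the following minimal sketch counts the candidate subsets and runs a generic exhaustive search over feature pairs. The feature count of 80 is a hypothetical value chosen only for illustration, and `toy_score` is a placeholder: in the setting described here, each subset would instead be scored by training an SVM and computing its radius margin ratio, which is not implemented in this sketch.

```python
from itertools import combinations
from math import comb


def best_subset(n_features, size, score):
    """Score every feature subset of the given size; return the lowest-scoring one."""
    return min(combinations(range(n_features), size), key=score)


def toy_score(subset):
    # Placeholder criterion (assumption): stands in for an expensive score such
    # as the radius margin ratio of an SVM trained on these features only.
    return abs(subset[0] - 3) + abs(subset[1] - 7)


# Hypothetical number of candidate features, not taken from the text.
N_FEATURES = 80

pairs = comb(N_FEATURES, 2)        # subsets to evaluate for two features
quintuples = comb(N_FEATURES, 5)   # subsets to evaluate for five features

if __name__ == "__main__":
    print("pairs to evaluate:     ", pairs)
    print("quintuples to evaluate:", quintuples)
    print("best toy pair:         ", best_subset(10, 2, toy_score))
```

Because every subset requires its own training run, the cost grows with the binomial coefficient, which is why a univariate ranking is attractive whenever it selects the same features.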
In the vast majority of cross-validation runs the CpGs
selected by exhaustive search and by the Fisher criterion were
identical. In some cases suboptimal CpGs were chosen
by the exhaustive search method. These results clearly
demonstrate that there are no second order combinations
of two features in our data set that are important for an
ALL/AML discrimination. We expect that higher than
second order combinations of more than two features cannot
be detected reliably with such a limited sample size.
Therefore the Fisher criterion should be able to extract all
classification-relevant information from our data set.

CONCLUSIONS
To achieve reliable predictions on the basis of small
training set sizes the selection of relevant features is
necessary, even for advanced learning algorithms such as the
support vector machine. For classification tasks where
the class information is directly correlated to single
CpG dinucleotide markers, the simple Fisher criterion
is a powerful and efficient feature selection strategy.
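
As a rough illustration of such a ranking, the sketch below scores each feature independently using the commonly cited two-class form of the Fisher criterion, the squared difference of the class means divided by the sum of the class variances. It is an assumption that this matches the exact variant used here. The function names and the toy methylation values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Return one Fisher score per feature for a two-class data set.

    samples: list of equal-length feature vectors, one per sample
    labels:  list of class labels containing exactly two distinct values
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    group_a = [s for s, y in zip(samples, labels) if y == classes[0]]
    group_b = [s for s, y in zip(samples, labels) if y == classes[1]]

    scores = []
    for k in range(len(samples[0])):
        col_a = [s[k] for s in group_a]
        col_b = [s[k] for s in group_b]
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = pvariance(col_a) + pvariance(col_b)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero variance in both classes: perfect separation if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def top_features(samples, labels, n):
    """Indices of the n features with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:n]


# Toy data (hypothetical): only feature 0 differs between the two classes.
toy_samples = [[0.9, 0.5, 0.1], [0.8, 0.4, 0.2], [0.2, 0.5, 0.1], [0.1, 0.4, 0.2]]
toy_labels = ["ALL", "ALL", "AML", "AML"]

if __name__ == "__main__":
    print(fisher_scores(toy_samples, toy_labels))
    print(top_features(toy_samples, toy_labels, 1))
```

Each feature is scored on its own, so the cost is linear in the number of features, but by construction the ranking cannot see information that only appears in combinations of features.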
For more complex problems it will be necessary to
derive feature selection algorithms that can remove or
combine redundant features and handle higher order
Taken together, our results clearly demonstrate that
microarray based methylation analysis combined with
supervised learning techniques can reliably predict known
tumor classes. Classification results were comparable to those obtained with
mRNA expression data, and our results suggest that
methylation analysis should be applied to other kinds
of tissue. Well documented tissue samples with patient
history can be obtained only as archived specimens.
This strongly limits the amount and number of tissues
available for expression analysis (Bowtell, 1999). The
methylation approach has the potential to overcome this
fundamental limitation: through the mere fact that the
stable DNA is the object of study, extraction of material
is possible from archived samples. This enables the
examination of methylation patterns in large numbers of
archived specimens with comprehensive clinical records
and removes one of the major limitations for the discovery
of complex biological processes by statistical means.

REFERENCES
Adorján, P., Distler, J., Lipscher, E., Model, F., Müller, J., Pelet,
C., Braun, A., Florl, A., Gütig, D., Grabs, G., Howe, A., Kursar,
M., Lesche, R., Leu, E., Lewin, A., Maier, S., Müller, V., Otto,
T., Scholz, C., Schulz, W., Seifert, H., Schwope, I., Ziebarth, H.,
Berlin, K., Piepenbrock, C. & Olek, A. (2001). Tumour class
prediction and discovery by microarray-based DNA methylation
Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M.
& Yakhini, Z. (2001). Tiss...