featureSelectionDNAMethyCancerClassification_01bioinfo

Ratios for the two signals were calculated based on …


… changes was determined using artificially up- and down-methylated DNA fragments mixed at different ratios. For each of these mixtures, a series of experiments was conducted to define the range of CG/TG ratios that corresponds to varying degrees of methylation at each of the CpG sites tested. In Fig. 1a, results for two CpG positions located in exon 14 of the human factor VIII gene are shown as examples. For the mixtures 3:0, 2:1, 1:2 and 0:3, the degrees of methylation of the individual CpG sites could safely be distinguished.

To verify the detection of methylation changes in the real data set, two X-chromosomal genes were included in the gene set. Because one of the two X chromosomes in females becomes inactivated by methylation, a higher degree of methylation of X-chromosomal genes can be expected in females than in males. In Fig. 1b, CpGs are ranked according to the significance of the difference between male and female methylation levels. As expected, the X-chromosomal genes (ELK1, AR) show significantly higher methylation in females. This clearly demonstrates that the method genuinely detects changes in methylation.

SUPPORT VECTOR MACHINES

In our case, the task of cancer classification consists of constructing a machine that can predict the leukemia subtype (ALL or AML) from a patient's methylation pattern. For every patient sample this pattern is given as a vector of average log(CG/TG) ratios at 81 CpG positions. Based on a given set of training examples X = {x_i : x_i ∈ R^n} with known diagnoses Y = {y_i : y_i ∈ {ALL, AML}}, a discriminant function f : R^n → {ALL, AML}, where n is the number of CpGs, has to be learned. The number of misclassifications of f on the training set {X, Y} is called the training error and is usually minimised by the learning machine during the training phase.
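The setup above can be sketched with synthetic data: each sample is a vector of average log(CG/TG) ratios at n = 81 CpG positions, and the training error of a discriminant function f is simply its misclassification count on the labelled training set. A minimal numpy sketch, where the synthetic data and the nearest-centroid discriminant are illustrative assumptions and not the classifier used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cpgs = 81  # number of CpG positions per sample

# Synthetic "patients": log(CG/TG) ratio vectors for the two subtypes,
# with slightly shifted class means mimicking subtype-specific methylation.
X_all = rng.normal(loc=-0.2, scale=1.0, size=(20, n_cpgs))
X_aml = rng.normal(loc=+0.2, scale=1.0, size=(20, n_cpgs))
X = np.vstack([X_all, X_aml])
y = np.array(["ALL"] * 20 + ["AML"] * 20)

# A simple linear discriminant f: assign each sample to the nearest
# class centroid (an assumed stand-in for a learned classifier).
mu = {c: X[y == c].mean(axis=0) for c in ("ALL", "AML")}

def f(x):
    return min(mu, key=lambda c: np.linalg.norm(x - mu[c]))

# Training error: number of misclassifications on the training set.
train_error = sum(f(x) != yi for x, yi in zip(X, y))
print(train_error)
```

Because the training examples themselves define the centroids, the training error here can be optimistically low — which is exactly why the text goes on to distinguish it from the test error on unseen samples.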
However, what is of practical interest is the ability to predict the class of previously unseen samples, the so-called generalisation performance of the learning machine. This performance is usually estimated by the test error, the number of misclassifications on an independent test set. The central difficulty in training a learning machine with good generalisation performance is to find a discriminant function f that, on the one hand, is complex enough to capture the essential properties of the data distribution, but, on the other hand, avoids over-fitting the data.

The Support Vector Machine (SVM) addresses this problem by constructing a linear discriminant that separates the training data and maximises the distance to the nearest points of the training set. This maximum-margin separating hyperplane minimises the ratio between the radius of the minimum enclosing sphere of the training set and the margin between the hyperplane and the training points. This corresponds to minimising the so-called radius-margin bound on the expected probability of a test error and promises good generalisation performance (Vapnik, 1998). Of course there are more complex classification problems, where the dependence between class labels y_i and features x_i is not linear and the training set cannot be separated by a hyperplane. In order to allow for non-linear discriminant functions the input space can b...
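The margin geometry described here can be computed directly: for a separating hyperplane w·x + b = 0, the margin is the distance from the hyperplane to the closest training point, min_i |w·x_i + b| / ||w||, and the SVM picks the separating hyperplane that maximises it. A small numpy sketch on a toy 2-D set (the points and the two candidate hyperplanes are illustrative assumptions):

```python
import numpy as np

# Toy linearly separable training set in 2-D (labels +1 / -1).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

def margin(w, b, X):
    """Distance from the hyperplane w.x + b = 0 to the nearest point."""
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

def separates(w, b, X, y):
    """True if the hyperplane classifies every training point correctly."""
    return bool(np.all(y * (X @ w + b) > 0))

# Two candidate separating hyperplanes with the same orientation:
# the SVM principle prefers the one with the larger margin.
w1, b1 = np.array([1.0, 1.0]), -1.0   # sits closer to the positive class
w2, b2 = np.array([1.0, 1.0]), -0.5   # equidistant from both classes

for w, b in [(w1, b1), (w2, b2)]:
    assert separates(w, b, X, y)
    print(round(margin(w, b, X), 3))   # prints 2.121, then 2.475
```

Both hyperplanes separate the data with zero training error, yet only the centred one maximises the margin — the property the radius-margin bound ties to good generalisation.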

