pap_Plant_2006 - Feature Selection on High Throughput...


Feature Selection on High Throughput SELDI-TOF Mass-Spectrometry Data for Identifying Biomarker Candidates in Ovarian and Prostate Cancer

Claudia Plant, Melanie Osl, Bernhard Tilg, Christian Baumgartner
Research Group for Clinical Bioinformatics, Institute of Biomedical Engineering
University for Health Sciences, Biomedical Informatics and Technology (UMIT), Hall in Tirol, Austria
{ claudia.plant | melanie.osl | bernhard.tilg | christian.baumgartner }

Abstract

High-throughput mass-spectrometry screening has the potential to deliver better results in detecting early-stage cancer than traditional biomarkers. Proteomic data poses novel challenges for data mining, especially concerning the curse of dimensionality. In this paper, we present a 3-step feature selection framework combining the advantages of efficient filter and effective wrapper techniques. We demonstrate the performance of our framework on two SELDI-TOF-MS data sets for identifying biomarker candidates in ovarian and prostate cancer.

1. Introduction

The identification of putative proteomic marker candidates is a big challenge in the biomarker discovery process. Pathologic states within cancer tissue may be expressed by abnormal changes in protein and peptide abundance. With the availability of modern high-throughput techniques such as SELDI-TOF (surface-enhanced laser desorption and ionization time-of-flight) MS, a large amount of high-dimensional mass spectrometric data is produced from a single blood sample. Each spectrum is composed of peak amplitude measurements at approximately 15,200 features, each represented by a corresponding m/z value. Proteomic spectra potentially contain more information on abnormal protein signaling and networking than traditional single biomarkers. The widely used cancer antigen 125 (CA125), for instance, can only detect 50%-60% of patients with stage I ovarian cancer [8].
The curse of dimensionality severely affects the performance of classification algorithms on proteomic spectra, in terms of both efficiency and effectiveness. Feature transformation techniques can be applied before classification, e.g. as in [13]. However, transformed features are not useful for identifying previously undiscovered marker candidates. Feature selection methods, which try to find the subset of features with the highest discriminatory power, can be applied instead. Nevertheless, as mentioned above, the use of traditional methods is limited by the high dimensionality of the data.

In this paper, we propose a novel 3-step feature selection framework which combines elements of existing feature selection methods and is tailored to the special characteristics of high-throughput MS data. We present the results on two published SELDI-TOF-MS data sets on ovarian and prostate cancer. Our method identifies feature subsets with a classification accuracy between 97% and 100%....
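
The preview does not describe the three steps of the framework, so the following is only a generic illustration of the filter-plus-wrapper idea mentioned in the abstract, not the authors' method. It is a minimal sketch under stated assumptions: the data are synthetic, the filter score (class mean difference over the summed class standard deviations), the nearest-centroid classifier, and all function names (make_data, filter_scores, loo_accuracy, select_features) are hypothetical choices made for this example.

```python
import random
import statistics


def make_data(n_samples=40, n_features=200, informative=(5, 17), seed=0):
    """Synthetic stand-in for spectra: Gaussian noise plus two shifted features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 4.0 * label  # class 1 is shifted on the informative features
        X.append(row)
        y.append(label)
    return X, y


def filter_scores(X, y):
    """Filter step: score each feature on its own, without any classifier."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [statistics.mean([r[j] for r in rows]) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def select_features(X, y, n_candidates=10, max_selected=3):
    """Filter down to a few candidates, then run greedy forward wrapper selection."""
    scores = filter_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    candidates = ranked[:n_candidates]

    selected, best_acc = [], 0.0
    while len(selected) < max_selected:
        trials = [(loo_accuracy(X, y, selected + [j]), j)
                  for j in candidates if j not in selected]
        acc, j = max(trials, key=lambda t: t[0])
        if acc <= best_acc:  # stop once adding a feature no longer helps
            break
        selected.append(j)
        best_acc = acc
    return selected, best_acc


X, y = make_data()
selected, accuracy = select_features(X, y)
print("selected features:", selected, "leave-one-out accuracy:", accuracy)
```

The design point the sketch tries to show is the division of labour: the inexpensive filter reduces 200 features to 10 candidates, so the costly wrapper only has to evaluate a classifier on a small search space. On real spectra with roughly 15,200 features, running the wrapper on all features directly would be far more expensive.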

This note was uploaded on 02/08/2010 for the course ECEN 689-601 taught by Professor Staff during the Spring '10 term at Texas A&M.
