Tree classifiers for prostate cancer diagnosis

BIOINFORMATICS ORIGINAL PAPER
Vol. 25 no. 1 2009, pages 54–60
doi:10.1093/bioinformatics/btn354

Genetics and population analysis

The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

James A. Koziol (1), Anne C. Feng (1), Zhenyu Jia (2), Yipeng Wang (2, 3), Seven Goodison (4), Michael McClelland (3) and Dan Mercola (2)

(1) The Scripps Research Institute, La Jolla
(2) Translational Cancer Biology, Department of Pathology and Laboratory Medicine, University of California, Irvine
(3) The Sidney Kimmel Cancer Center, San Diego, CA
(4) Department of Surgery, University of Florida, Shands Health Science Center, Jacksonville, FL, USA

Received on April 3, 2008; revised on July 9, 2008; accepted on July 10, 2008
Advance Access publication July 15, 2008
Associate Editor: Martin Bishop

ABSTRACT

Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer.

Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

Contact: dmercola@uci.edu

1 INTRODUCTION AND SUMMARY

Classification and regression trees (CART; Breiman et al., 1984) have long been used for cancer diagnosis and prognosis (Dillman and Koziol, 1983; Koziol et al., 2003). Extensions of the CART methodology to survival analyses (Gordon and Olshen, 1985; LeBlanc and Crowley, 1992; Segal, 1988) are readily available for prediction of survival probabilities. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers (Bühlmann, 2004) can ameliorate these difficulties. In related contexts, ensemble techniques tend to reduce error variances and increase the robustness of findings (Bhanot et al., 2006); we might therefore hope that combining the results of several individual trees will yield results more reliable and potentially less biased than any particular tree.

For clarity, we consider a concrete problem, prediction of times to progression with data from two recent studies of radical prostatectomy in prostate cancer (Stephenson et al., 2005; Yu et al., 2004). Independently of these studies, we have five mutually exclusive gene lists of cardinality 23–100, each putatively associated with prostate cancer diagnosis or prognosis. For each

To whom correspondence should be addressed.
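The introduction's hope that combining several individual trees gives more reliable results than any single tree can be illustrated with a small sketch. This is not the authors' procedure: it assumes a simple majority vote over per-patient labels, and the label names, the `trees` data, and the helpers `majority_vote`, `consensus` and `majority_accuracy` are all hypothetical. The accuracy calculation further assumes that trees err independently with the same probability, which real trees built on related data would not satisfy exactly.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    top = max(counts.values())  # raises ValueError on an empty list
    for label in labels:
        if counts[label] == top:
            return label


def consensus(per_tree_predictions):
    """Combine per-tree predictions (one list of labels per tree,
    one label per patient) into a single consensus label per patient."""
    return [majority_vote(list(votes)) for votes in zip(*per_tree_predictions)]


def majority_accuracy(p, n_trees):
    """Probability that a strict majority of n_trees classifiers is correct,
    ASSUMING each is independently correct with probability p."""
    need = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


# Hypothetical predictions from five trees for four patients.
trees = [
    ["non", "early", "late", "non"],
    ["non", "early", "early", "non"],
    ["non", "late", "late", "early"],
    ["early", "early", "late", "non"],
    ["non", "early", "late", "non"],
]

print(consensus(trees))                      # ['non', 'early', 'late', 'non']
print(round(majority_accuracy(0.7, 5), 5))   # 0.83692
```

Under these idealized assumptions, five trees that are each right 70% of the time yield a majority that is right about 84% of the time. Correlated errors between trees would shrink that gain.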