Identification of Combination Gene Sets for Glioma Classification 1

Seungchan Kim, Edward R. Dougherty, Ilya Shmulevich, Kenneth R. Hess, Stanley R. Hamilton, Jeffrey M. Trent, Gregory N. Fuller, and Wei Zhang 2

Department of Electrical Engineering, Texas A&M University, College Station, Texas 77840 [S. K., E. R. D.]; Departments of Pathology [E. R. D., I. S., S. R. H., G. N. F., W. Z.] and Biostatistics [K. R. H.], The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030; and Cancer Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland 20892-4470 [S. K., J. M. T.]

Abstract

One goal for the gene expression profiling of cancer tissues is to identify signature genes that robustly distinguish different types or grades of tumors. Such signature genes would ideally provide a molecular basis for classification and also yield insight into the molecular events underlying different cancer phenotypes. This study applies a recently developed algorithm to identify not only single classifier genes but also gene sets (combinations) for use as glioma classifiers. Classifier genes identified by this algorithm are shown to be strong features by conservatively and collectively considering the misclassification errors of the feature sets. Applying this approach to a test set of 25 patients, we have identified the best single genes and two- to three-gene combinations for distinguishing four types of glioma: (a) oligodendroglioma; (b) anaplastic oligodendroglioma; (c) anaplastic astrocytoma; and (d) glioblastoma multiforme. Some of the identified genes, such as insulin-like growth factor-binding protein 2, have been confirmed to be associated with one of the tumor types. Using combinations of genes, the classification error rate can be significantly lowered. In many instances, neither of the individual genes of a two-gene set performs well as an accurate classifier, but the combination of the two genes forms a robust classifier with a small error rate. Two-gene and three-gene combinations thus provide robust classifiers possessing the potential to translate expression microarray results into diagnostic histopathological assays for clinical utilization.

Introduction

Current estimates suggest that there are approximately 30,000–40,000 genes in the human genome (1, 2), and subsets of those genes are expressed in different cell types and in different cellular states. The combination of expressed genes at different levels determines the overall physiology of the cell. Two primary goals of functional genomics are to screen for, from amid the massive amount of transcriptonomic data generated by high-throughput cDNA microarray technology, the key genes and gene combinations that explain specific cellular phenotypes (e.g., disease) on a mechanistic level and to use this data to classify diseases on a molecular level (3–7)....
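The abstract's observation that two genes can form a good classifier together even when neither does alone can be illustrated with a toy example. The sketch below is not the algorithm used in the study: the expression values are invented, the single-threshold rule in `best_stump_error` is a hypothetical stand-in for a classifier, and the difference `gene1 - gene2` is an assumed, hand-picked linear combination.

```python
def best_stump_error(values, labels):
    """Lowest training error of any single-threshold rule on one feature."""
    cuts = sorted(set(values))
    # Candidate thresholds: one below every value, then midpoints between neighbours.
    thresholds = [cuts[0] - 1.0] + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    best = 1.0
    for t in thresholds:
        for high_class in (0, 1):  # try both polarities of the rule
            wrong = sum(
                (high_class if v > t else 1 - high_class) != y
                for v, y in zip(values, labels)
            )
            best = min(best, wrong / len(values))
    return best


# Invented expression levels for eight samples (not data from the study).
gene1 = [1, 3, 5, 7, 2, 4, 6, 8]
gene2 = [2, 4, 6, 8, 1, 3, 5, 7]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # two hypothetical tumor classes

# Assumed two-gene feature: the difference of the two expression levels.
pair = [a - b for a, b in zip(gene1, gene2)]

err_gene1 = best_stump_error(gene1, labels)
err_gene2 = best_stump_error(gene2, labels)
err_pair = best_stump_error(pair, labels)
print(err_gene1, err_gene2, err_pair)
```

In this constructed data the two classes interleave along each gene taken separately, so no single cut-off separates them, while the sign of the difference separates them completely. Errors here are measured on the training samples only; with a cohort as small as the one described, an honest error estimate would need a more conservative procedure.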