Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining, Intelligent Systems Reference Library (DOI 10.1007/978-3-642-23151-3)

Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms

Intelligent Systems Reference Library, Volume 25

Editors-in-Chief

Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Prof. Lakhmi C. Jain
University of South Australia
Adelaide
Mawson Lakes Campus
South Australia 5095
Australia
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 1. Christine L. Mumford and Lakhmi C. Jain (Eds.), Computational Intelligence: Collaboration, Fusion and Emergence, 2009. ISBN 978-3-642-01798-8
Vol. 2. Yuehui Chen and Ajith Abraham, Tree-Structure Based Hybrid Computational Intelligence, 2009. ISBN 978-3-642-04738-1
Vol. 3. Anthony Finn and Steve Scheding, Developments and Challenges for Autonomous Unmanned Vehicles, 2010. ISBN 978-3-642-10703-0
Vol. 4. Lakhmi C. Jain and Chee Peng Lim (Eds.), Handbook on Decision Making: Techniques and Applications, 2010. ISBN 978-3-642-13638-2
Vol. 5. George A. Anastassiou, Intelligent Mathematics: Computational Analysis, 2010. ISBN 978-3-642-17097-3
Vol. 6. Ludmila Dymowa, Soft Computing in Economics and Finance, 2011. ISBN 978-3-642-17718-7
Vol. 7. Gerasimos G. Rigatos, Modelling and Control for Intelligent Industrial Systems, 2011. ISBN 978-3-642-17874-0
Vol. 8. Edward H.Y. Lim, James N.K. Liu, and Raymond S.T. Lee, Knowledge Seeker – Ontology Modelling for Information Search and Management, 2011. ISBN 978-3-642-17915-0
Vol. 9. Menahem Friedman and Abraham Kandel, Calculus Light, 2011. ISBN 978-3-642-17847-4
Vol. 10. Andreas Tolk and Lakhmi C. Jain, Intelligence-Based Systems Engineering, 2011. ISBN 978-3-642-17930-3
Vol. 11. Samuli Niiranen and Andre Ribeiro (Eds.), Information Processing and Biological Systems, 2011. ISBN 978-3-642-19620-1
Vol. 12. Florin Gorunescu, Data Mining, 2011. ISBN 978-3-642-19720-8
Vol. 13. Witold Pedrycz and Shyi-Ming Chen (Eds.), Granular Computing and Intelligent Systems, 2011. ISBN 978-3-642-19819-9
Vol. 14. George A. Anastassiou and Oktay Duman, Towards Intelligent Modeling: Statistical Approximation Theory, 2011. ISBN 978-3-642-19825-0
Vol. 15. Antonino Freno and Edmondo Trentin, Hybrid Random Fields, 2011. ISBN 978-3-642-20307-7
Vol. 16. Alexiei Dingli, Knowledge Annotation: Making Implicit Knowledge Explicit, 2011. ISBN 978-3-642-20322-0
Vol. 17. Crina Grosan and Ajith Abraham, Intelligent Systems, 2011. ISBN 978-3-642-21003-7
Vol. 18. Achim Zielesny, From Curve Fitting to Machine Learning, 2011. ISBN 978-3-642-21279-6
Vol. 19. George A. Anastassiou, Intelligent Systems: Approximation by Artificial Neural Networks, 2011. ISBN 978-3-642-21430-1
Vol. 20. Lech Polkowski, Approximate Reasoning by Parts, 2011. ISBN 978-3-642-22278-8
Vol. 21. Igor Chikalov, Average Time Complexity of Decision Trees, 2011. ISBN 978-3-642-22660-1
Vol. 22. Przemyslaw Różewski, Emma Kusztina, Ryszard Tadeusiewicz, and Oleg Zaikin, Intelligent Open Learning Systems, 2011. ISBN 978-3-642-22666-3
Vol. 23. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23165-0
Vol. 24. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23240-4
Vol. 25. Dawn E. Holmes and Lakhmi C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms, 2012. ISBN 978-3-642-23150-6

Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Data Mining: Foundations and Intelligent Paradigms
Volume 3: Medical, Health, Social, Biological and other Applications

Prof. Dawn E. Holmes
Department of Statistics and Applied Probability
University of California
Santa Barbara, CA 93106
USA
E-mail: [email protected]

Prof. Lakhmi C. Jain
Professor of Knowledge-Based Engineering
University of South Australia
Adelaide
Mawson Lakes, SA 5095
Australia
E-mail: [email protected]

ISBN 978-3-642-23150-6
e-ISBN 978-3-642-23151-3
DOI 10.1007/978-3-642-23151-3
Intelligent Systems Reference Library ISSN 1868-4394
Library of Congress Control Number: 2011936705

© 2012 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper
springer.com

Preface

There are many invaluable books available on data mining theory and applications. However, in compiling a volume titled “DATA MINING: Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications” we wish to introduce some of the latest developments to a broad audience of both specialists and non-specialists in this field.
The term ‘data mining’ was introduced in the 1990s to describe an emerging field based on classical statistics, artificial intelligence and machine learning. By combining techniques from these areas and developing new ones, researchers are able to analyze large datasets in innovative and productive ways. Patterns found in these datasets are subsequently analyzed with a view to acquiring new knowledge. These techniques have been applied in a broad range of medical, health, social and biological areas.

In compiling this volume we have sought to present innovative research from prestigious contributors in the field of data mining. Each chapter is self-contained and is described briefly in Chapter 1.

This book will prove valuable to theoreticians as well as application scientists and engineers in the area of data mining. Postgraduate students will also find this a useful sourcebook since it shows the direction of current research.

We have been fortunate in attracting top-class researchers as contributors and wish to offer our thanks for their support in this project. We also acknowledge the expertise and time of the reviewers. Finally, we wish to thank Springer for their support.

Dr. Dawn E. Holmes
University of California
Santa Barbara, USA

Dr. Lakhmi C. Jain
University of South Australia
Adelaide, Australia

Contents

Chapter 1
Advances in Intelligent Data Mining
Dawn E. Holmes, Jeffrey W. Tweedale, Lakhmi C. Jain
  1 Introduction
  2 Medical Influences
  3 Health Influences
  4 Social Influences
    4.1 Information Discovery
    4.2 On-Line Communities
  5 Biological Influences
    5.1 Biological Networks
    5.2 Estimations in Gene Expression
  6 Chapters Included in the Book
  7 Conclusion
  References

Chapter 2
Temporal Pattern Mining for Medical Applications
Giulia Bruno, Paolo Garza
  1 Introduction
  2 Types of Temporal Data in Medical Domain
  3 Definitions
  4 Temporal Pattern Mining Algorithms
    4.1 Temporal Pattern Mining from a Set of Sequences
    4.2 Temporal Pattern Mining from a Single Sequence
  5 Medical Applications
  6 Conclusions
  References

Chapter 3
BioKeySpotter: An Unsupervised Keyphrase Extraction Technique in the Biomedical Full-Text Collection
Min Song, Prat Tanapaisankit
  1 Introduction
  2 Backgrounds and Related Work
  3 The Proposed Approach
  4 Evaluation
    4.1 Dataset
    4.2 Comparison Algorithms
    4.3 Experimental Results
  5 Conclusion
  References

Chapter 4
Mining Health Claims Data for Assessing Patient Risk
Ian Duncan
  1 What Is Health Risk?
  2 Traditional Models for Assessing Health Risk
  3 Risk Factor-Based Risk Models
  4 Data Sources
    4.1 Enrollment Data
    4.2 Claims and Coding Systems
    4.3 Interpretation of Claims Codes
  5 Clinical Identification Algorithms
  6 Sensitivity-Specificity Trade-Off
    6.1 Constructing an Identification Algorithm
    6.2 Sources of Algorithms
  7 Construction and Use of Grouper Models
    7.1 Drug Grouper Models
    7.2 Drug-Based Risk Adjustment Models
  8 Summary and Conclusions
  References

Chapter 5
Mining Biological Networks for Similar Patterns
Ferhat Ay, Günhan Gülsoy, Tamer Kahveci
  1 Introduction
  2 Metabolic Network Alignment with One-to-One Mappings
    2.1 Model
    2.2 Problem Formulation
    2.3 Pairwise Similarity of Entities
    2.4 Similarity of Topologies
    2.5 Combining Homology and Topology
    2.6 Extracting the Mapping of Entities
    2.7 Similarity Score of Networks
    2.8 Complexity Analysis
  3 Metabolic Network Alignment with One-to-Many Mappings
    3.1 Homological Similarity of Subnetworks
    3.2 Topological Similarity of Subnetworks
    3.3 Combining Homology and Topology
    3.4 Extracting Subnetwork Mappings
  4 Significance of Network Alignment
    4.1 Identification of Alternative Entities
    4.2 Identification of Alternative Subnetworks
    4.3 One-to-Many Mappings within and across Major Clades
  5 Summary
  6 Further Reading
  References

Chapter 6
Estimation of Distribution Algorithms in Gene Expression Data Analysis
Elham Salehi, Robin Gras
  1 Introduction
  2 Estimation of Distribution Algorithms
    2.1 Model Building in EDA
    2.2 Notation
    2.3 Models with Independent Variables
    2.4 Models with Pair Wise Dependencies
    2.5 Models with Multiple Dependencies
  3 Application of EDA in Gene Expression Data Analysis
    3.1 State-of-Art of the Application of EDAs in Gene Expression Data Analysis
  4 Conclusion
  References

Chapter 7
Gene Function Prediction and Functional Network: The Role of Gene Ontology
Erliang Zeng, Chris Ding, Kalai Mathee, Lisa Schneper, Giri Narasimhan
  1 Introduction
    1.1 Gene Function Prediction
    1.2 Functional Gene Network Generation
    1.3 Related Work and Limitations
  2 GO-Based Gene Similarity Measures
  3 Estimating Support for PPI Data with Applications to Function Prediction
    3.1 Mixture Model of PPI Data
    3.2 Data Sets
    3.3 Function Prediction
    3.4 Evaluating the Function Prediction
    3.5 Experimental Results
    3.6 Discussion
  4 A Functional Network of Yeast Genes Using Gene Ontology Information
    4.1 Data Sets
    4.2 Constructing a Functional Gene Network
    4.3 Using Semantic Similarity (SS)
    4.4 Evaluating the Functional Gene Network
    4.5 Experimental Results
    4.6 Discussion
  5 Conclusions
  References

Chapter 8
Mining Multiple Biological Data for Reconstructing Signal Transduction Networks
Thanh-Phuong Nguyen, Tu-Bao Ho
  1 Introduction
  2 Background
    2.1 Signal Transduction Network
    2.2 Protein-Protein Interaction
  3 Constructing Signal Transduction Networks Using Multiple Data
    3.1 Related Work
    3.2 Materials and Methods
    3.3 Clustering and Protein-Protein Interaction Networks
    3.4 Evaluation
  4 Some Results of Yeast STN Reconstruction
  5 Outlook
  6 Summary
  References

Chapter 9
Mining Epistatic Interactions from High-Dimensional Data Sets
Xia Jiang, Shyam Visweswaran, Richard E. Neapolitan
  1 Introduction
  2 Background
    2.1 Epistasis
    2.2 Detecting Epistasis
    2.3 High-Dimensional Data Sets
    2.4 Barriers to Learning Epistasis
    2.5 MDR
    2.6 Bayesian Networks
  3 Discovering Epis...