biological abbreviation - BMC Bioinformatics Proceedings...

Proceedings

BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature

Cheng-Ju Kuo 1, Maurice HT Ling 2,4, Kuan-Ting Lin 1,3 and Chun-Nan Hsu* 1

Addresses: 1 Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Republic of China, 2 School of Chemical and Life Sciences, Singapore Polytechnic, Republic of Singapore, 3 Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan, Republic of China and 4 Department of Zoology, The University of Melbourne, Parkville, Victoria, Australia

E-mail: Cheng-Ju Kuo -; Maurice HT Ling -; Kuan-Ting Lin -; Chun-Nan Hsu* -
*Corresponding author

from Asia Pacific Bioinformatics Network (APBioNet) Eighth International Conference on Bioinformatics (InCoB2009), Singapore, 7-11 September 2009

Published: 3 December 2009
BMC Bioinformatics 2009, 10 (Suppl 15):S7 doi: 10.1186/1471-2105-10-S15-S7

This article is available from:
© 2009 Kuo et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: Text mining tools are becoming essential for automatically processing large quantities of biological literature for knowledge discovery and information curation. Abbreviation recognition is related to named entity recognition (NER) and can be treated as the task of recognizing pairs of a term and its corresponding abbreviation in free text. Successfully identifying an abbreviation and its corresponding definition is not only a prerequisite for indexing the terms of text databases so that articles of related interest can be retrieved, but also a building block for improving existing gene mention tagging and gene normalization tools.
Results: Our approach to abbreviation recognition (AR) is based on machine learning and exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system achieved an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result of the best-performing existing AR system. We also annotated a new corpus of 1200 PubMed abstracts derived from the BioCreative II gene normalization corpus. On this corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems.
Conclusion: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends in biomedical research. We also provide off-line AR software in the download section.
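As a quick consistency check on the scores reported in the Results, the sketch below recomputes the F-score from each precision/recall pair. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name f_score is illustrative and not taken from the paper or its software.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 measure.
    Inputs and output are percentages.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
reported = {
    "AB3P corpus": (95.86, 84.64),
    "Annotated BioCreative II corpus": (93.52, 79.95),
}

for corpus, (precision, recall) in reported.items():
    print(f"{corpus}: F = {f_score(precision, recall):.2f}%")
```

Under that assumption, both recomputed values round to the reported figures, 89.90% and 86.20%.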
This note was uploaded on 04/06/2010 for the course COMPUTER S COSC1520 taught by Professor Paul during the Spring '09 term at York University.
