biological abbreviation - BMC Bioinformatics Proceedings...

Proceedings

BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature

Cheng-Ju Kuo 1, Maurice HT Ling 2,4, Kuan-Ting Lin 1,3 and Chun-Nan Hsu* 1

Addresses: 1 Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Republic of China, 2 School of Chemical and Life Sciences, Singapore Polytechnic, Republic of Singapore, 3 Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan, Republic of China and 4 Department of Zoology, The University of Melbourne, Parkville, Victoria, Australia

E-mail: Cheng-Ju Kuo - clarkkuo@iis.sinica.edu.tw; Maurice HT Ling - mauriceling@acm.org; Kuan-Ting Lin - woody@iis.sinica.edu.tw; Chun-Nan Hsu* - chunnan.hsu@iis.sinica.edu.tw
*Corresponding author

from Asia Pacific Bioinformatics Network (APBioNet) Eighth International Conference on Bioinformatics (InCoB2009), Singapore, 7-11 September 2009

Published: 3 December 2009
BMC Bioinformatics 2009, 10 (Suppl 15):S7 doi: 10.1186/1471-2105-10-S15-S7
This article is available from: http://www.biomedcentral.com/1471-2105/10/S15/S7

© 2009 Kuo et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: To automatically process large quantities of biological literature for knowledge discovery and information curation, text mining tools are becoming essential. Abbreviation recognition is related to named entity recognition (NER) and can be considered the task of recognizing, in free text, pairs consisting of a term and its corresponding abbreviation.
The successful identification of an abbreviation and its corresponding definition is not only a prerequisite for indexing terms in text databases to produce articles of related interest, but also a building block for improving existing gene mention tagging and gene normalization tools.

Results: Our approach to abbreviation recognition (AR) is based on machine learning and exploits a novel set of rich features to learn rules from training data. Tested on the AB3P corpus, our system achieved an F-score of 89.90% with 95.86% precision at 84.64% recall, higher than the result achieved by the best-performing existing AR system. We also annotated a new corpus of 1200 PubMed abstracts derived from the BioCreative II gene normalization corpus. On this annotated corpus, our system achieved an F-score of 86.20% with 93.52% precision at 79.95% recall, which also outperforms all tested systems.

Conclusion: By applying our system to extract all short form-long form pairs from all available PubMed abstracts, we have constructed BIOADI. Mining BIOADI reveals many interesting trends in biomedical research. We also provide off-line AR software in the download section
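
As a quick consistency check on the figures reported in the Results, the sketch below recomputes the F-score from each precision/recall pair. It assumes the reported F-score is the balanced F-measure (the harmonic mean of precision and recall); the abstract does not state the weighting, so this is an assumption, and the function name f_score is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced (beta = 1) measure.
    Inputs and output share the same units (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), all in percent, taken from the abstract.
reported = {
    "AB3P corpus": (95.86, 84.64, 89.90),
    "BioCreative II-derived corpus": (93.52, 79.95, 86.20),
}

for corpus, (p, r, f_reported) in reported.items():
    f = f_score(p, r)
    print(f"{corpus}: computed F = {f:.2f}%, reported F = {f_reported:.2f}%")
```

Under that assumption, both recomputed values round to the F-scores quoted in the abstract, which suggests the three numbers for each corpus are mutually consistent.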

This note was uploaded on 04/06/2010 for the course COMPUTER S COSC1520 taught by Professor Paul during the Spring '09 term at York University.
