Knowledge-Kim - A Knowledge Management System for...

A Knowledge Management System for Organizing MEDLINE Database

Hyunki Kim, Su-Shing Chen
Computer and Information Science Engineering Department, University of Florida, Gainesville, Florida 32611, USA

With the explosion of biomedical data, information overload and users’ inability to express their information needs may become more serious. To address these problems, this paper presents a text data mining method that uses both text categorization and text clustering to build concept hierarchies for MEDLINE citations. The approach we propose is a three-step data mining process for organizing the MEDLINE database: (1) categorization according to MeSH terms, MeSH major topics, and the co-occurrence of MeSH descriptors; (2) clustering using the results of MeSH term categorization; and (3) visualization of categories and hierarchical clusters. The automatically generated hierarchies may be used to support users’ browsing and help them identify good starting points for searching. An interface to the underlying system is also presented.

1. INTRODUCTION

MEDLINE, developed by the U.S. National Library of Medicine (NLM), is a database of indexed bibliographic citations and abstracts. It covers over 4,600 biomedical journals [1]. MEDLINE citations and abstracts are searchable via PubMed or the NLM Gateway.

The NLM produces MeSH (Medical Subject Headings), updated annually, for subject indexing, cataloging, and searching of journal articles in MEDLINE. MeSH consists of descriptors (or main headings), qualifiers (or subheadings), and supplementary concept records. It contains more than 19,000 descriptors, which are used to describe the subject of an article. It also provides fewer than 100 qualifiers, which are used to express a particular aspect of the concept represented by a descriptor. MeSH terms are arranged both alphabetically and in a hierarchical tree, in which specific subject categories are arranged beneath broader terms.

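As a rough illustration of the first step of the process outlined in the abstract (categorization by MeSH terms and by co-occurrence of MeSH descriptors), the following minimal Python sketch groups citations by descriptor and counts descriptor pairs. It is not the authors' implementation: the citation ids, the descriptor assignments, and the function names `categorize_by_descriptor` and `descriptor_cooccurrence` are assumptions made up for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: (citation id, MeSH descriptors assigned to it).
# These ids and assignments are illustrative, not real MEDLINE records.
citations = [
    ("c1", ["Parkinson Disease", "Levodopa", "Dopamine"]),
    ("c2", ["Parkinson Disease", "Levodopa"]),
    ("c3", ["Alzheimer Disease", "Dopamine"]),
]


def categorize_by_descriptor(records):
    """Map each descriptor to the ids of the citations indexed with it."""
    categories = defaultdict(list)
    for cid, descriptors in records:
        for descriptor in set(descriptors):  # ignore repeated descriptors
            categories[descriptor].append(cid)
    return dict(categories)


def descriptor_cooccurrence(records):
    """Count, for each unordered descriptor pair, the citations carrying both."""
    counts = Counter()
    for _, descriptors in records:
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(set(descriptors)), 2):
            counts[pair] += 1
    return counts


categories = categorize_by_descriptor(citations)
pairs = descriptor_cooccurrence(citations)
```

With this toy data, `categories["Parkinson Disease"]` lists citations `c1` and `c2`, and the pair `("Levodopa", "Parkinson Disease")` has a co-occurrence count of 2. Frequently co-occurring pairs could then serve as candidate categories of their own, which is one plausible reading of the co-occurrence categorization named in the abstract.
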
MeSH terms provide a consistent way of retrieving information regardless of the terminology used by the authors in the original articles. By using MeSH terms, the user is able to narrow the search space in MEDLINE. As a result, adding more MeSH terms to the query may improve retrieval performance [2]. However, there are inherent challenges as well: information overload [3], and users’ inability to express their information needs well enough to take full advantage of the MEDLINE database. MEDLINE contains over 12 million article citations, and since 2002 over 2,000 new references have been added daily [1]. Although the user may be able to limit the search space of MEDLINE with MeSH terms, keyword searches often return a long list of results. For instance, when the user queries the term “Parkinson’s Disease”, limited to MeSH descriptors, PubMed returns over 21,000 results. Here, there is a problem of information overload, with the user having

This note was uploaded on 01/15/2012 for the course COP 4600 taught by Professor Yavuz-kahveci during the Spring '07 term at University of Florida.
