Automated linking PUBMED documents with GO terms using SVM

Su-Shing Chen* and Hyunki Kim
Computer and Information Science and Engineering Department, University of Florida, Gainesville, Florida 32611, USA
* To whom correspondence should be addressed.

Abstract

Summary: We have developed an automated scheme for linking PUBMED citations with GO terms using SVM (Support Vector Machine), a classification algorithm. The PUBMED database, with over 12 million citations, has been essential to life science researchers. More recently, GO (Gene Ontology) has provided a graph structure for the biological process, cellular component, and molecular function of genomic data. By mining the textual content of PUBMED and associating it with GO terms, we have built an ontological map for these databases so that users can search PUBMED via GO terms and, conversely, GO entries via PUBMED classification. Consequently, some interesting and unexpected knowledge may be captured from them for further data analysis and biological experimentation. This paper reports our results on the SVM implementation and the need to parallelize the training phase.

Availability: PUBMED/GO linking software will be available upon request.

Contact: [email protected]

1 Introduction

With the exponential growth of biomedical data, life science researchers face a new challenge: how to systematically exploit the relationships between genes, sequences and the biomedical literature [1]. Most known genes are described in the biomedical literature, and PUBMED is a valuable database for this kind of information. PUBMED, developed by the U.S. National Library of Medicine (NLM), is a database of indexed bibliographic citations and abstracts [2]. It covers over 4,600 biomedical journals. PUBMED citations and abstracts are searchable via PUBMED^1 or the NLM Gateway^2.
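The linking scheme summarized in the abstract (classify each citation into a GO term with an SVM, then index the result in both directions) can be illustrated with a minimal sketch. This is not the authors' implementation: the features, kernel and training procedure used in the paper are not described in this part of the text. The sketch assumes bag-of-words features and a linear SVM trained one-vs-rest by subgradient descent on the hinge loss. The abstracts and PMIDs are invented for illustration, and the two GO identifiers serve only as class labels.

```python
# Toy corpus: (hypothetical PMID, invented abstract text, GO label).
# GO:0006915 = apoptotic process, GO:0006412 = translation.
TRAIN = [
    ("PMID-A", "caspase activation triggers apoptosis in tumor cells", "GO:0006915"),
    ("PMID-B", "apoptosis and programmed cell death require caspase signaling", "GO:0006915"),
    ("PMID-C", "ribosome binding initiates translation of mrna", "GO:0006412"),
    ("PMID-D", "translation elongation on the ribosome requires trna", "GO:0006412"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


def score(model, tokens):
    """Linear decision value w . x + b for a bag-of-words document."""
    w, b = model
    return sum(w.get(tok, 0.0) for tok in tokens) + b


def train_binary_svm(docs, labels, epochs=20, eta=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    docs: list of token lists; labels: +1 or -1 per document.
    eta (step size) and lam (regularization) are illustrative values.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            margin = y * score((w, b), tokens)
            # Gradient of the L2 penalty: shrink every weight slightly.
            for tok in w:
                w[tok] *= 1.0 - eta * lam
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                for tok in tokens:
                    w[tok] = w.get(tok, 0.0) + eta * y
                b += eta * y
    return w, b


def train_one_vs_rest(corpus):
    """Train one binary SVM per GO term (that term vs. all others)."""
    docs = [tokenize(text) for _, text, _ in corpus]
    terms = sorted({go for _, _, go in corpus})
    return {
        go: train_binary_svm(docs, [1 if g == go else -1 for _, _, g in corpus])
        for go in terms
    }


def predict(models, text):
    """Assign the GO term whose classifier gives the highest decision value."""
    tokens = tokenize(text)
    return max(models, key=lambda go: score(models[go], tokens))


models = train_one_vs_rest(TRAIN)

# Link unseen (invented) citations to GO terms, then index both directions.
UNLABELED = [
    ("PMID-E", "caspase dependent apoptosis observed"),
    ("PMID-F", "ribosome stalls during translation"),
]
pmid_to_go = {pmid: predict(models, text) for pmid, text in UNLABELED}
go_to_pmids = {}
for pmid, go in pmid_to_go.items():
    go_to_pmids.setdefault(go, []).append(pmid)

print(pmid_to_go)
print(go_to_pmids)
```

The two dictionaries correspond to the two search directions mentioned in the abstract: `go_to_pmids` supports finding citations by GO term, and `pmid_to_go` supports finding the GO entry for a citation. One classifier is trained per GO term and each is independent of the others, so a real system with many GO terms could train them in parallel.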
The biomedical literature has much to say about gene sequence, but it also seems that sequence can tell us much about the biomedical literature. Currently, highly trained biologists read the literature and manually select appropriate Gene Ontology (GO) terms to annotate it. The Gene Ontology database was created more recently to provide an ontological graph structure for the biological process, cellular component, and molecular function of genomic data [3]. McCray et al. [4] show that GO is suitable as a resource for natural language processing (NLP) applications because a large percentage (79%) of the GO terms passed the NLP parser. They also show that 35% of the GO terms were found in a corpus collected from the MEDLINE database and 27% of the GO terms were found in the current edition of the Unified Medical Language System (UMLS). Recent work by Raychaudhuri et al. employs a "maximum entropy" technique to categorize 21 GO terms using training and test documents extracted from PUBMED using handcrafted keyword queries. Their study reports that models trained on PUBMED documents published prior to 2001 achieved an accuracy of 72.8% when tested on

^1 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
^2 http://gateway.nlm.nih.gov/gw/Cmd