10.1.1.16.2038 - Bootstrapping an Ontology-based...

Bootstrapping an Ontology-based Information Extraction System

Alexander Maedche, Günter Neumann, Steffen Staab

DFKI German Research Center for Artificial Intelligence, Saarbruecken, Germany
neumann@dfki.de, http://www.dfki.de

FZI Research Center at the University of Karlsruhe, Karlsruhe, D-76131 Karlsruhe, Germany
maedche@fzi.de, http://www.fzi.de/wim

AIFB, Univ. Karlsruhe, D-76128 Karlsruhe, Germany
staab@aifb.uni-karlsruhe.de, http://www.aifb.uni-karlsruhe.de/WBS

Abstract. Automatic intelligent web exploration will benefit from shallow information extraction techniques if the latter can be brought to work within many different domains. The major bottleneck for this, however, lies in the so far difficult and expensive modeling of lexical knowledge, extraction rules, and an ontology that together define the information extraction system. In this paper we present a bootstrapping approach that allows for the fast creation of an ontology-based information extracting system relying on several basic components, viz. a core information extraction system, an ontology engineering environment and an inference engine. We make extensive use of machine learning techniques to support the semi-automatic, incremental bootstrapping of the domain-specific target information extraction system.

Keywords: Ontologies, Information Extraction, Machine Learning

1 Introduction

In order to overcome the problem of finding or extracting relevant information out of the enormous amount of text data electronically available, various technologies for information management systems have been explored within the Natural Language Processing (NLP) and AI community. One line of such research is the investigation and development of intelligent information extraction systems. Information extraction (IE) is the task of identifying, collecting and normalizing relevant information from NL text and skipping irrelevant text passages.
IE systems do not attempt an exhaustive deep NL analysis of all aspects of a text. Rather, they are built in order to analyse or “understand” only those text passages that contain information relevant for the task at hand. Thus, the IE system may be sufficiently fast and robust when dealing with free texts, such as those that appear on the Web.
The definition of relevancy is given implicitly by the IE model, which specifies domain-specific lexical knowledge, extraction rules, and an ontology. The IE model makes it possible to perform the required mappings from NL utterances to the corresponding domain knowledge. For the extraction task to be exhaustive and highly accurate, the model must be very detailed and comprehensive. Typically, the resulting mappings turn free text into target knowledge structures that capture crucial information, answering questions about who, what, whom, when, where or why.
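To make the three parts of an IE model more concrete, the following is a minimal toy sketch in Python, not the system described in this paper. The ontology entry, the lexicon, the single regular-expression rule and the function name `extract` are all hypothetical and chosen only for illustration; a real model would be far more detailed.

```python
import re

# Hypothetical toy ontology: one concept and the slots it expects.
ONTOLOGY = {"Acquisition": ("who", "whom", "when")}

# Hypothetical lexical knowledge: trigger words mapped to ontology concepts.
LEXICON = {"acquired": "Acquisition", "bought": "Acquisition"}

# Hypothetical extraction rule: "<Who> <trigger> <Whom> [in <year>]".
RULE = re.compile(
    r"(?P<who>[A-Z]\w+) (?P<trigger>\w+) (?P<whom>[A-Z]\w+)"
    r"(?: in (?P<when>\d{4}))?"
)


def extract(text):
    """Return one filled template per rule match whose trigger is in the lexicon."""
    templates = []
    for match in RULE.finditer(text):
        concept = LEXICON.get(match.group("trigger"))
        if concept is None:
            continue  # the trigger is not relevant to this domain; skip the passage
        template = {"concept": concept}
        for slot in ONTOLOGY[concept]:
            template[slot] = match.group(slot)  # None if the optional slot is absent
        templates.append(template)
    return templates


if __name__ == "__main__":
    print(extract("It rained all day. Siemens acquired Nixdorf in 1990."))
```

Relevancy is implicit here in the same sense as in the text above: a passage is extracted only if the rule matches and the lexicon maps its trigger word to a concept of the ontology; everything else is skipped.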
This note was uploaded on 09/21/2009 for the course CS 580 taught by Professor Fdfdf during the Spring '09 term at University of Toronto- Toronto.
