DRAMNERI: a free knowledge based tool to Named Entity Recognition

Antonio Toral
Grupo de investigación en Procesamiento del Lenguaje Natural y Sistemas de Información
Departamento de Lenguajes y Sistemas Informáticos
University of Alicante, Spain
[email protected]

Abstract. In this paper we present DRAMNERI, a free software application which uses rules and gazetteers to perform Named Entity Recognition. The system is fully customizable to any specific domain and is multilingual. It has been successfully applied in a domain-specific Information Extraction system and in a Question Answering task.

1 Introduction

Named Entity Recognition (NER) is nowadays an important task for the resolution of other, more complex problems such as Information Retrieval (IR), Information Extraction (IE) or Question Answering (QA), among others. Nevertheless, NER was initially used only as a subtask of IE. IE is the Natural Language Processing (NLP) task that consists in retrieving relevant information from unstructured texts and producing as a result a structured set of data, usually referred to as templates. Several subtasks are applied in order to achieve this goal and, as already pointed out, one of these is NER. As defined in the Message Understanding Conference, NER consists in identifying and categorizing entity names, which can also include temporal and/or numerical expressions.

As in other NLP techniques, there are two approaches to NER. One is based on knowledge while the other uses a supervised learning algorithm. Regarding resources, the former usually uses gazetteers and rules whereas the latter needs an annotated corpus. The knowledge based model obtains good results in specific domains, as the gazetteers can be adapted very precisely, and it is able to detect complex entities, as the rules can be tailored to meet nearly any requirement.
However, when dealing with an unrestricted domain it is better to use the learning approach, as building rules and gazetteers would be very tedious and time consuming in that case.

Because our aim is to classify complex entities in restricted texts, we have adopted the knowledge model. We also wanted our system to be highly flexible and adaptable. That is why we have made almost all possible parameters customizable (e.g. dictionaries to use, entity categories, length of contexts). This way the system can be easily configured to work with different languages and domains. Moreover, it can deal with an open set of entity categories 1.

Regarding software licenses, we would like to point out that we strongly agree with the Freeling and Weka developers that the free availability of basic NLP tools would speed up progress in our area of research. Thus, we modestly contribute in this respect by developing this software under a free license 2 ....
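To make the knowledge based approach described above more concrete, the following is a minimal sketch of how gazetteers and rules can be combined to recognize entities over an open, configurable set of categories. It is not DRAMNERI's implementation: the names `Entity`, `GAZETTEERS`, `RULES` and `recognize`, the example entries and the regular expressions are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    category: str
    start: int
    end: int


# Hypothetical gazetteers: a list of known names per entity category.
GAZETTEERS = {
    "LOCATION": ["Alicante", "Spain"],
    "ORGANIZATION": ["University of Alicante"],
}

# Hypothetical rules: one regular expression per entity category.
RULES = {
    "DATE": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "PERSON": r"\b(?:Mr|Mrs|Dr)\. [A-Z][a-z]+\b",
}


def recognize(text, gazetteers=GAZETTEERS, rules=RULES):
    """Return non-overlapping entities found by gazetteer lookup and rules."""
    candidates = []
    # Gazetteer lookup: exact, word-bounded matches of each listed name.
    for category, names in gazetteers.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text):
                candidates.append(Entity(m.group(), category, m.start(), m.end()))
    # Rule matching: every regex match becomes a candidate entity.
    for category, pattern in rules.items():
        for m in re.finditer(pattern, text):
            candidates.append(Entity(m.group(), category, m.start(), m.end()))
    # Prefer longer candidates and discard any that overlap a chosen one.
    candidates.sort(key=lambda e: (-(e.end - e.start), e.start))
    chosen = []
    for cand in candidates:
        if all(cand.end <= c.start or cand.start >= c.end for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Dr. Toral works at the University of Alicante in Spain since 21/09/2005."
    for entity in recognize(sample):
        print(entity.category, entity.text)
```

Because the categories, dictionaries and patterns are passed in as plain data, adapting such a sketch to another language or domain only requires supplying different gazetteers and rules, which mirrors the kind of customization the text describes.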
This note was uploaded on 09/21/2009 for the course CS 580 taught by Professor Fdfdf during the Spring '09 term at University of Toronto- Toronto.