arXiv:cs.CL/0010026 17 Oct 2000

Enriching very large ontologies using the WWW

Eneko Agirre 1, Olatz Ansa 1, Eduard Hovy 2 and David Martínez 1

Abstract. This paper explores the possibility of exploiting text on the world wide web in order to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the concepts (the word senses) that lexicalize a given word. The overall goal is to overcome two shortcomings of WordNet: the lack of topical links among concepts, and the proliferation of senses. Topic signatures are validated on a word sense disambiguation task with good results, which are improved when the hierarchical clusters are used.

1 INTRODUCTION

Knowledge acquisition is a long-standing problem in both Artificial Intelligence and Computational Linguistics. Semantic and world knowledge acquisition poses a problem with no simple answer. Huge efforts and investments have been made to build repositories with such knowledge (which we shall call ontologies for simplicity), e.g. CYC [1], EDR [2] and WordNet [3], but with unclear results. WordNet, for instance, has been criticized for its lack of relations between topically related concepts, and for the proliferation of word senses.

As an alternative to entirely hand-made repositories, automatic or semi-automatic means have been proposed over the last 30 years. On the one hand, shallow techniques are used to enrich existing ontologies [4] or to induce hierarchies [5], usually by analyzing large corpora of texts. On the other hand, deep natural language processing is called for to acquire knowledge from more specialized texts (dictionaries, encyclopedias or domain-specific texts) [6][7].
These research lines are complementary: deep understanding would provide specific relations among concepts, whereas shallow techniques could provide generic knowledge about the concepts.

This paper explores the possibility of exploiting text on the world wide web in order to enrich WordNet. The first step consists of linking each concept in WordNet to relevant document collections on the web, which are further processed to overcome some of WordNet’s shortcomings.

On the one hand, concepts are linked to topically related words. Topically related words form the topic signature for each concept in the hierarchy. As in [8][9] we define a topic signature as a family of related terms { t, <(w_1, s_1) … (w_i, s_i) …> }, where t is the topic (i.e. the target concept) and each w_i is a word associated with

1 IxA NLP group. University of the Basque Country. 649 pk. 20.080 Donostia. Spain. Email: [email protected], [email protected]
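
The topic signature structure { t, <(w_i, s_i)> } described above can be illustrated with a small Python sketch. The text available here breaks off before s_i is defined, so the sketch assumes that each s_i is a numeric weight attached to the word w_i; the class name `TopicSignature`, the method `top`, and the placeholder values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TopicSignature:
    """Hypothetical container for a topic signature { t, <(w_i, s_i)> }."""

    # t: the topic, i.e. the target concept.
    topic: str
    # <(w_i, s_i)>: words paired with a value s_i.
    # ASSUMPTION: s_i is treated as a numeric weight; the source text
    # is cut off before it defines s_i.
    terms: List[Tuple[str, float]] = field(default_factory=list)

    def top(self, n: int) -> List[Tuple[str, float]]:
        """Return the n pairs with the largest s_i, highest first."""
        return sorted(self.terms, key=lambda pair: pair[1], reverse=True)[:n]


# Placeholder data only; these are not results from the paper.
sig = TopicSignature("concept-A", [("w1", 0.2), ("w2", 0.9), ("w3", 0.5)])
best_two = sig.top(2)
```

Keeping the pairs in a plain list preserves the ordered-sequence reading of the angle brackets in the definition, while `top` shows one way such a signature might be queried under the weight assumption.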