10.1.1.99.589 - COMPUTER PROCESSING OF ORIENTAL LANGUAGES...

COMPUTER PROCESSING OF ORIENTAL LANGUAGES, VOL. 11, NO. 4, 1998

Named Entity Extraction for Information Retrieval

HSIN–HSI CHEN, YUNG–WEI DING, and SHIH–CHUNG TSAI

Abstract: Name extraction is indispensable for both natural language understanding and information retrieval. However, proper names are major unknown words in natural language texts, and unknown word identification is still a challenging problem in natural language processing. This paper deals with the identification of person names, organization names, and location names in Chinese texts. Different types of information from different levels of text are employed, including character conditions, statistical information, titles, punctuation marks, organization and location keywords, speech–act and locative verbs, a cache, and an n–gram model. We also clarify which strategies can be used in which cases, i.e., queries and/or documents. In our experiments, the recall and precision rates for the extraction of person names, organization names, and location names under MET data are (87.33%, 82.33%), (76.67%, 79.33%), and (77.00%, 82.00%), respectively.

Keywords: Chinese language processing, Information retrieval, N–gram model, Named entity extraction, Word segmentation.

1. Introduction

People, affairs, time, places, and things are five basic entities in a document. When we capture these fundamental entities, we can understand a document to some degree. These entities are also the targets that users are interested in. That is, users often issue queries to retrieve such kinds of entities in information retrieval systems. Thompson and Dozier [1] reported an experiment conducted over periods of several days in 1995. It showed that 67.8%, 83.4%, and 38.8% of queries to the Wall Street Journal, Los Angeles Times, and Washington Post, respectively, involved name searching.

Besides name searching, name identification has many applications. Chen and Wu [2] consider person names as one of the cues in sentence alignment. Chen and Lee [3] show its application to anaphora resolution. Chen and Bian [4] propose a method to construct white pages for Internet/Intranet users automatically. They extract information from World Wide Web documents, including proper nouns, E–mail addresses, and home page URLs, and find the relationships among these data.

Name extraction is indispensable for both natural language understanding and information retrieval. However, proper names are major unknown words in natural language texts. Chen, He and Xu [5] examined the TREC–5 Chinese collection and found that it contained 287 university and college names and 627 company names. Only 21 of the 287 university and college names and 14 of the 627 company names were included in their dictionary. Unknown word identification is a challenging problem in natural language processing, and many papers [6–8] touch on it. In the well-known Message Understanding Conference (MUC), a message understanding system evaluation sponsored by the Tipster Text Program of DARPA, the named entity task, which covers named organizations, people, and locations, along with date/time expressions and monetary and percentage expressions, is one of the tasks for evaluating technologies. In the MUC–6 named entity task,