jurafsky&martin_3rdEd_17 (1).pdf


[...] systems are permitted to pass on the labeling of some instances. In general, we evaluate by using held-out data from the same sense-tagged corpora that we used for training, such as the SemCor corpus discussed above or the various corpora produced by the SENSEVAL effort.

Many aspects of sense evaluation have been standardized by the SENSEVAL and SEMEVAL efforts (Palmer et al. 2006, Kilgarriff and Palmer 2000). This framework provides a shared task with training and testing materials along with sense inventories for all-words and lexical sample tasks in a variety of languages.

The normal baseline is to choose the most frequent sense for each word from the senses in a labeled corpus (Gale et al., 1992a). For WordNet, this corresponds to the first sense, since senses in WordNet are generally ordered from most frequent to least frequent. WordNet sense frequencies come from the SemCor sense-tagged corpus described above; WordNet senses that don’t occur in SemCor are ordered arbitrarily after those that do. The most frequent sense baseline can be quite accurate, and is therefore often used as a default, to supply a word sense when a supervised algorithm has insufficient training data.

17.6 WSD: Dictionary and Thesaurus Methods

Supervised algorithms based on sense-labeled corpora are the best-performing algorithms for sense disambiguation. However, such labeled training data is expensive and limited. One alternative is to get indirect supervision from dictionaries and thesauruses or similar knowledge bases; this approach is therefore also called knowledge-based WSD. Methods like this that do not use texts that have been hand-labeled with senses are also called weakly supervised.
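The most frequent sense baseline from the evaluation discussion above, scored on held-out data, can be sketched in a few lines of Python. This is a minimal illustration, not the SENSEVAL scoring procedure: the function names `train_mfs` and `accuracy`, the toy corpus, and the sense labels such as `bank%finance` are all invented for the example.

```python
from collections import Counter, defaultdict

def train_mfs(tagged_tokens):
    """Map each word to its most frequent sense in (word, sense) pairs."""
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    # most_common(1) returns a one-element list [(sense, count)]
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def accuracy(mfs, held_out):
    """Fraction of held-out (word, gold_sense) pairs the baseline gets right."""
    correct = sum(1 for word, gold in held_out if mfs.get(word) == gold)
    return correct / len(held_out)

# Toy sense-tagged data (hypothetical labels, for illustration only).
train = [
    ("bank", "bank%finance"), ("bank", "bank%finance"), ("bank", "bank%river"),
    ("bass", "bass%fish"), ("bass", "bass%music"), ("bass", "bass%music"),
]
held_out = [
    ("bank", "bank%finance"), ("bank", "bank%river"),
    ("bass", "bass%music"), ("bass", "bass%music"),
]

mfs = train_mfs(train)
print(mfs["bank"])              # bank%finance
print(accuracy(mfs, held_out))  # 0.75
```

The sketch assumes every held-out word was seen in training; an unseen word simply counts as an error here, whereas a real evaluation would have to decide how to treat such cases.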
17.6.1 The Lesk Algorithm

The most well-studied dictionary-based algorithm for sense disambiguation is the Lesk algorithm, really a family of algorithms that choose the sense whose dictionary gloss or definition shares the most words with the target word’s neighborhood. Figure 17.6 shows the simplest version of the algorithm, often called the Simplified Lesk algorithm (Kilgarriff and Rosenzweig, 2000).

As an example of the Lesk algorithm at work, consider disambiguating the word bank in the following context:

(17.19) The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.

given the following two WordNet senses:

CHAPTER 17 COMPUTING WITH WORD SENSES

function SIMPLIFIED LESK(word, sentence) returns best sense of word
  best-sense ← most frequent sense for word
  max-overlap ← 0
  context ← set of words in sentence
  for each sense in senses of word do
    signature ← set of words in the gloss and examples of sense
    overlap ← COMPUTEOVERLAP(signature, context)
    if overlap > max-overlap then
      max-overlap ← overlap
      best-sense ← sense
  end
  return(best-sense)

Figure 17.6 The Simplified Lesk algorithm. The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
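The pseudocode in Figure 17.6 translates fairly directly into Python. The version below is a sketch under several assumptions: the `INVENTORY` dictionary is a hand-written stand-in for a sense inventory (its two entries are loosely modeled on the financial and riverside senses of bank, and are not the list the text refers to), the stop list is a small ad hoc one, and the first listed sense stands in for the most frequent sense.

```python
import re

# Small ad hoc stop list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "it",
              "that", "he", "they", "my", "up", "will", "can", "into", "because"}

def tokenize(text):
    """Lowercase the text and return its set of non-stop-list words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def compute_overlap(signature, context):
    """Number of words the two sets have in common."""
    return len(signature & context)

def simplified_lesk(word, sentence, inventory):
    senses = inventory[word]
    best_sense = senses[0]["id"]  # first sense stands in for the most frequent one
    max_overlap = 0
    context = tokenize(sentence)
    for sense in senses:
        signature = tokenize(sense["gloss"] + " " + " ".join(sense["examples"]))
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense["id"]
    return best_sense

# Hypothetical two-sense inventory, for illustration only.
INVENTORY = {
    "bank": [
        {"id": "bank1",
         "gloss": "a financial institution that accepts deposits and "
                  "channels the money into lending activities",
         "examples": ["he cashed a check at the bank",
                      "that bank holds the mortgage on my home"]},
        {"id": "bank2",
         "gloss": "sloping land (especially the slope beside a body of water)",
         "examples": ["they pulled the canoe up on the bank",
                      "he sat on the bank of the river and watched the currents"]},
    ]
}

SENTENCE = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage "
            "securities.")

print(simplified_lesk("bank", SENTENCE, INVENTORY))  # bank1
```

With this inventory, the context of (17.19) shares deposits and mortgage with the first signature only, so the financial sense wins. The target word itself is not filtered out here, so bank adds one to the overlap of both senses and does not change the ranking. When no sense overlaps with the context, the function falls back to the first listed sense.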