jurafsky&martin_3rdEd_17 (1).pdf


quate training data for every word in the test set will be available. Moreover, given the number of polysemous words in reasonably sized lexicons, approaches based on training one classifier per term are unlikely to be practical. In the following sections we explore the application of various machine learning paradigms to word sense disambiguation.

17.5 Supervised Word Sense Disambiguation

If we have data that has been hand-labeled with correct word senses, we can use a supervised learning approach to the problem of sense disambiguation: extracting features from the text and training a classifier to assign the correct sense given these features. The output of training is thus a classifier system capable of assigning sense labels to unlabeled words in context.

For lexical sample tasks, there are various labeled corpora for individual words; these corpora consist of context sentences labeled with the correct sense for the target word. These include the line-hard-serve corpus containing 4,000 sense-tagged examples of line as a noun, hard as an adjective and serve as a verb (Leacock et al., 1993), and the interest corpus with 2,369 sense-tagged examples of interest as a noun (Bruce and Wiebe, 1994). The SENSEVAL project has also produced a number of such sense-labeled lexical sample corpora: SENSEVAL-1 with 34 words from the HECTOR lexicon and corpus (Kilgarriff and Rosenzweig 2000, Atkins 1993), and SENSEVAL-2 and -3 with 73 and 57 target words, respectively (Palmer et al. 2001, Kilgarriff 2001).

For training all-word disambiguation tasks we use a semantic concordance, a corpus in which each open-class word in each sentence is labeled with its word sense from a specific dictionary or thesaurus. One commonly used corpus is SemCor, a subset of the Brown Corpus consisting of over 234,000 words that were manually tagged with WordNet senses (Miller et al. 1993, Landes et al. 1998). In addition, sense-tagged corpora have been built for the SENSEVAL all-word tasks. The SENSEVAL-3 English all-words test data consisted of 2081 tagged content word tokens, from 5,000 total running words of English from the WSJ and Brown corpora (Palmer et al., 2001).

The first step in supervised training is to extract features that are predictive of word senses. The insight that underlies all modern algorithms for word sense disambiguation was famously first articulated by Weaver (1955) in the context of machine translation:

If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. [. . .] But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. [. . .] The practical question is: “What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?”

We first perform some processing on the sentence containing the window, typi-
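
To make the supervised recipe and Weaver's window concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method the chapter goes on to describe: the helper names, the toy training sentences, the sense labels "cord" and "queue", and the choice of a naive Bayes classifier with add-one smoothing are all hypothetical. It extracts the N words on either side of a target word, counts them as features, and trains a classifier on hand-labeled examples.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, target_index, n):
    """Return up to n tokens on each side of the target, excluding the target."""
    left = tokens[max(0, target_index - n):target_index]
    right = tokens[target_index + 1:target_index + 1 + n]
    return left, right


def window_features(tokens, target_index, n):
    """Count the lowercased words in the window, ignoring their order."""
    left, right = context_window(tokens, target_index, n)
    return Counter(word.lower() for word in left + right)


def train_naive_bayes(examples, n):
    """Collect per-sense counts from (tokens, target_index, sense) triples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, target_index, sense in examples:
        features = window_features(tokens, target_index, n)
        sense_counts[sense] += 1
        word_counts[sense].update(features)
        vocabulary.update(features)
    return sense_counts, word_counts, vocabulary


def predict_sense(model, tokens, target_index, n):
    """Pick the sense with the highest smoothed log-probability."""
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())
    features = window_features(tokens, target_index, n)
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[sense].values()) + len(vocabulary)
        for word, freq in features.items():
            if word in vocabulary:  # words never seen in training are skipped
                smoothed = (word_counts[sense][word] + 1) / denominator
                score += freq * math.log(smoothed)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical hand-labeled examples for the noun "line".
N = 3
training = [
    ("She cast her line into the river".split(), 3, "cord"),
    ("The fishing line snapped in the river".split(), 2, "cord"),
    ("We waited in line at the bank".split(), 3, "queue"),
    ("The line at the store was long".split(), 1, "queue"),
]
model = train_naive_bayes(training, N)

test_tokens = "He dropped a line into the river".split()
print(context_window(test_tokens, 3, N))
print(predict_sense(model, test_tokens, 3, N))
```

With N = 0 the window is empty and the classifier can only fall back on how often each sense was seen in training, which is Weaver's one-word-wide hole in the mask. Widening the window is what lets context words such as "river" or "bank" influence the decision.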
