jurafsky&martin_3rdEd_17 (1).pdf


Since function words like the, of, etc., occur in many documents, their IDF is very low, while the IDF of content words is high. Corpus Lesk thus uses IDF instead of a stop list. Formally, the IDF for a word i can be defined as

$$\mathrm{idf}_i = \log\left(\frac{N_{doc}}{nd_i}\right) \tag{17.20}$$

where $N_{doc}$ is the total number of “documents” (glosses and examples) and $nd_i$ is the number of these documents containing word i.

Finally, we can combine the Lesk and supervised approaches by adding new Lesk-like bag-of-words features. For example, the glosses and example sentences for the target sense in WordNet could be used to compute the supervised bag-of-words features, in addition to the words in the SemCor context sentence for the sense (Yuret, 2004).

17.6.2 Graph-based Methods

Another way to use a thesaurus like WordNet is to make use of the fact that WordNet can be construed as a graph, with senses as nodes and relations between senses as edges. In addition to hypernymy and the other relations, it is possible to create links between senses and those words in the gloss that are unambiguous (have only one sense). Often the relations are treated as undirected edges, creating a large undirected WordNet graph. Fig. 17.7 shows a portion of the graph around the word drink$^1_v$.

[Figure 17.7: Part of the WordNet graph around drink$^1_v$, after Navigli and Lapata (2010). Nodes shown: toast$^4_n$, drink$^1_v$, drinker$^1_n$, drinking$^1_n$, potation$^1_n$, sip$^1_n$, sip$^1_v$, beverage$^1_n$, milk$^1_n$, liquid$^1_n$, food$^1_n$, drink$^1_n$, helping$^1_n$, sup$^1_v$, consumption$^1_n$, consumer$^1_n$, consume$^1_v$.]

There are various ways to use the graph for disambiguation, some using the whole graph, some using only a subpart. For example, the target word and the words in its sentential context can all be inserted as nodes in the graph, each connected by a directed edge to each of its senses. Consider the sentence She drank some milk.

Chapter 17: Computing with Word Senses

Fig. 17.8 shows a portion of the WordNet graph between drink$^1_v$ and milk$^1_n$.

[Figure 17.8: Part of the WordNet graph between drink$^1_v$ and milk$^1_n$, for disambiguating a sentence like She drank some milk, adapted from Navigli and Lapata (2010). Nodes shown: the word nodes “drink” and “milk”; the senses drink$^1_v$ through drink$^5_v$ and milk$^1_n$ through milk$^4_n$; and drinker$^1_n$, beverage$^1_n$, boozing$^1_n$, food$^1_n$, drink$^1_n$, nutriment$^1_n$.]

The correct sense is then the one that is the most important or central in some way in this graph. There are many different methods for deciding centrality. The simplest is degree, the number of edges into the node, which tends to correlate with the most frequent sense. Another algorithm for assigning probabilities across nodes is personalized PageRank, a version of the well-known PageRank algorithm that uses some seed nodes. By inserting a uniform probability across the word nodes (drink and milk in the example) and computing the personalized PageRank of the graph, we obtain a PageRank value for each node in the graph, and the sense with the maximum PageRank can then be chosen.
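The two centrality measures described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular system: the edge list is a small hypothetical graph that reuses node names from Fig. 17.8 (the actual WordNet edges are not recoverable from the figure), only two candidate senses per word are included, and the names `SENSE_EDGES`, `WORD_SENSES`, `build_graph`, `sense_degree`, `personalized_pagerank`, and `best_sense`, as well as the damping factor 0.85 and the iteration count, are choices made for this example.

```python
from collections import Counter, defaultdict

# ASSUMPTION: illustrative undirected sense-sense edges, NOT real WordNet data.
SENSE_EDGES = [
    ("drink_v_1", "beverage_n_1"),
    ("drink_v_1", "drink_n_1"),
    ("drink_v_1", "drinker_n_1"),
    ("drink_n_1", "beverage_n_1"),
    ("drink_n_1", "milk_n_1"),
    ("beverage_n_1", "milk_n_1"),
    ("beverage_n_1", "food_n_1"),
    ("food_n_1", "nutriment_n_1"),
    ("nutriment_n_1", "milk_n_2"),
    ("drinker_n_1", "boozing_n_1"),
    ("boozing_n_1", "drink_v_2"),
]

# ASSUMPTION: each context word lists only two of its candidate senses.
WORD_SENSES = {
    "drink": ["drink_v_1", "drink_v_2"],
    "milk": ["milk_n_1", "milk_n_2"],
}


def build_graph(sense_edges, word_senses):
    """Return {node: set of out-neighbours}.

    Sense-sense edges are undirected (stored in both directions);
    each word node gets a directed edge to each of its senses.
    """
    out = defaultdict(set)
    for u, v in sense_edges:
        out[u].add(v)
        out[v].add(u)
    for word, senses in word_senses.items():
        for sense in senses:
            out[word].add(sense)
            out.setdefault(sense, set())  # make sure every sense is a node
    return dict(out)


def sense_degree(sense_edges):
    """Degree centrality: number of sense-sense edges touching each sense."""
    degree = Counter()
    for u, v in sense_edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def personalized_pagerank(out, seeds, damping=0.85, iterations=100):
    """Power iteration in which teleportation returns only to the seed nodes."""
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in out}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * teleport[n] for n in out}
        for node, neighbours in out.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for m in neighbours:
                    new[m] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for seed in seeds:
                    new[seed] += damping * rank[node] / len(seeds)
        rank = new
    return rank


def best_sense(scores, candidates):
    """Pick the candidate sense with the highest centrality score."""
    return max(candidates, key=lambda sense: scores[sense])


graph = build_graph(SENSE_EDGES, WORD_SENSES)
ranks = personalized_pagerank(graph, seeds=list(WORD_SENSES))
degrees = sense_degree(SENSE_EDGES)
for word, candidates in WORD_SENSES.items():
    print(word, best_sense(degrees, candidates), best_sense(ranks, candidates))
```

On this toy graph both measures select drink_v_1 and milk_n_1, because those two senses sit close together through beverage_n_1 and drink_n_1. With a different edge list the two measures can disagree, which is one reason the choice of graph and of centrality measure matters.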