The Dice measure was similarly extended from binary vectors to vectors of weighted associations; one extension from Curran (2003) uses the Jaccard numerator but uses as the denominator normalization factor the total weighted value of the non-zero entries in the two vectors:

\[
\text{sim}_{\text{Dice}}(\vec{v},\vec{w}) = \frac{2\sum_{i=1}^{N}\min(v_i,w_i)}{\sum_{i=1}^{N}(v_i+w_i)} \tag{15.20}
\]
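As a concrete illustration, here is a minimal sketch (not from the text) of the Jaccard and Dice measures of Eqs. 15.19 and 15.20, assuming the weighted association vectors are non-negative NumPy arrays; the function names are our own:

```python
import numpy as np

def sim_jaccard(v, w):
    """Jaccard similarity (Eq. 15.19): the sum of elementwise minima
    normalized by the sum of elementwise maxima."""
    return np.sum(np.minimum(v, w)) / np.sum(np.maximum(v, w))

def sim_dice(v, w):
    """Dice similarity (Eq. 15.20): twice the sum of elementwise minima,
    normalized by the total weighted value of the two vectors."""
    return 2 * np.sum(np.minimum(v, w)) / np.sum(v + w)

# Two toy weighted co-occurrence vectors over the same N = 4 features.
v = np.array([0.0, 1.5, 2.0, 0.5])
w = np.array([1.0, 1.0, 2.0, 0.0])
print(sim_jaccard(v, w))  # 0.6
print(sim_dice(v, w))     # 0.75
```

Both measures reward overlapping weight (the min terms) and differ only in how they normalize it.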
Finally, there is a family of information-theoretic distributional similarity measures (Pereira et al. 1993, Dagan et al. 1994, Dagan et al. 1999, Lee 1999). The intuition of these models is that if two vectors \(\vec{v}\) and \(\vec{w}\) each express a probability distribution (their values sum to one), then they are similar to the extent that these probability distributions are similar. The basis for comparing two probability distributions P and Q is the Kullback-Leibler divergence, or KL divergence, or relative entropy (Kullback and Leibler, 1951):

\[
D(P\,||\,Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)} \tag{15.21}
\]

Unfortunately, the KL divergence is undefined when Q(x) = 0 and P(x) ≠ 0, which is a problem since these word-distribution vectors are generally quite sparse. One alternative (Lee, 1999) is to use the Jensen-Shannon divergence, which represents the divergence of each distribution from the mean of the two and doesn't have this problem with zeros:

\[
\text{JS}(P\,||\,Q) = D\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + D\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right) \tag{15.22}
\]

Rephrased in terms of vectors \(\vec{v}\) and \(\vec{w}\),

\[
\text{sim}_{\text{JS}}(\vec{v}\,||\,\vec{w}) = D\!\left(\vec{v} \,\Big\|\, \frac{\vec{v}+\vec{w}}{2}\right) + D\!\left(\vec{w} \,\Big\|\, \frac{\vec{v}+\vec{w}}{2}\right) \tag{15.23}
\]
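The zero-handling can be made concrete with a short sketch (again, not from the text) of Eqs. 15.21–15.23, assuming base-2 logarithms and NumPy probability vectors that sum to one; the function names are illustrative:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(P || Q) of Eq. 15.21. Terms with p(x) = 0
    contribute nothing; undefined when q(x) = 0 but p(x) > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence (Eq. 15.22): the divergence of each
    distribution from their mean. The mean is nonzero wherever either
    input is, so the zero problem of plain KL never arises."""
    m = (p + q) / 2
    return kl_divergence(p, m) + kl_divergence(q, m)

# Sparse distributions with non-overlapping zeros: KL(p || q) would be
# undefined, but the Jensen-Shannon divergence is finite.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(js_divergence(p, q))  # 1.0
```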
Figure 15.12 summarizes the measures of association and of vector similarity that we have described. See the Historical Notes section for a summary of other vector similarity measures.

Figure 15.12  Defining word similarity: measures of association between a target word w and a feature f = (r, w') to another word w', and measures of vector similarity between word co-occurrence vectors \(\vec{v}\) and \(\vec{w}\):

\[
\text{PMI}(w,f) = \log_2\frac{P(w,f)}{P(w)P(f)} \tag{15.4}
\]
\[
\text{t-test}(w,f) = \frac{P(w,f) - P(w)P(f)}{\sqrt{P(f)P(w)}} \tag{15.13}
\]
\[
\text{cosine}(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}} \tag{15.17}
\]
\[
\text{Jaccard}(\vec{v},\vec{w}) = \frac{\sum_{i=1}^{N}\min(v_i,w_i)}{\sum_{i=1}^{N}\max(v_i,w_i)} \tag{15.19}
\]
\[
\text{Dice}(\vec{v},\vec{w}) = \frac{2\sum_{i=1}^{N}\min(v_i,w_i)}{\sum_{i=1}^{N}(v_i+w_i)} \tag{15.20}
\]
\[
\text{JS}(\vec{v}\,||\,\vec{w}) = D\!\left(\vec{v} \,\Big\|\, \frac{\vec{v}+\vec{w}}{2}\right) + D\!\left(\vec{w} \,\Big\|\, \frac{\vec{v}+\vec{w}}{2}\right) \tag{15.23}
\]

15.4 Using syntax to define a word's context

Instead of defining a word's context by nearby words, we could instead define it by the syntactic relations of these neighboring words. This intuition was first suggested by Harris (1968), who pointed out the relation between meaning and syntactic combinatory possibilities:

    The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities.
Consider the words duty and responsibility. The similarity between the meanings of these words is mirrored in their syntactic behavior. Both can be modified by adjectives like additional, administrative, assumed, collective, congressional, constitutional, and both can be the direct objects of verbs like assert, assign, assume, attend to, avoid, become, breach (Lin and Pantel, 2001).
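As one possible realization of this idea, here is a hedged sketch of collecting (relation, word') context features for a target word with a dependency parser; spaCy, its en_core_web_sm model, and the helper name syntactic_contexts are our own choices, not the chapter's:

```python
import spacy
from collections import Counter

# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def syntactic_contexts(text, target):
    """Collect (dependency-relation, other-word) features for every
    occurrence of `target`: the word's head, seen through the inverse
    relation, plus each of the word's dependents."""
    contexts = Counter()
    for token in nlp(text):
        if token.text == target:
            contexts[(token.dep_ + "^-1", token.head.lemma_)] += 1
            for child in token.children:
                contexts[(child.dep_, child.lemma_)] += 1
    return contexts

# "duties" is the direct object of "assumed" and is modified by the
# adjective "additional", yielding two (relation, word') features.
print(syntactic_contexts("They assumed additional duties.", "duties"))
```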