124.11.lec11 - CS 124/LINGUIST 180: From Languages to Information

CS 124/LINGUIST 180: From Languages to Information
Dan Jurafsky
Lecture 11: Computational Lexical Semantics

Outline: Computational Lexical Semantics
- Intro to Lexical Semantics
  - Homonymy, Polysemy, Synonymy
  - Thesaurus: WordNet
- Computational Lexical Semantics
  - Word Similarity
    - Thesaurus-based
    - Distributional

Three Perspectives on Meaning
1. Lexical Semantics
   - The meanings of individual words
2. Formal Semantics (or Compositional Semantics or Sentential Semantics)
   - How those meanings combine to make meanings for individual sentences or utterances
3. Discourse or Pragmatics
   - How those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse
   - Dialog or Conversation is often lumped together with Discourse

Preliminaries
- What's a word? Definitions we've used over the quarter: types, tokens, stems, roots, etc.
- Lexeme: an entry in a lexicon consisting of a pairing of a form with a single meaning representation
- Lexicon: a collection of lexemes

Relationships between word meanings
- Homonymy
- Polysemy
- Synonymy
- Antonymy
- Hypernymy
- Hyponymy
- Meronymy

First idea: the unit of meaning is called a Sense or word sense
- One word, "bank", can have multiple different meanings:
  - "Instead, a bank can hold the investments in a custodial account in the client's name"
  - "But as agriculture burgeons on the east bank, the river will shrink even more"
- We say that a sense is a representation of one aspect of the meaning of a word.
- Thus bank here has two senses:
  - Bank1:
  - Bank2:

Some more terminology
- Lemmas and wordforms
  - A lexeme is an abstract pairing of meaning and form
  - A lemma or citation form is the grammatical form that is used to represent a lexeme.
    - Carpet is the lemma for carpets
    - Dormir is the lemma for duermes.
  - Specific surface forms carpets, sung, duermes are called wordforms
- The lemma bank has two senses:
  - "Instead, a bank can hold the investments in a custodial account in the client's name"
  - "But as agriculture burgeons on the east bank, the river will shrink even more."
- A sense is a discrete representation of one aspect of the meaning of a word

Homonymy
- Homonymy: lexemes that share a form
  - Phonological, orthographic, or both
  - But have unrelated, distinct meanings
- Clear examples:
  - Bat (wooden stick-like thing) vs. bat (flying scary mammal thing)
  - Or bank (financial institution) versus bank (riverside)
- Can be homophones, homographs, or both:
  - Homophones: write and right; piece and peace

Homonymy causes problems for NLP applications
- Text-to-speech
  - Same orthographic form but different phonological form: bass vs. bass
- Information retrieval
  - Different meanings, same orthographic form: QUERY: bat care
- Machine translation
- Speech recognition
  - Why?

Polysemy
- 1. The bank was constructed in 1875 out of local red brick.
- 2. I withdrew the money from the bank.
- Are those the same sense?
  - We might call sense 2 "a financial institution"
  - And sense 1 "the building belonging to a financial institution"
- Or consider the following example:
  - While some banks furnish sperm only to married women, others are less restrictive
  - Which sense of bank is this?

Polysemy
- We call polysemy the situation when a single word has multiple related meanings (bank the building, bank the financial institution, bank the biological repository)
- Most non-rare words have multiple meanings

Polysemy: a systematic relationship between senses
- Lots of types of polysemy are systematic
  - School, university, hospital can all be used to mean the institution or the building.
  - We might say there is a relationship: Building <-> Organization
- Other such kinds of systematic polysemy:

How do we know when a word has more than one sense?
- Consider examples of the word "serve":
  - Which flights serve breakfast?
  - Does America West serve Philadelphia?
- The "zeugma" test:
  - ?Does United serve breakfast and San Jose?
  - Since this sounds weird, we say that these are two different senses of "serve"

Synonyms
- Words that have the same meaning in some or all contexts:
  - filbert / hazelnut
  - couch / sofa
  - big / large
  - automobile / car
  - vomit / throw up
  - water / H2O
- Two lexemes are synonyms if they can be successfully substituted for each other in all situations
  - If so they have the same propositional meaning

Synonyms
- But there are few (or no) examples of perfect synonymy.
- Why should that be?
  - Even if many aspects of meaning are identical
  - They still may not preserve acceptability, based on notions of politeness, slang, register, genre, etc.
- Examples:
  - water and H2O
  - big / large
  - brave / courageous

Synonymy is a relation between senses rather than words
- Consider the words big and large. Are they synonyms?
  - How big is that plane?
  - Would I be flying on a large or small plane?
- How about here:
  - Miss Nelson, for instance, became a kind of big sister to Benjamin.
  - ?Miss Nelson, for instance, became a kind of large sister to Benjamin.
- Why?
  - big has a sense that means being older, or grown up
  - large lacks this sense

Antonyms
- Senses that are opposites with respect to one feature of their meaning
- Otherwise, they are very similar!
  - dark / light
  - short / long
  - hot / cold
  - up / down
  - in / out
- More formally, antonyms can:
  - Define a binary opposition or sit at opposite ends of a scale (long/short, fast/slow)
  - Be reversives: rise/fall, up/down

Hyponymy
- One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
  - car is a hyponym of vehicle
  - dog is a hyponym of animal
  - mango is a hyponym of fruit
- Conversely:
  - vehicle is a hypernym/superordinate of car
  - animal is a hypernym of dog
  - fruit is a hypernym of mango

  superordinate: vehicle | fruit | furniture | mammal
  hyponym:       car     | mango | chair     | dog

Hypernymy more formally
- Extensional:
  - The class denoted by the superordinate extensionally includes the class denoted by the hyponym
- Entailment:
  - A sense A is a hyponym of sense B if being an A entails being a B
- Hyponymy is usually transitive
  - (A hypo B and B hypo C entails A hypo C)

II. WordNet
- A hierarchically organized lexical database
- On-line thesaurus + aspects of a dictionary
- Versions for other languages are under development

  Category    Unique Forms
  Noun        117,097
  Verb        11,488
  Adjective   22,141
  Adverb      4,601

WordNet
- Where it is: http://www.cogsci.princeton.edu/cgi-bin/webwn

Applications of Ontologies
- Information Extraction
- Bioinformatics
- Medical Informatics
- Information Retrieval
- Question Answering
- Machine Translation
- Digital Libraries
- Business Process Modeling
- User Interfaces

Format of WordNet Entries
WordNet Noun Relations
WordNet Verb Relations
WordNet Hierarchies

How is "sense" defined in WordNet?
- The set of near-synonyms for a WordNet sense is called a synset (synonym set); it's their version of a sense or a concept
- Example: chump as a noun to mean 'a person who is gullible and easy to take advantage of'
- Each of these senses shares this same gloss
- Thus for WordNet, the meaning of this sense of chump is this list.
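The hypernym/hyponym relations and the entailment test above can be sketched in code. This is a minimal illustration on a hand-built toy is-a hierarchy; the sense names and links are invented for the example, not taken from WordNet:

```python
# Toy is-a hierarchy: each sense maps to its direct hypernym.
# The hierarchy is illustrative, not real WordNet data.
HYPERNYM = {
    "car": "vehicle",
    "vehicle": "entity",
    "mango": "fruit",
    "fruit": "entity",
    "dog": "mammal",
    "mammal": "animal",
    "animal": "entity",
}

def hypernyms(sense):
    """Walk up the is-a links: the transitive closure of hypernymy."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain

def is_hyponym_of(a, b):
    """A is a hyponym of B if being an A entails being a B."""
    return b in hypernyms(a)

print(is_hyponym_of("car", "vehicle"))    # direct link
print(is_hyponym_of("dog", "animal"))     # via mammal: transitivity
print(is_hyponym_of("mango", "vehicle"))  # unrelated branches
```

Because `hypernyms` follows links all the way to the root, the transitivity property from the slide (A hypo B and B hypo C entails A hypo C) falls out automatically.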
Thesauri – Examples: MeSH
- MeSH (Medical Subject Headings)
  - Organized by terms (~250,000) that correspond to medical subjects
  - For each term, syntactic, morphological, or semantic variants are given

  MeSH Heading: Databases, Genetic
  Entry Terms:  Genetic Databases; Genetic Sequence Databases; OMIM;
                Online Mendelian Inheritance in Man; Genetic Data Banks;
                Genetic Data Bases; Genetic Databanks; Genetic Information Databases
  See Also:     Genetic Screening
Slide from Paul Buitelaar

MeSH (Medical Subject Headings) Thesaurus
- MeSH Descriptor, Definition, Synonym set
Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song

MeSH Tree
- Hierarchically arranged from most general to most specific.
- Actually a graph rather than a tree: terms normally appear in more than one place in the tree
Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song

MeSH Ontology
- Solving traditional synonym/hypernym/hyponym problems in information retrieval and text mining.
  - Synonym problems <= Entry terms
    - E.g., cancer and tumor are synonyms
  - Hypernym/hyponym problems <= MeSH Tree
    - E.g., melatonin is a hormone
Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song

MeSH Ontology for MEDLINE indexing
- In addition to its ontology role, MeSH Descriptors have been used to index MEDLINE articles.
- MEDLINE is NLM's bibliographic database
  - Over 18 million articles
  - Refs to journal articles in the life sciences with a concentration on biomedicine
- About 10 to 20 MeSH terms are manually assigned to each article (after reading full papers) by trained curators.
- 3 to 5 MeSH terms are "MajorTopics" that primarily represent an article.
Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song

Word Similarity
- Synonymy is a binary relation: two words are either synonymous or not
- We want a looser metric: word similarity or word distance
- Two words are more similar if they share more features of meaning
- Actually these are really relations between senses:
  - Instead of saying "bank is like fund"
  - We say: Bank1 is similar to fund3; Bank2 is similar to slope5
- We'll compute them over both words and senses

Why word similarity
- Information retrieval
- Question answering
- Machine translation
- Natural language generation
- Language modeling
- Automatic essay grading
- Plagiarism detection
- Document clustering

Two classes of algorithms
- Thesaurus-based algorithms
  - Based on whether words are "nearby" in WordNet or MeSH
- Distributional algorithms
  - Compare words based on their distributional context

Thesaurus-based word similarity
- We could use anything in the thesaurus
  - Meronymy
  - Glosses
  - Example sentences
- In practice, by "thesaurus-based" we just mean using the is-a/subsumption/hypernym hierarchy
- Word similarity versus word relatedness
  - Similar words are near-synonyms
  - Related words could be related any way
    - car, gasoline: related, not similar
    - car, bicycle: similar

Path-based similarity
- Two words are similar if nearby in the thesaurus hierarchy (i.e.
  short path between them)

Refinements to path-based similarity
- pathlen(c1,c2) = number of edges in the shortest path in the thesaurus graph between the sense nodes c1 and c2
- sim_path(c1,c2) = -log pathlen(c1,c2)
- wordsim(w1,w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim(c1,c2)

Problem with basic path-based similarity
- Assumes each link represents a uniform distance
  - Nickel to money seems closer than nickel to standard
- Instead, we want a metric that lets us represent the cost of each edge independently

Information content similarity metrics
- Define P(c) as the probability that a randomly selected word in a corpus is an instance of concept c
  - Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy
  - P(root) = 1
  - The lower a node in the hierarchy, the lower its probability

Information content similarity
- Train by counting in a corpus
  - 1 instance of "dime" could count toward the frequency of coin, currency, standard, etc.
- More formally:

    P(c) = ( Σ_{w ∈ words(c)} count(w) ) / N

Information content similarity
- WordNet hierarchy augmented with probabilities P(c)

Information content: definitions
- Information content: IC(c) = -log P(c)
- Lowest common subsumer: LCS(c1,c2) = the lowest common subsumer, i.e.
  the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2
- We are now ready to see how to use information content (IC) as a similarity metric

Resnik method
- The similarity between two words is related to their common information
- The more two words have in common, the more similar they are
- Resnik: measure the common information as the information content of the lowest common subsumer of the two nodes:

    sim_resnik(c1,c2) = -log P(LCS(c1,c2))

Dekang Lin method
- Similarity between A and B needs to do more than measure common information
- The more differences between A and B, the less similar they are:
  - Commonality: the more information A and B have in common, the more similar they are
  - Difference: the more differences between the information in A and B, the less similar they are
- Commonality: IC(common(A,B))
- Difference: IC(description(A,B)) - IC(common(A,B))

Dekang Lin method
- Similarity theorem: the similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are:

    sim_Lin(A,B) = log P(common(A,B)) / log P(description(A,B))

- Lin furthermore shows (modifying Resnik) that the information in common is twice the information content of the LCS

Lin similarity function
- sim_Lin(c1,c2) = 2 × log P(LCS(c1,c2)) / ( log P(c1) + log P(c2) )
- sim_Lin(hill,coast) = 2 × log P(geological-formation) / ( log P(hill) + log P(coast) ) = 0.59

The (extended) Lesk Algorithm
- Two concepts are similar if their glosses contain similar words
  - Drawing paper: paper that is specially prepared for use in drafting
  - Decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface
- For each n-word phrase that occurs in both glosses, add a score of n²
  - "paper" (n=1, score 1) and "specially prepared" (n=2, score 4): 1 + 4 = 5...

Summary: thesaurus-based similarity

Evaluating thesaurus-based similarity
- Intrinsic evaluation: correlation
  coefficient between algorithm scores and word similarity ratings from humans
- Extrinsic (task-based, end-to-end) evaluation: embed in some end application
  - Malapropism (spelling error) detection
  - Essay grading
  - Plagiarism detection
  - Language modeling in some application

Problems with thesaurus-based methods
- We don't have a thesaurus for every language
- Even if we do, many words are missing
- They rely on hyponym info: strong for nouns, but lacking for adjectives and even verbs
- Alternative: distributional methods for word similarity

Distributional methods for word similarity
- Firth (1957): "You shall know a word by the company it keeps!"
- Zellig Harris (1954): "If we consider oculist and eye-doctor we find that, as our corpus of utterances grows, these two occur in almost the same environments. In contrast, there are many sentence environments in which oculist occurs but lawyer does not... It is a question of the relative frequency of such environments, and of what we will obtain if we ask an informant to substitute any word he wishes for oculist (not asking what words have the same meaning). These and similar tests all measure the probability of particular environments occurring with particular elements... If A and B have almost identical environments we say that they are synonyms."

Distributional methods for word similarity
- Nida example:
  - A bottle of tezgüino is on the table
  - Everybody likes tezgüino
  - Tezgüino makes you drunk
  - We make tezgüino out of corn.
- Intuition: just from these contexts a human could guess the meaning of tezgüino
- So we should look at the surrounding contexts, and see what other words have similar contexts.
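The tezgüino intuition above can be sketched directly: build a binary context vector for each word from the sentences it appears in, then compare vectors. This is a minimal sketch; the extra comparison sentences (for "wine" and "noise") are invented for illustration, and tokenization is a naive whitespace split:

```python
import math

# The four tezguino contexts from the slide, plus invented sentences
# for two comparison words, "wine" and "noise".
sentences = [
    "a bottle of tezguino is on the table",
    "everybody likes tezguino",
    "tezguino makes you drunk",
    "we make tezguino out of corn",
    "a bottle of wine is on the table",
    "wine makes you drunk",
    "the loud noise surprised everybody",
]

def context_vector(target):
    """Binary vector: component i is 1 if word v_i co-occurs with target."""
    vocab = sorted({w for s in sentences for w in s.split()})
    vec = [0] * len(vocab)
    for s in sentences:
        words = s.split()
        if target in words:
            for w in words:
                if w != target:
                    vec[vocab.index(w)] = 1
    return vec

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

t = context_vector("tezguino")
w = context_vector("wine")
n = context_vector("noise")
print(cosine(t, w) > cosine(t, n))  # tezguino's contexts look more like wine's
```

Here "wine" shares bottle/drunk/table contexts with "tezgüino" while "noise" shares almost none, so the cosine between the tezgüino and wine vectors is higher, matching the guess a human would make from the contexts.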
Context vector
- Consider a target word w
- Suppose we had one binary feature f_i for each of the N words v_i in the lexicon
  - f_i means "word v_i occurs in the neighborhood of w"
- w = (f1, f2, f3, ..., fN)
- If w = tezgüino, v1 = bottle, v2 = drunk, v3 = matrix:
  - w = (1, 1, 0, ...)

Intuition
- Define two words by these sparse feature vectors
- Apply a vector distance metric
- Say that two words are similar if their two vectors are similar

Distributional similarity
- So we just need to specify three things:
  1. How the co-occurrence terms are defined
  2. How terms are weighted (frequency? logs? mutual information?)
  3. What vector distance metric we should use (cosine? Euclidean distance?)

Defining co-occurrence vectors
- We could have windows
  - Bag-of-words
  - We generally remove stopwords
  - But the vectors are still very sparse
- So instead of using ALL the words in the neighborhood, how about just the words occurring in particular relations?

Defining co-occurrence vectors
- Zellig Harris (1968): the meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities
- Idea: two words are similar if they have similar parse contexts. Consider duty and responsibility: they share a similar set of parse contexts.
Slide adapted from Chris Callison-Burch

Co-occurrence vectors based on dependencies
- For the word "cell": a vector of N×R features
  - R is the number of dependency relations

2. Weighting the counts ("Measures of association with context")
- We have been using the frequency of some feature as its weight or value.
- But we could use any function of this frequency
  - One possibility: tf-idf
  - Another: conditional probability
    - f = (r, w') = (obj-of, attack)
    - P(f|w) = count(f,w) / count(w)
    - assoc_prob(w,f) = P(f|w)

Intuition: why not frequency
- "drink it" is more common than "drink wine"
- But "wine" is a better "drinkable" thing than "it"
- Idea: we need to control for chance (expected frequency)
- We do this by normalizing by the expected frequency we would get assuming independence

Weighting: Mutual Information
- Mutual information: between two random variables X and Y
- Pointwise mutual information: a measure of how often two events x and y occur, compared with what we would expect if they were independent:

    PMI(x,y) = log2 [ P(x,y) / ( P(x) P(y) ) ]

- PMI between a target word w and a feature f:

    assoc_PMI(w,f) = log2 [ P(w,f) / ( P(w) P(f) ) ]

Mutual information intuition
- Objects of the verb drink

3. Defining similarity between vectors

Summary of similarity measures

Evaluating similarity
- Intrinsic evaluation: correlation coefficient between algorithm scores and word similarity ratings from humans
- Extrinsic (task-based, end-to-end) evaluation:
  - Malapropism (spelling error) detection
  - WSD
  - Essay grading
  - Taking TOEFL multiple-choice vocabulary tests

An example of detected plagiarism

Paraphrase detection
- How to compute synonyms for longer phrases?
- One cute solution: translations
Slides from Chris Callison-Burch

Outline: Computational Lexical Semantics
- Intro to Lexical Semantics
  - Homonymy, Polysemy, Synonymy
  - Thesaurus: WordNet
- Computational Lexical Semantics
  - Word Similarity
    - Thesaurus-based
    - Distributional
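The "why not frequency" intuition for PMI weighting above can be sketched numerically. All counts here are invented toy numbers chosen to echo the slide's example: "drink it" is more frequent than "drink wine", but PMI ranks "wine" higher because it normalizes by how often each word occurs overall:

```python
import math

# Invented toy counts for the feature f = (obj-of, drink).
count_wf = {("it", "obj-of:drink"): 6, ("wine", "obj-of:drink"): 2}
count_w = {"it": 4000, "wine": 20}   # overall occurrences of each word
count_f = {"obj-of:drink": 10}       # overall occurrences of the feature
N = 100_000                          # total (word, feature) events

def pmi(w, f):
    """assoc_PMI(w,f) = log2( P(w,f) / (P(w) * P(f)) )."""
    p_wf = count_wf[(w, f)] / N
    p_w = count_w[w] / N
    p_f = count_f[f] / N
    return math.log2(p_wf / (p_w * p_f))

# Raw frequency prefers "it"; PMI, controlling for chance, prefers "wine".
print(count_wf[("it", "obj-of:drink")] > count_wf[("wine", "obj-of:drink")])
print(pmi("wine", "obj-of:drink") > pmi("it", "obj-of:drink"))
```

Because "it" occurs everywhere, its expected co-occurrence with any feature under independence is high, so its PMI with "drink" is low; "wine" co-occurs with "drink" far more than chance predicts, so its PMI is high.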