are either synonymous or not
•  Similarity (or distance): a looser metric
•  Two words are more similar if they share more features of meaning
•  Similarity is properly a relation between senses
    •  The word “bank” is not similar to the word “slope”
    •  Bank1 is similar to fund3
    •  Bank2 is similar to slope5
•  But we’ll compute similarity over both words and senses

Dan Jurafsky

Why word similarity
•  Information retrieval
•  Question answering
•  Machine translation
•  Natural language generation
•  Language modeling
•  Automatic essay grading
•  Plagiarism detection
•  Document clustering

Word similarity and word relatedness
•  We often distinguish word similarity from word relatedness
    •  Similar words: near-synonyms
    •  Related words: can be related any way
•  car, bicycle: similar
•  car, gasoline: related, not similar

Two classes of similarity algorithms
•  Thesaurus-based algorithms
    •  Are words “nearby” in hypernym hierarchy?
    •  Do words have similar glosses (definitions)?
•  Distributional algorithms
    •  Do words have similar distributional contexts?

Path-based similarity
•  Two concepts (senses/synsets) are similar if they are near each other in the thesaurus hierarchy
    •  = have a short path between them
    •  concepts have path 1 to themselves

Refinements to path-based similarity
•  pathlen(c1,c2) = 1 + number of edges in the shortest path in the hypernym graph between sense nodes c1 and c2
•  simpath(c1,c2) = 1/pathlen(c1,c2)
    •  ranges from 0 to 1 (identity)
•  wordsim(w1,w2) = max sim(c1,c2), taken over c1 ∈ senses(w1), c2 ∈ senses(w2)

Example: path-based similarity
simpath(c1,c2) = 1/pathlen(c1,c2)
simpath(nickel,coin) = 1/2 = .5
simpath(fund,budget) = 1/2 = .5
simpath(nickel,currency) = 1/4 = .2...
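
The pathlen, simpath, and wordsim definitions above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the tiny hypernym table is hypothetical (it is not real WordNet data; the edges were chosen to be consistent with the example values on the slide), the function and variable names are made up for this example, and returning 0 for unconnected concepts is a convention assumed here rather than something the slides specify.

```python
from collections import deque

# Hypothetical toy hypernym edges (child -> parent). NOT real WordNet data;
# chosen only so the results agree with the slide's example values.
HYPERNYM = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "coinage",
    "coinage": "currency",
    "budget": "fund",
}


def build_graph(hypernym):
    """Turn child -> parent edges into an undirected adjacency map."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def pathlen(graph, c1, c2):
    """1 + number of edges on the shortest path between c1 and c2.

    Returns None when no path exists (an assumption of this sketch).
    """
    if c1 == c2:
        return 1  # concepts have path 1 to themselves
    seen = {c1}
    queue = deque([(c1, 0)])  # (node, edges walked so far)
    while queue:
        node, edges = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == c2:
                return 1 + edges + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    return None


def simpath(graph, c1, c2):
    """simpath(c1,c2) = 1/pathlen(c1,c2); 0.0 if unconnected (assumed)."""
    length = pathlen(graph, c1, c2)
    return 0.0 if length is None else 1.0 / length


def wordsim(graph, senses1, senses2):
    """Maximum simpath over all pairs of senses of the two words."""
    return max(
        (simpath(graph, c1, c2) for c1 in senses1 for c2 in senses2),
        default=0.0,
    )


GRAPH = build_graph(HYPERNYM)
print(simpath(GRAPH, "nickel", "coin"))      # 0.5
print(simpath(GRAPH, "fund", "budget"))      # 0.5
print(simpath(GRAPH, "nickel", "currency"))  # 0.25
```

Breadth-first search is used because every edge counts as one step, so the first time the target is reached the path is guaranteed to be a shortest one. In wordsim, each word is represented simply as a list of its sense nodes, which is a simplification made for this sketch.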
