simpath(nickel, money) = 1/6 = .17
simpath(coinage, Richter scale) = 1/6 = .17

Dan Jurafsky

Problem with basic path-based similarity
•  Assumes each link represents a uniform distance
•  But nickel to money seems to us to be closer than nickel to standard
•  Nodes high in the hierarchy are very abstract
•  We instead want a metric that
   •  Represents the cost of each edge independently
   •  Treats words connected only through abstract nodes as less similar

Information content similarity metrics
Resnik 1995. Using information content to evaluate semantic similarity in a taxonomy. IJCAI
•  Let's define P(c) as:
   •  The probability that a randomly selected word in a corpus is an instance of concept c
   •  Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy
   •  For a given concept, each observed noun is either
      •  a member of that concept with probability P(c)
      •  not a member of that concept with probability 1-P(c)
•  All words are members of the root node (Entity)
   •  P(root) = 1
•  The lower a node in the hierarchy, the lower its probability

Information content similarity
[Figure: fragment of the hierarchy. Levels, top to bottom: entity; …; geological-formation; natural elevation, cave, shore; hill, ridge, grotto, coast]
•  Train by counting in a corpus
   •  Each instance of hill counts toward the frequency of natural elevation, geological formation, entity, etc.
•  Let words(c) be the set of all words that are children of node c
   •  words("geo-formation") = {hill, ridge, grotto, coast, cave, shore, natural elevation}
   •  words("natural elevation") = {hill, ridge}

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}$$

Information content similarity
D. Lin. 1998. An Information-Theoretic Definition of Similarity. ICML 1998
•  WordNet hierarchy augmented with probabilities P(c)

Information content: definitions
•  Information content: IC(c) = -log P(c)
•  Most informative subsumer (Lowest common subsumer)
   LCS(c1,c2) = The most informative (lowest) node in the hierarchy subsum...
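
The counting procedure for P(c) and the definition IC(c) = -log P(c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus counts are invented, the placeholder word "other" exists only so that the probabilities below the root are less than 1, the edges grotto → cave and coast → shore are guessed from the figure layout, the elided levels between entity and geological-formation are skipped, and the natural logarithm is used because the slides do not fix a base. Each word's count is also credited to its own node, which is one possible reading of the definition.

```python
import math
from collections import defaultdict

# Toy hierarchy as child -> parent links. The grotto/coast edges and the direct
# entity link are assumptions; "other" is a hypothetical placeholder word.
PARENT = {
    "geological-formation": "entity",
    "natural elevation": "geological-formation",
    "cave": "geological-formation",
    "shore": "geological-formation",
    "hill": "natural elevation",
    "ridge": "natural elevation",
    "grotto": "cave",
    "coast": "shore",
    "other": "entity",
}

# Invented corpus counts (not from any real corpus); they sum to N = 1000.
WORD_COUNTS = {
    "hill": 30, "ridge": 10, "grotto": 5, "coast": 25,
    "cave": 15, "shore": 15, "other": 900,
}


def concept_probabilities(word_counts, parent):
    """Return P(c) for every concept: each word occurrence is counted
    toward the word's own node and every ancestor up to the root."""
    total = sum(word_counts.values())  # N in the formula
    freq = defaultdict(int)
    for word, count in word_counts.items():
        node = word
        while node is not None:
            freq[node] += count
            node = parent.get(node)  # None once we step past the root
    return {concept: f / total for concept, f in freq.items()}


def information_content(p):
    """IC(c) = -log P(c), using the natural log (an assumption)."""
    return -math.log(p)


if __name__ == "__main__":
    probs = concept_probabilities(WORD_COUNTS, PARENT)
    for concept in ("entity", "geological-formation", "natural elevation", "hill"):
        p = probs[concept]
        print(f"{concept:22s} P = {p:.3f}  IC = {information_content(p):.3f}")
```

With these made-up counts the root gets P = 1 and IC = 0, and both quantities change monotonically along the path entity → geological-formation → natural elevation → hill: probability falls and information content rises, matching the statement that lower nodes have lower probability.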
