Similarity (I)

Dan Jurafsky

Problems with thesaurus-based meaning
•  We don't have a thesaurus for every language
•  Even if we do, they have problems with recall
   •  Many words are missing
   •  Most (if not all) phrases are missing
   •  Some connections between senses are missing
•  Thesauri work less well for verbs and adjectives
   •  Adjectives and verbs have less structured hyponymy relations

Distributional models of meaning
•  Also called vector-space models of meaning
•  Offer much higher recall than hand-built thesauri
   •  Although they tend to have lower precision
•  Zellig Harris (1954): "oculist and eye-doctor … occur in almost the same environments… If A and B have almost identical environments we say that they are synonyms."
•  Firth (1957): "You shall know a word by the company it keeps."

Intuition of distributional word similarity
•  Nida example:
      A bottle of tesgüino is on the table.
      Everybody likes tesgüino.
      Tesgüino makes you drunk.
      We make tesgüino out of corn.
•  From the context words, humans can guess that tesgüino means
   •  an alcoholic beverage like beer
•  Intuition for an algorithm:
   •  Two words are similar if they have similar word contexts.

Reminder: Term-document matrix
•  Each cell: count of term t in document d, $z_{t,d}$
•  Each document is a count vector in $\mathbb{N}^{v}$ (one dimension per vocabulary term): a column below

             As You Like It   Twelfth Night   Julius Caesar   Henry V
  battle                  1               1               8        15
  soldier                 2               2              12        36
  fool                   37              58               1         5
  clown                   6             117               0         0

Reminder: Term-document matrix
•  Two documents are similar if their vectors are similar
…
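The counts in the term-document matrix can be turned into a small worked example of "two documents are similar if their vectors are similar." The preview stops before the slides say how vectors are compared, so this sketch assumes cosine similarity, a common choice for count vectors; the helper names (`column`, `cosine`, `doc_similarity`, `word_similarity`) are hypothetical and only for illustration. Comparing rows instead of columns treats each play as a "context" for a word, which is one simple reading of "two words are similar if they have similar word contexts."

```python
import math

# Counts from the term-document matrix above: rows are terms, columns are plays.
DOCS = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]
TERMS = ["battle", "soldier", "fool", "clown"]
COUNTS = [
    [1, 1, 8, 15],    # battle
    [2, 2, 12, 36],   # soldier
    [37, 58, 1, 5],   # fool
    [6, 117, 0, 0],   # clown
]


def column(matrix, j):
    """Return document j as a count vector with one entry per term."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between two count vectors (assumed measure).

    Returns 0.0 when either vector is all zeros, to avoid dividing by zero.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def doc_similarity(i, j):
    """Compare two documents (columns of the matrix)."""
    return cosine(column(COUNTS, i), column(COUNTS, j))


def word_similarity(i, j):
    """Compare two words (rows), using documents as their contexts."""
    return cosine(COUNTS[i], COUNTS[j])


if __name__ == "__main__":
    for i in range(len(DOCS)):
        for j in range(i + 1, len(DOCS)):
            print(f"{DOCS[i]:>15} vs {DOCS[j]:<14} {doc_similarity(i, j):.3f}")
    for i in range(len(TERMS)):
        for j in range(i + 1, len(TERMS)):
            print(f"{TERMS[i]:>8} vs {TERMS[j]:<8} {word_similarity(i, j):.3f}")
```

With these counts, Julius Caesar and Henry V (both dominated by "battle" and "soldier") come out far more similar to each other than either is to As You Like It, and "battle" is much closer to "soldier" than to "fool". Raw counts favor frequent words, so in practice the cells are often reweighted before comparison; that step is outside what this preview shows.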

This document was uploaded on 02/14/2014.