

Dan Jurafsky

The words in a term-document matrix
•  Each word is a count vector in ℕ^D: a row below
•  Two words are similar if their vectors are similar

           As You Like It   Twelfth Night   Julius Caesar   Henry V
battle            1               1               8            15
soldier           2               2              12            36
fool             37              58               1             5
clown             6             117               0             0

The Term-Context matrix
•  Instead of using entire documents, use smaller contexts
•  Paragraph
•  Window of 10 words
•  A word is now defined by a vector over counts of context words

Sample contexts: 20 words (Brown corpus)
•  equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,
•  on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of
•  of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of
•  substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this

Term-context matrix for word similarity
•  Two words are similar in meaning if their context vectors are similar

              aardvark   computer   data   pinch   result   sugar   …
apricot           0          0        0      1       0        1
pineapple         0          0        0      1       0        1
digital           0          2        1      0       1        0
information       0          1        6      0       4        0

Should we use raw counts?
•  For the term-document matrix
•  We used tf-idf instead of raw term counts
•  For the term-context matrix
•  ...
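The idea that a word is a vector of context-word counts, and that similar words have similar vectors, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `term_context_counts` and `cosine` are hypothetical, the window is taken to mean 10 words on each side of the target (the slides do not say whether "window of 10 words" is one-sided or total), and cosine is just one common choice of vector similarity, not one the slides name at this point. The `rows` data is the raw-count term-context table from the slide.

```python
import math
from collections import Counter, defaultdict


def term_context_counts(tokens, window=10):
    """For each word, count the words within `window` positions on either side.

    Assumption: the window is symmetric (+/- `window` tokens).
    """
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Rows of the term-context table from the slide (zero entries omitted).
rows = {
    "apricot": {"pinch": 1, "sugar": 1},
    "pineapple": {"pinch": 1, "sugar": 1},
    "digital": {"computer": 2, "data": 1, "result": 1},
    "information": {"computer": 1, "data": 6, "result": 4},
}

if __name__ == "__main__":
    for a, b in [("apricot", "pineapple"),
                 ("digital", "information"),
                 ("apricot", "digital")]:
        print(a, b, round(cosine(rows[a], rows[b]), 3))
```

On these raw counts, apricot and pineapple have identical context vectors, while apricot and digital share no context words at all, so their similarity is zero. `term_context_counts` shows how such rows could be collected from a tokenized corpus; it makes no attempt at the weighting question raised on the last slide.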

This document was uploaded on 02/14/2014.
