IR-part2

# tf-idf weighting has many variants: columns headed n

- Euclidean distance?
- Euclidean distance is a bad idea . . .
- . . . because Euclidean distance is large for vectors of different lengths.

Introduction to Information Retrieval

## Why distance is a bad idea (Sec. 6.3)

The Euclidean distance between q and d2 is large even though the distribution of terms in the query q and the distribution of terms in the document d2 are very similar.

## Use angle instead of distance (Sec. 6.3)

- Thought experiment: take a document d and append it to itself. Call this document d′.
- "Semantically" d and d′ have the same content.
- The Euclidean distance between the two documents can be quite large.
- The angle between the two documents is 0, corresponding to...
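The thought experiment can be checked numerically with a minimal Python sketch. It assumes raw term-frequency vectors built by lowercasing and splitting on whitespace, and it uses a made-up toy document. The helper names (`tf_vector`, `euclidean_distance`, `cosine_similarity`, `angle_degrees`) are hypothetical and chosen only for illustration. Appending d to itself doubles every term count, so the Euclidean distance between d and d′ equals the length of d's vector, while the angle between them stays at (numerically almost exactly) zero.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector (assumption: lowercase + whitespace tokenization)."""
    return Counter(text.lower().split())


def euclidean_distance(u: Counter, v: Counter) -> float:
    """Straight-line distance between two sparse term-count vectors."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u[t] - v[t]) ** 2 for t in terms))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)


def angle_degrees(u: Counter, v: Counter) -> float:
    """Angle between the vectors in degrees."""
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos))


# Hypothetical toy document; d_prime is d appended to itself.
d = "jealous gossip rich jealous"
d_prime = d + " " + d

u, v = tf_vector(d), tf_vector(d_prime)
print("Euclidean distance:", euclidean_distance(u, v))  # sqrt(6) for this toy text
print("Angle (degrees):", angle_degrees(u, v))           # effectively 0
```

Here u = (jealous: 2, gossip: 1, rich: 1) and v is exactly 2u, so the distance is √6 even though the two term distributions are identical. Appending more copies would keep increasing the distance while the angle stays at 0, which is why the angle (or its cosine) is the better proximity measure.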
