
IR-part2

# tf-idf weighting has many variants: columns headed n


- Euclidean distance?
- Euclidean distance is a bad idea . . .
- . . . because Euclidean distance is large for vectors of different lengths.

Introduction to Information Retrieval

## Why distance is a bad idea (Sec. 6.3)

The Euclidean distance between q and d2 is large even though the distribution of terms in the query q and the distribution of terms in the document d2 are very similar.

## Use angle instead of distance (Sec. 6.3)

- Thought experiment: take a document d and append it to itself. Call this document d′.
- "Semantically" d and d′ have the same content.
- The Euclidean distance between the two documents can be quite large.
- The angle between the two documents is 0, corresponding to...
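The contrast between distance and angle can be checked numerically. The following is a minimal sketch that assumes raw term-count vectors over a made-up three-term vocabulary. The vectors `q`, `d1`, `d2` and `d` are hypothetical values chosen for illustration, because the slides' actual example vectors are not part of this preview. The helper names `euclidean`, `cosine` and `angle_degrees` are also our own.

```python
import math

def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(u, v):
    """Angle between u and v in degrees."""
    # Clamp to [-1, 1] so floating-point round-off cannot push acos out of range.
    c = max(-1.0, min(1.0, cosine(u, v)))
    return math.degrees(math.acos(c))

# Hypothetical term-count vectors over a three-term vocabulary.
q = [1, 1, 0]     # short query
d1 = [0, 1, 1]    # similar length to q, different term distribution
d2 = [10, 9, 1]   # much longer, but term distribution close to q's

# Thought experiment: d' is d appended to itself, so every term count doubles.
d = [3, 1, 2]
d_prime = [2 * x for x in d]

print(f"dist(q, d1) = {euclidean(q, d1):.3f}, dist(q, d2) = {euclidean(q, d2):.3f}")
print(f"cos(q, d1)  = {cosine(q, d1):.3f}, cos(q, d2)  = {cosine(q, d2):.3f}")
print(f"dist(d, d') = {euclidean(d, d_prime):.3f}, angle(d, d') = {angle_degrees(d, d_prime):.3f} deg")
```

With these values, Euclidean distance places d1 closer to q than d2, while the cosine rates d2 as far more similar. Doubling d makes the distance between d and d′ equal to the length of d itself, yet the angle stays at (numerically) zero.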