IR-part2

64 tf-idf weighting has many variants columns headed n


Euclidean distance?
- Euclidean distance is a bad idea . . .
- . . . because Euclidean distance is large for vectors of different lengths.

Introduction to Information Retrieval (Sec. 6.3)

Why distance is a bad idea
The Euclidean distance between q and d2 is large even though the distribution of terms in the query q and the distribution of terms in the document d2 are very similar.

Use angle instead of distance
- Thought experiment: take a document d and append it to itself. Call this document d′.
- “Semantically” d and d′ have the same content.
- The Euclidean distance between the two documents can be quite large.
- The angle between the two documents is 0, corresponding to...
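
The thought experiment can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes documents are represented as raw term-count vectors, so appending d to itself doubles every count. The vector values and the helper names euclidean_distance and angle_degrees are hypothetical, chosen only for illustration.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle_degrees(u, v):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Hypothetical term counts for a document d over a four-term vocabulary.
d = [3, 1, 0, 2]
# Appending d to itself doubles every term count (assumed raw-count weighting).
d_prime = [2 * count for count in d]

dist = euclidean_distance(d, d_prime)
angle = angle_degrees(d, d_prime)

print(f"Euclidean distance between d and d': {dist:.3f}")
print(f"Angle between d and d' (degrees):    {angle:.6f}")
```

Under these assumptions the distance equals the length of d itself, so it grows with document length, while the angle stays at zero up to floating-point rounding because the two vectors point in the same direction.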

This document was uploaded on 02/14/2014.
