IR-part2

… al vectors after length-normalization.
- Long and short documents now have comparable weights.

Introduction to Information Retrieval

cosine(query, document) (Sec. 6.3)

$$\cos(\vec q,\vec d) = \frac{\vec q\cdot\vec d}{|\vec q|\,|\vec d|} = \frac{\vec q}{|\vec q|}\cdot\frac{\vec d}{|\vec d|} = \frac{\sum_{i=1}^{V} q_i d_i}{\sqrt{\sum_{i=1}^{V} q_i^2}\,\sqrt{\sum_{i=1}^{V} d_i^2}}$$

Here q · d is the dot product, q/|q| and d/|d| are unit vectors, and the sums run over the V terms of the vocabulary.
- q_i is the tf-idf weight of term i in the query.
- d_i is the tf-idf weight of term i in the document.
- cos(q, d) is the cosine similarity of q and d or, equivalently, the cosine of the angle between q and d.

Cosine for length-normalized vectors
- For length-normalized vectors, cosine similarity is simply the dot product (or scalar product):

$$\cos(\vec q,\vec d) = \vec q\cdot\vec d = \sum_{i=1}^{V} q_i d_i \quad \text{for } \vec q,\ \vec d \text{ length-normalized.}$$

Cosine similarity illustrated
[Figure: cosine similarity illustrated]

Cosine similarity amongst 3 documents (Sec. 6.3)
How similar are the novels SaS: Sense and Sensibility, PaP: Pride and Prejudice, and WH: Wuthering Heights?

term        SaS   PaP   WH
affection   115    58   20
jealous      10     7   …
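The two forms of the cosine formula above can be checked with a short sketch. The Python below is a minimal illustration, assuming plain lists of tf-idf weights over a shared vocabulary; the function names (`l2_normalize`, `dot`, `cosine`) and the weight values are hypothetical and not taken from the slides. It computes cos(q, d) with the general formula, recomputes it as the dot product of the length-normalized vectors, and shows that doubling every weight of d leaves the cosine unchanged.

```python
import math

def l2_normalize(v):
    """Divide each weight by the vector's Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def dot(q, d):
    """Scalar product: the sum over terms i of q_i * d_i."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same length")
    return sum(qi * di for qi, di in zip(q, d))

def cosine(q, d):
    """General form: cos(q, d) = (q . d) / (|q| |d|)."""
    return dot(q, d) / (math.sqrt(dot(q, q)) * math.sqrt(dot(d, d)))

# Hypothetical tf-idf weights over a 3-term vocabulary (illustrative only).
q = [1.0, 0.0, 2.0]
d = [3.0, 1.0, 2.0]

# General formula: 7 / sqrt(5 * 14) = sqrt(0.7), about 0.83666.
print(round(cosine(q, d), 6))

# Same value as the plain dot product of the unit vectors.
print(round(dot(l2_normalize(q), l2_normalize(d)), 6))

# Doubling every weight of d gives the same unit vector,
# so the cosine with q does not change.
d_doubled = [2 * x for x in d]
print(round(cosine(q, d_doubled), 6))
```

Because length normalization does not depend on the query, document vectors could be normalized once ahead of time; scoring a query against them then needs only the dot product.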