al vectors after length-normalization.
- Long and short documents now have comparable weights

Introduction to Information Retrieval
Sec. 6.3

cosine(query,document)

\[
\cos(q, d) = \frac{q \cdot d}{|q|\,|d|} = \frac{q}{|q|} \cdot \frac{d}{|d|} = \frac{\sum_{i=1}^{V} q_i d_i}{\sqrt{\sum_{i=1}^{V} q_i^2}\,\sqrt{\sum_{i=1}^{V} d_i^2}}
\]

(The numerator q · d is the dot product; q/|q| and d/|d| are unit vectors.)

q_i is the tf-idf weight of term i in the query
d_i is the tf-idf weight of term i in the document
cos(q, d) is the cosine similarity of q and d … or, equivalently, the cosine of the angle between q and d.

Cosine for length-normalized vectors
- For length-normalized vectors, cosine similarity is simply the dot product (or scalar product):

\[
\cos(q, d) = q \cdot d = \sum_{i=1}^{V} q_i d_i
\]

for q, d length-normalized.

Cosine similarity illustrated

Sec. 6.3
Cosine similarity amongst 3 documents
How similar are the novels SaS: Sense and Sensibility, PaP: Pride and Prejudice, and WH: Wuthering Heights?

term       SaS  PaP  WH
affection  115  58   20
jealous    10   7...
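
The cosine formula from Sec. 6.3 can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `cosine` and `length_normalize` and the toy weight vectors are assumptions chosen for the example, and the vectors are not taken from the novels table. It checks two claims from the slides: for length-normalized vectors the cosine reduces to a dot product, and a longer document with the same term proportions gets the same score.

```python
import math


def cosine(q, d):
    """Cosine similarity of two equal-length weight vectors q and d."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        # Assumption for this sketch: similarity with a zero vector is 0.
        return 0.0
    return dot / (norm_q * norm_d)


def length_normalize(v):
    """Divide v by its Euclidean (L2) length so it becomes a unit vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


# Hypothetical toy weights for a vocabulary of V = 3 terms.
q = [1.0, 2.0, 0.0]
d = [2.0, 1.0, 2.0]
d_doubled = [2 * x for x in d]  # same proportions, twice the length

cos_qd = cosine(q, d)

# Dot product of the unit vectors; should agree with cos_qd.
dot_of_units = sum(
    a * b for a, b in zip(length_normalize(q), length_normalize(d))
)

# Scaling the document does not change the cosine.
cos_q_doubled = cosine(q, d_doubled)

print(cos_qd, dot_of_units, cos_q_doubled)
```

In a retrieval setting one would typically length-normalize the document vectors once at indexing time, so that scoring a query needs only the dot product.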
